MS student, Imperial College London
1 paper at NeurIPS 2025
We evaluate frontier LM agents' capabilities to sabotage and sandbag ML engineering tasks without being detected by automated monitors.