today local_bar

Raja Mehta Moreno

MS student, Imperial College London

1 paper at NeurIPS 2025

OpenReview· Semantic Scholar· Google Scholar

Poster Session 3

Thursday, December 4, 2025 · 11:00 AM → 2:00 PM

Exhibit Hall C,D,E

CTRL-ALT-DECEIT Sabotage Evaluations for Automated AI R&D

#1208 Spotlight · Francis Rhys Ward, Teun van der Weij, Hanna Gábor, Sam Martin, Raja Mehta Moreno, Harel Lidar, Louis Makower, Thomas Jodrell, Lauren Robson

We evaluate frontier LM agents' capabilities to sabotage and sandbag ML engineering tasks without being detected by automated monitors.