Dangerous Capability Evals - NeurIPS 2025

1 paper across 1 session

Poster Session 3

Thursday, December 4, 2025 · 11:00 AM → 2:00 PM

Exhibit Hall C,D,E

CTRL-ALT-DECEIT: Sabotage Evaluations for Automated AI R&D

#1208 Spotlight · Francis Ward, Teun van der Weij, Hanna Gábor, Sam Martin, Raja Moreno, Harel Lidar, Louis Makower, Thomas Jodrell, Lauren Robson

We evaluate the ability of frontier LM agents to sabotage and sandbag ML engineering tasks without being detected by automated monitors.