Principal Researcher, ELLIS Institute Tübingen
1 paper at NeurIPS 2025
We introduce a benchmark that measures the safety of general computer-use agents across diverse categories of harm.