Research Scientist, Facebook
3 papers at NeurIPS 2025
A benchmark of realistic security scenarios for LLM-based web agents.
A new LLM jailbreak objective that enables more nuanced control over jailbroken responses, exploits undergeneralization of safety alignment, and improves success rates of existing jailbreaks from 14% to 80%.
A novel privacy benchmark for AI agents that evaluates their adherence to the data minimization principle in a full-stack, end-to-end environment.