Researcher, Independent
1 paper at NeurIPS 2025
We introduce a novel method that uses noise injection to elicit the latent capabilities of sandbagging LLMs.