Researcher, School of Computer Science, Carnegie Mellon University
1 paper at NeurIPS 2025
We introduce TARS, a reinforcement learning training recipe that teaches models to reason about safety through chain-of-thought traces. Its reward signal balances safety with task completion, improving safety while reducing refusals.