PhD student, Beihang University
1 paper at NeurIPS 2025
We introduce DeceptionBench, the first comprehensive benchmark evaluating deceptive behaviors in LLMs across real-world scenarios, revealing critical vulnerabilities especially under reinforcement dynamics.