Professor, National University of Singapore
1 paper at NeurIPS 2025
We show that reasoning-oriented RL fine-tuning significantly increases the prevalence of hallucinations, and we propose an FSPO approach based on reinforcement learning.