Postdoc, City University of Hong Kong
2 papers at NeurIPS 2025
We show that reasoning-oriented RL fine-tuning significantly increases the prevalence of hallucinations, and we propose FSPO, an approach based on reinforcement learning.