PhD student, Shanghai Jiao Tong University
1 paper at NeurIPS 2025
This paper introduces SOPHIA, a simple and scalable semi-off-policy reinforcement learning method that enhances the ability of large vision-language models (LVLMs) to perform visual slow-thinking reasoning.