Assistant Professor, Tsinghua University
3 papers at NeurIPS 2025
We conduct an empirical study comparing the generalization benefits of reinforcement learning fine-tuning and supervised fine-tuning for vision-language-action models, and report findings and analyses from that comparison.
This paper introduces a novel metric (REG) for evaluating the reasoning efficiency of LRMs, along with a reinforcement learning method (REO-RL) that significantly reduces reasoning redundancy while maintaining accuracy.