Full Professor, Tsinghua University
2 papers at NeurIPS 2025
We conduct an empirical study comparing the generalization benefits of reinforcement learning (RL) fine-tuning and supervised fine-tuning (SFT) for vision-language-action (VLA) models, and report our findings and analyses.
This paper proposes an LLM-based plug-in that is compatible with various RL algorithms and improves the efficiency of policy exploration during RL training.