PhD student, Shanghai Jiao Tong University
1 paper at NeurIPS 2025
Training a new reasoning paradigm for LLMs that explicitly incorporates meta-thinking in a multi-agent, multi-turn setting with RL