PhD student, Shanghai Jiao Tong University
2 papers at NeurIPS 2025
Training a new reasoning paradigm for LLMs that explicitly incorporates meta-thinking in a multi-agent, multi-turn setting with RL