PhD student, Fudan University
4 papers at NeurIPS 2025
We propose the Adaptive Reasoning Model (ARM), a reasoning model that adaptively selects an appropriate reasoning format for the task at hand.
We introduce Multi-agent KTO, a method that trains LLMs to play Werewolf through direct gameplay. Our approach outperforms GPT-4o and RL+LLM methods and achieves human-competitive performance.