Full Professor, Nanjing University
4 papers at NeurIPS 2025
We introduce a token- and task-wise MoE structure into in-context RL models, harnessing the architectural advances of MoE to unleash the in-context learning capacity of RL.
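As a rough illustration of the idea (not the paper's released code), the sketch below combines a token-wise router with a task-wise router in a single MoE layer; the class name `TokenTaskMoE`, the expert count, and the task-embedding interface are all illustrative assumptions.

```python
# Minimal sketch of a token- and task-wise MoE layer (illustrative, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenTaskMoE(nn.Module):
    def __init__(self, d_model: int, n_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        # Token-wise router scores each token; task-wise router scores the task
        # embedding, biasing routing toward task-specialized experts.
        self.token_router = nn.Linear(d_model, n_experts)
        self.task_router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor, task_emb: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); task_emb: (batch, d_model)
        logits = self.token_router(x) + self.task_router(task_emb).unsqueeze(1)
        weights = F.softmax(logits, dim=-1)                # (B, S, n_experts)
        topw, topi = weights.topk(self.top_k, dim=-1)      # sparse top-k routing
        topw = topw / topw.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topi[..., k] == e                   # tokens routed to expert e
                if mask.any():
                    out[mask] += topw[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: route a batch of trajectory tokens with a per-task embedding.
moe = TokenTaskMoE(d_model=64)
tokens = torch.randn(2, 10, 64)
task = torch.randn(2, 64)
print(moe(tokens, task).shape)  # torch.Size([2, 10, 64])
```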
We learn offline meta-policies from natural language supervision via contrastive language-decision pre-training, aligning text embeddings with decision representations so the policy comprehends environment dynamics.
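A standard way to realize such alignment is a CLIP-style symmetric InfoNCE objective between language and trajectory (decision) embeddings; the sketch below shows that objective under the assumption of paired (text, trajectory) batches, with placeholder embeddings standing in for the paper's encoders.

```python
# CLIP-style contrastive alignment of language and decision embeddings (sketch).
import torch
import torch.nn.functional as F

def contrastive_language_decision_loss(text_emb, traj_emb, temperature=0.07):
    """Symmetric InfoNCE: matched (text, trajectory) pairs are positives."""
    text_emb = F.normalize(text_emb, dim=-1)
    traj_emb = F.normalize(traj_emb, dim=-1)
    logits = text_emb @ traj_emb.t() / temperature   # (B, B) similarity matrix
    labels = torch.arange(len(logits), device=logits.device)
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2

# Toy batch: embeddings that would come from a language encoder and a
# trajectory encoder in the actual pipeline.
text_emb = torch.randn(8, 128)
traj_emb = torch.randn(8, 128)
print(float(contrastive_language_decision_loss(text_emb, traj_emb)))
```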
We propose a novel system identification framework for sim-to-real transfer in reinforcement learning that combines Diffusion Evolution with Adversarial Learning (DEAL) to iteratively infer physical parameters from limited real-world data.
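A heavily simplified picture of such a loop, under assumptions of my own: a population of candidate physical parameters is refined by a diffusion-evolution-style update (fitness-weighted drift with annealed noise), where in DEAL the fitness would come from an adversarially trained discriminator comparing simulated and real trajectories; here a trajectory-matching score stands in for that discriminator, and the toy simulator is invented for illustration.

```python
# Simplified DEAL-style system identification loop (illustrative sketch only).
import numpy as np

rng = np.random.default_rng(0)

def simulate(params, steps=50):
    """Toy simulator: damped oscillator with unknown damping and stiffness."""
    damping, stiffness = params
    x, v, traj = 1.0, 0.0, []
    for _ in range(steps):
        a = -stiffness * x - damping * v
        v += 0.1 * a
        x += 0.1 * v
        traj.append(x)
    return np.array(traj)

true_params = np.array([0.3, 2.0])
real_traj = simulate(true_params)   # stands in for limited real-world rollouts

def fitness(params):
    # Stand-in for the adversarial discriminator's score: higher when the
    # simulated trajectory is indistinguishable from the real one.
    return -np.mean((simulate(params) - real_traj) ** 2)

# Diffusion-evolution-style update: candidates drift toward the
# fitness-weighted population mean under an annealed noise scale.
pop = rng.uniform(0.0, 3.0, size=(64, 2))
for sigma in np.linspace(1.0, 0.05, 30):
    scores = np.array([fitness(p) for p in pop])
    w = np.exp((scores - scores.max()) / 0.1)
    w /= w.sum()
    mean = (w[:, None] * pop).sum(axis=0)
    pop = pop + 0.5 * (mean - pop) + 0.1 * sigma * rng.normal(size=pop.shape)

print("inferred:", pop.mean(axis=0), "true:", true_params)
```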
We propose Continued Fraction Q-Learning (QCoFr), a novel value decomposition framework that models rich cooperation in multi-agent reinforcement learning without combinatorial explosion.
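To convey the core idea, the sketch below mixes per-agent utilities through a finite continued fraction, which injects higher-order interactions among agents without enumerating joint actions; the exact parameterization in QCoFr differs, and the positivity assumption on utilities is mine.

```python
# Continued-fraction mixing of per-agent utilities (illustrative sketch).
import torch

def continued_fraction_mix(q_agents: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
    """q_agents: (batch, n_agents) positive utilities -> (batch,) joint value.

    Evaluates Q_tot = q_1 + 1 / (q_2 + 1 / (q_3 + ...)) from the innermost
    term outward; eps keeps denominators away from zero in this toy version.
    """
    value = q_agents[:, -1]
    for i in range(q_agents.shape[1] - 2, -1, -1):
        value = q_agents[:, i] + 1.0 / (value + eps)
    return value

q = torch.rand(4, 3) + 1.0   # positive toy utilities for 3 agents
print(continued_fraction_mix(q))
```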