PhD student, Tsinghua University
2 papers at NeurIPS 2025
We propose BOOM, a model-based RL algorithm that bootstraps the policy from a non-parametric planner with a learned world model via a soft value-weighted, likelihood-free alignment loss, achieving state-of-the-art performance.
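As a minimal sketch only (BOOM's actual loss is not specified here), one plausible reading of a "soft value-weighted, likelihood-free alignment loss" is: weight planner action samples by a softmax over their estimated values, then pull policy samples toward high-value planner actions with a sample-based distance rather than a log-likelihood. All names and the distance choice below are illustrative assumptions.

```python
import numpy as np

def soft_value_weighted_loss(policy_actions, planner_actions, planner_values, tau=1.0):
    """Hypothetical sketch of a soft value-weighted, likelihood-free alignment loss.

    policy_actions : (P, d) action samples from the current policy
    planner_actions: (M, d) action samples proposed by the planner
    planner_values : (M,)   value estimates for the planner samples
    tau            : temperature of the soft value weighting
    """
    # Soft (softmax) weights over planner samples: higher-value actions count more.
    w = np.exp((planner_values - planner_values.max()) / tau)
    w = w / w.sum()
    # Likelihood-free alignment: squared distance between samples (no density needed),
    # with each planner sample weighted by its value weight.
    d = ((policy_actions[:, None, :] - planner_actions[None, :, :]) ** 2).sum(-1)
    return float((d * w[None, :]).mean())
```

Minimizing this pulls the policy's samples toward the planner's high-value actions without ever evaluating a policy density, which is what "likelihood-free" would buy here.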
We propose MoGE, which enhances off-policy RL exploration by generating critical experiences, yielding significant improvements in sample efficiency and performance ceilings across various tasks.
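A hedged illustration of the general idea (MoGE's actual generator is not described here): select "critical" transitions, e.g. those with the highest TD error, and synthesize new experiences around them to feed the off-policy replay buffer. The function names, the TD-error criterion, and the Gaussian perturbation are illustrative assumptions, not MoGE's method.

```python
import numpy as np

def select_critical_experiences(buffer, td_errors, k):
    """Pick the k stored transitions with the highest TD error as 'critical' seeds."""
    idx = np.argsort(td_errors)[-k:]
    return [buffer[i] for i in idx]

def generate_critical_experiences(seeds, noise_scale=0.05, rng=None):
    """Synthesize new experiences near critical seeds.

    A real method would likely use a learned generative model; Gaussian
    jitter stands in here purely for illustration.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    return [s + noise_scale * rng.standard_normal(s.shape) for s in seeds]
```

Injecting such generated experiences into the replay buffer concentrates learning on regions where the critic is most wrong, which is one way extra generated data could raise both sample efficiency and the final performance ceiling.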