Undergrad student, Tsinghua University
1 paper at NeurIPS 2025
We pioneer training world models through reinforcement learning with verifiable rewards (RLVR), demonstrating substantial performance gains on both language- and video-based world models.