PhD student, ShanghaiTech University
1 paper at NeurIPS 2025
A systematic multimodal RL framework that improves policy exploration and advantage estimation.