PhD student, Shanghai Jiao Tong University
2 papers at NeurIPS 2025
In this paper, we decompose the reward value into a prompt-free reward and a prompt-related reward from an information-theoretic perspective, and use the former to guide reward training.
A unified and scalable RL framework applicable to online, offline, and offline-to-online settings.