In this paper, we decompose the reward value into a prompt-free reward and a prompt-related reward from an information-theoretic perspective, and use the former to guide reward training.