Undergraduate student, Tsinghua University
2 papers at NeurIPS 2025
We propose Reward Reasoning Models, which leverage additional test-time compute for complex queries where appropriate rewards are not immediately apparent.