3 papers across 3 sessions
We propose Reward Reasoning Models, which leverage additional test-time compute for complex queries where appropriate rewards are not immediately apparent.
We propose an uncertainty-based routing framework that efficiently complements a fast reward model (RM) with a strong but costly LLM judge.
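
As a rough illustration of the routing idea, the sketch below assumes the fast RM returns a scalar reward together with an uncertainty estimate, and that the judge is consulted only when that estimate exceeds a threshold. The listing specifies none of this: the names `route_reward`, `toy_fast_rm`, and `toy_judge`, the threshold value, and the way uncertainty is computed are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class RoutedReward:
    reward: float      # final reward returned to the caller
    used_judge: bool   # True if the costly judge was consulted


def route_reward(
    prompt: str,
    response: str,
    fast_rm: Callable[[str, str], Tuple[float, float]],
    strong_judge: Callable[[str, str], float],
    threshold: float = 0.2,  # hypothetical cutoff, not taken from the source
) -> RoutedReward:
    """Use the fast RM by default; escalate to the judge when it is uncertain."""
    reward, uncertainty = fast_rm(prompt, response)
    if uncertainty > threshold:
        # The fast RM is not confident, so pay for the stronger judge.
        return RoutedReward(strong_judge(prompt, response), used_judge=True)
    return RoutedReward(reward, used_judge=False)


def toy_fast_rm(prompt: str, response: str) -> Tuple[float, float]:
    # Stand-in RM: pretends to be unsure about very short responses.
    uncertainty = 0.5 if len(response) < 10 else 0.05
    return 0.7, uncertainty


def toy_judge(prompt: str, response: str) -> float:
    # Stand-in for an expensive LLM judge call.
    return 1.0


easy = route_reward("q", "a long detailed answer", toy_fast_rm, toy_judge)
hard = route_reward("q", "short", toy_fast_rm, toy_judge)
print(easy.used_judge, hard.used_judge)
```

In this toy setup only the second call reaches the judge, which is the intended cost saving: the expensive model is reserved for cases the fast RM flags as uncertain.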