2 papers across 2 sessions
We introduce a novel inference-time steering methodology called Reinforcing Cognitive Experts (RICE), designed to improve reasoning depth and efficiency without additional training or complex heuristics.
This paper answers the question: "under what conditions will model-free reinforcement learning give rise to thinking as a strategy for reward maximization?"