PhD student, University of Chicago
1 paper at NeurIPS 2025
We propose ShorterBetter, a reinforcement learning method that trains reasoning models to generate concise yet accurate Chain-of-Thought traces by rewarding the shortest correct response among sampled outputs.
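The reward idea described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the linear length penalty, the `alpha` and `beta` weights, and the fallback to the mean length when no sample is correct are all assumptions made for illustration. Given the token lengths and correctness flags of several sampled responses to one prompt, it treats the length of the shortest correct response as the target and scores each sample by correctness minus its distance from that target.

```python
from typing import List


def shortest_correct_length(lengths: List[int], correct: List[bool]) -> float:
    """Return the length of the shortest correct sample.

    Assumption: if no sample is correct, fall back to the mean length
    so that the penalty term is still defined.
    """
    if not lengths or len(lengths) != len(correct):
        raise ValueError("lengths and correct must be non-empty and equal in size")
    correct_lengths = [n for n, ok in zip(lengths, correct) if ok]
    if correct_lengths:
        return float(min(correct_lengths))
    return sum(lengths) / len(lengths)


def length_aware_rewards(
    lengths: List[int],
    correct: List[bool],
    alpha: float = 1.0,   # hypothetical weight on correctness
    beta: float = 0.001,  # hypothetical weight on the length penalty
) -> List[float]:
    """Score each sample: correctness bonus minus distance from the target length."""
    target = shortest_correct_length(lengths, correct)
    return [
        alpha * float(ok) - beta * abs(n - target)
        for n, ok in zip(lengths, correct)
    ]


# Three sampled responses to the same prompt: the first is short but wrong,
# the other two are correct, and the third is the shortest correct one.
rewards = length_aware_rewards([100, 300, 200], [False, True, True])
best_index = max(range(len(rewards)), key=rewards.__getitem__)
```

In this example the shortest correct response (the third sample) receives the highest reward, while the short but incorrect response gets no correctness bonus and is still penalized for its distance from the target length.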