We show that limiting a model's confidence during training can improve test-time scaling in mathematical reasoning.