We show that off-the-shelf process reward models (PRMs) are often poorly calibrated. To address this, we introduce a quantile-regression calibration method that aligns their outputs with success probabilities. We further show that calibration unlocks instance-adaptive inference-time scaling.
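As a rough illustration of the quantile-regression idea mentioned above, the sketch below implements the standard pinball (quantile) loss and checks that the constant prediction minimizing it at level 0.5 is the sample median. This is a minimal sketch under stated assumptions, not the authors' training procedure: the function name `pinball_loss`, the toy `success_rates` array, and the grid search are all hypothetical and chosen only for illustration.

```python
import numpy as np


def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (quantile) loss at quantile level tau in (0, 1).

    Under-predictions are weighted by tau and over-predictions by (1 - tau),
    so the minimizer is the tau-th quantile of the targets.
    """
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))


# Hypothetical targets: empirical success rates for five partial solutions
# (illustrative numbers only, not data from the paper).
success_rates = np.array([0.0, 0.1, 0.2, 0.6, 0.9])

# Grid-search the best constant prediction at the median level tau = 0.5.
candidates = np.linspace(0.0, 1.0, 101)
losses = [
    pinball_loss(success_rates, np.full_like(success_rates, c), tau=0.5)
    for c in candidates
]
best_constant = float(candidates[int(np.argmin(losses))])

print(f"best constant at tau=0.5: {best_constant:.2f}")
print(f"sample median:            {np.median(success_rates):.2f}")
```

In a calibration setting one would presumably replace the constant with a model head that predicts several quantile levels per input, but that design is an assumption here and is not specified by the text above.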