1 paper across 1 session
We introduce a Particle Filtering approach to Inference Scaling that is robust against inherently imperfect reward models and performs significantly better than previous methods.