Postdoc, Facebook
1 paper at NeurIPS 2025
For off-policy RL, we give a theoretical analysis of the role an additive reward correction plays in improving performance, accompanied by experiments on bandits and LLM post-training.