3 papers across 3 sessions
We contribute provable guarantees that regularized policy gradient methods converge to approximate Nash equilibria in two-player zero-sum imperfect-information extensive-form games.
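A toy illustration of the phenomenon (not the paper's construction or proof): in a normal-form zero-sum game such as rock-paper-scissors, simultaneous softmax policy-gradient ascent with an entropy regularizer of weight `tau` damps the cyclic dynamics and drives both players toward an approximate Nash equilibrium, here near-uniform play. All constants (`tau`, `lr`, starting logits) are illustrative choices.

```python
import numpy as np

# Rock-paper-scissors payoff matrix for the row player (zero-sum game).
A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])

def softmax(v):
    z = np.exp(v - v.max())
    return z / z.sum()

tau, lr = 0.1, 0.2            # entropy weight and step size (illustrative)
xl = np.array([1.0, -0.5, 0.0])   # asymmetric starting logits, row player
yl = np.array([-1.0, 0.5, 0.2])   # column player

for _ in range(2000):
    x, y = softmax(xl), softmax(yl)
    # Entropy-regularized ascent directions for each player's payoff.
    gx = A @ y - tau * (np.log(x) + 1.0)
    gy = -(A.T @ x) - tau * (np.log(y) + 1.0)
    # Policy-gradient step through the softmax Jacobian.
    xl += lr * x * (gx - x @ gx)
    yl += lr * y * (gy - y @ gy)

x, y = softmax(xl), softmax(yl)
# Exploitability (best-response gap); small at an approximate Nash.
exploitability = (A @ y).max() - (x @ A).min()
```

Without the entropy term, plain gradient play in this game cycles forever; the regularizer is what turns the dynamics into a contraction toward the (regularized) equilibrium, which is the intuition the paper makes rigorous in the far harder extensive-form setting.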
A simple, general-purpose off-policy REINFORCE method that outperforms PPO, DPO, and STaR on recent benchmarks.
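A minimal sketch of the off-policy REINFORCE idea in general (not this paper's specific method): a two-armed bandit where actions are sampled from a fixed behavior policy, and the target softmax policy is updated with the score-function gradient weighted by a per-sample importance ratio. The bandit setup, reward means, and step size are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                 # logits of the target policy
behavior = np.array([0.5, 0.5])     # fixed behavior policy (off-policy data)
true_means = np.array([0.2, 0.8])   # arm 1 is better

def softmax(v):
    z = np.exp(v - v.max())
    return z / z.sum()

for _ in range(2000):
    pi = softmax(theta)
    a = rng.choice(2, p=behavior)          # act with the behavior policy
    r = rng.normal(true_means[a], 0.1)     # sample a reward
    w = pi[a] / behavior[a]                # importance weight
    grad_logp = -pi
    grad_logp[a] += 1.0                    # grad of log pi(a) for softmax
    theta += 0.05 * w * r * grad_logp      # off-policy REINFORCE step
```

The importance weight `w` corrects for the mismatch between the data-collecting policy and the policy being optimized, which is what lets a REINFORCE-style estimator use off-policy data at all.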
We learn a generative model of the Pareto set that can be conditioned on subjective preferences without retraining, for online multi-objective optimization over discrete and mixed spaces.
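A hedged, minimal sketch of the conditioning idea only (not the paper's generative model): over a discrete candidate set with two objectives, the Pareto set is computed once, after which arbitrary subjective preference weights can be answered without recomputing anything. The random candidate set and the linear-scalarization selector are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
F = rng.uniform(0.0, 1.0, size=(200, 2))   # objective values (maximize both)

def pareto_mask(F):
    """Boolean mask of non-dominated rows of F."""
    keep = np.ones(len(F), dtype=bool)
    for i in range(len(F)):
        # i is dominated if some row is >= on all objectives and > on one.
        dom = (F >= F[i]).all(axis=1) & (F > F[i]).any(axis=1)
        if dom.any():
            keep[i] = False
    return keep

P = F[pareto_mask(F)]                      # Pareto front, computed once

def select(weights):
    """Return the Pareto point preferred under a given weight vector."""
    w = np.asarray(weights, dtype=float)
    return P[np.argmax(P @ w)]

a = select([0.9, 0.1])   # a user who mostly cares about objective 0
b = select([0.1, 0.9])   # a user who mostly cares about objective 1
```

Here the "model" of the Pareto set is just an explicit list, which only works for small discrete problems; the paper's contribution is replacing this enumeration with a preference-conditioned generative model so the same trade-off query scales to large discrete and mixed spaces.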