5 papers across 3 sessions
Through theoretical models and empirical testbeds, we characterize the algorithmic tradeoff between privileged expert distillation and RL, and identify better options for expert distillation.
We introduce a new online RLHF algorithm that, for the first time, achieves a sample complexity scaling polynomially with the reward scale.