exploration-exploitation trade-off

2 papers across 2 sessions

Poster Session 1

1 paper

Wednesday, December 3, 2025 · 11:00 AM → 2:00 PM

Exhibit Hall C,D,E

Exploration from a Primal-Dual Lens: Value-Incentivized Actor-Critic Methods for Sample-Efficient Online RL

#3203 · Tong Yang, Bo Dai, Lin Xiao, Yuejie Chi

Poster Session 3

1 paper

Thursday, December 4, 2025 · 11:00 AM → 2:00 PM

Exhibit Hall C,D,E

Adaptive Divergence Regularized Policy Optimization for Fine-tuning Generative Models

#306 · Jiajun Fan, Tong Wei, Chaoran Cheng, Yuxin Chen, Ge Liu

We propose ADRPO, a method that dynamically adjusts divergence regularization strength based on advantage estimates, enabling more effective fine-tuning of generative models by automatically balancing exploration and exploitation at the sample level.