Preference Optimisation

1 paper across 1 session

Poster Session 2

Wednesday, December 3, 2025 · 4:30 PM → 7:30 PM

Meta-Learning Objectives for Preference Optimization

#212 · Carlo Alfano, Silvia Sapora, Jakob Foerster, Patrick Rebeschini, Yee Whye Teh

Using a new suite of MuJoCo tasks for systematic evaluation, we develop specialized mirror-descent-based preference optimization algorithms that outperform existing methods on both MuJoCo and LLM alignment tasks.