PhD student, University of Oxford
1 paper at NeurIPS 2025
Using a new suite of MuJoCo tasks for systematic evaluation, we develop specialized mirror descent-based preference optimization algorithms that outperform existing methods in both MuJoCo and LLM alignment tasks.