Postdoc, University of Cambridge
1 paper at NeurIPS 2025
We provide a theoretical analysis of forward and reverse KL-regularized RLHF with multiple reference models.