PhD student, Korea Advanced Institute of Science and Technology
1 paper at NeurIPS 2025
From the perspective of likelihood ratio estimation, we propose BPO, a generalized DPO objective based on Bregman divergence.