PhD student, University of Cambridge
1 paper at NeurIPS 2025
We derive two variants of Direct Preference Optimization (DPO) that explicitly model the possibility of a tie in pairwise comparisons, and we demonstrate their effectiveness.