Researcher, Huawei Technologies Ltd.
1 paper at NeurIPS 2025
We derive two variants of DPO (Direct Preference Optimization) that explicitly model the possibility of declaring a tie in pairwise comparisons, and show their effectiveness.