We derive two variants of Direct Preference Optimization (DPO) that explicitly model the possibility of a tie in pairwise comparisons, and we show that both are effective.