Intern, Tencent Inc.
1 paper at NeurIPS 2025
We propose SoPo, a semi-online preference optimization method that combines the strengths of online and offline direct preference optimization to overcome their individual shortcomings.