We propose InfiFPO, a novel model fusion method for preference alignment that integrates multi-source probability information to enhance LLM performance, outperforming existing approaches across 11 benchmarks.