Researcher, Alibaba Group
1 paper at NeurIPS 2025
Our empirically and theoretically informed method, which treats diversity as a reward, achieves a new state-of-the-art (SOTA) average performance across 7 benchmarks when applied to leading LLMs with domain-undetermined data.