Assistant Professor, Renmin University of China
3 papers at NeurIPS 2025
A versatile data mixture ratio optimization framework for LLM training that enjoys both theoretical and practical advantages.
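The framework itself is not reproduced here; as a point of reference, below is a minimal sketch of a generic exponentiated-gradient update on the domain-mixture simplex, the building block behind mixture-reweighting methods such as DoReMi. The function name, learning rate, and toy per-domain losses are all illustrative assumptions, not the paper's method.

```python
import numpy as np

def exponentiated_gradient_step(weights, domain_losses, lr=0.1):
    """One mirror-descent step on the probability simplex.

    Upweights domains with a higher (stochastic) loss signal while
    keeping the mixture a valid probability distribution.
    """
    logits = np.log(weights) + lr * domain_losses  # ascent on the loss signal
    shifted = np.exp(logits - logits.max())        # numerically stable softmax
    return shifted / shifted.sum()

# Toy usage: three data domains with placeholder proxy losses.
w = np.ones(3) / 3
for _ in range(5):
    losses = np.array([2.1, 1.3, 0.7])  # illustrative signals only
    w = exponentiated_gradient_step(w, losses)
print(w)  # the mixture shifts toward the high-loss domain
```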
In this paper, we propose continuous SGD/SVRG flows for minimizing the KL divergence on Wasserstein space.
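For background (this is the classical baseline, not the paper's new flow): the Wasserstein gradient flow of $\mathrm{KL}(\rho\,\|\,\pi)$ with target $\pi \propto e^{-V}$ is the Fokker-Planck equation, realized at the particle level by Langevin dynamics:

$$\partial_t \rho_t \;=\; \nabla \cdot \Big( \rho_t\, \nabla \log \tfrac{\rho_t}{\pi} \Big), \qquad \mathrm{d}X_t \;=\; -\nabla V(X_t)\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}W_t.$$

The proposed continuous SGD/SVRG flows can, presumably, be read as stochastic-gradient and variance-reduced counterparts of this deterministic flow.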
We derive high-probability excess risk bounds of at most $\tilde{O}(1/n^2)$ for ERM, GD, and SGD, and our high-probability bounds on the generalization error of gradients for nonconvex problems are also the sharpest to date.
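To spell out the quantity being bounded (standard definitions; the notation here is assumed rather than taken from the paper): for an output $\hat{w}_S$ trained on a sample $S$ of size $n$ with population risk $R$, a high-probability excess risk bound of the stated order reads

$$\Pr\Big[\, R(\hat{w}_S) - \min_{w} R(w) \;\le\; \tilde{O}\big(1/n^2\big) \Big] \;\ge\; 1 - \delta,$$

where $\tilde{O}(\cdot)$ hides logarithmic factors in $n$ and $1/\delta$.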