PhD student, Renmin University of China
2 papers at NeurIPS 2025
We present a theoretical framework that interprets masked diffusion models (MDMs) as solutions to energy minimization problems, along with an efficient post-training schedule-tuning method that requires no model modification.
We present LLaDA, a diffusion language model trained from scratch that is competitive with LLaMA 3 in performance.