PhD student, School of Computer Science, Carnegie Mellon University
1 paper at NeurIPS 2025
We show that diffusion language models are substantially more sample-efficient than standard autoregressive language models, owing to their ability to learn from many different token orderings of the same sequence.
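The mechanism can be illustrated with a minimal training-loss sketch (this is not the paper's exact objective; `model`, `mask_id`, and the `1/t` reweighting are assumptions, standing in for a bidirectional transformer and an absorbing mask token): an autoregressive loss sees each sequence in a single left-to-right order, while a masked-diffusion loss samples a fresh random subset of positions to predict at every step, so one sequence provides supervision under many conditioning orders.

```python
import torch
import torch.nn.functional as F

def ar_loss(model, tokens):
    # Autoregressive objective: each position is predicted from its left
    # context only (model is assumed causally masked), so every sequence
    # is seen in exactly one ordering, left to right.
    logits = model(tokens[:, :-1])
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tokens[:, 1:].reshape(-1))

def diffusion_loss(model, tokens, mask_id):
    # Masked-diffusion step: sample a mask ratio t ~ U(0, 1), mask a random
    # subset of positions, and predict them from bidirectional context.
    # Averaged over t and over random subsets, one sequence is trained on
    # under many different conditioning orders.
    t = torch.rand(tokens.size(0), 1, device=tokens.device).clamp(min=1e-3)
    mask = torch.rand_like(tokens, dtype=torch.float) < t
    corrupted = torch.where(mask, torch.full_like(tokens, mask_id), tokens)
    logits = model(corrupted)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tokens.reshape(-1), reduction="none")
    # Only masked positions contribute; the 1/t reweighting is one common
    # ELBO-style estimator for absorbing-state discrete diffusion.
    return (loss.reshape_as(tokens) * mask / t).sum() / tokens.numel()
```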