We show that diffusion language models are substantially more sample-efficient than standard autoregressive language models, owing to their ability to learn from many different token orderings.
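To make the "token orderings" point concrete, the sketch below (not from the paper; the model interface, vocabulary size, and loss weighting are assumptions) contrasts the two training objectives: autoregressive training always predicts the next token from a left-to-right prefix, while a masked-diffusion objective draws a fresh random mask each step, so in expectation the model learns to predict every token from arbitrary subsets of context.

```python
# Minimal sketch contrasting the two objectives. `model` is a hypothetical
# stand-in: any network mapping token ids (batch, seq) -> logits
# (batch, seq, vocab).
import torch
import torch.nn.functional as F

VOCAB, MASK_ID = 1000, 0  # assumed vocabulary size and mask-token id


def autoregressive_loss(model, tokens):
    """Next-token prediction: a single fixed left-to-right ordering."""
    logits = model(tokens[:, :-1])  # predict token t+1 from prefix 0..t
    return F.cross_entropy(logits.reshape(-1, VOCAB),
                           tokens[:, 1:].reshape(-1))


def masked_diffusion_loss(model, tokens):
    """Masked-diffusion objective: each step samples a random corruption
    level and mask, so the model sees many context subsets (orderings)."""
    b, t = tokens.shape
    ratio = torch.rand(b, 1)                 # per-example mask rate in (0, 1)
    mask = torch.rand(b, t) < ratio          # random positions to corrupt
    corrupted = torch.where(mask, torch.full_like(tokens, MASK_ID), tokens)
    logits = model(corrupted)
    # Loss only on masked positions, reweighted by 1/ratio as in
    # discrete-diffusion ELBOs (simplified here).
    per_token = F.cross_entropy(logits.reshape(-1, VOCAB),
                                tokens.reshape(-1),
                                reduction="none").reshape(b, t)
    weighted = (per_token * mask) / ratio.clamp(min=1e-3)
    return weighted.sum() / mask.sum().clamp(min=1)


# Toy usage with a trivial embedding + linear "model":
model = torch.nn.Sequential(torch.nn.Embedding(VOCAB, 64),
                            torch.nn.Linear(64, VOCAB))
tokens = torch.randint(1, VOCAB, (4, 32))
print(autoregressive_loss(model, tokens).item(),
      masked_diffusion_loss(model, tokens).item())
```

Because the mask is resampled every step, the diffusion objective amortizes over exponentially many prediction orders that the fixed left-to-right factorization never exposes, which is the intuition behind the sample-efficiency claim.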