Breaking AR’s Sampling Bottleneck: Provable Acceleration via Diffusion Language Models

Chinese University of Hong Kong · University of Michigan

Keywords: diffusion model, large language model (LLM), iteration complexity, mutual information

Abstract

Diffusion models have emerged as a powerful paradigm for modern generative modeling and show strong potential for large language models (LLMs). Unlike conventional autoregressive (AR) models, which generate tokens sequentially, diffusion models allow parallel sampling, offering a promising path to accelerate generation and remove the left-to-right generation constraint. Despite their empirical success, the theoretical understanding of diffusion language models remains underdeveloped.

In this work, we develop convergence guarantees for diffusion language models from an information-theoretic perspective. Our analysis demonstrates that the sampling error, measured by the Kullback-Leibler (KL) divergence, decays inversely with the number of iterations $T$ and scales linearly with the mutual information between tokens in the target text sequence. Crucially, our theory covers the regime

T