1 paper across 1 session
We propose an adaptive multi-token unmasking sampler for masked language diffusion models. It achieves a 2-3x speedup on code generation and math reasoning benchmarks with no loss in accuracy.
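
To make the idea of adaptive multi-token unmasking concrete, here is a minimal sketch in Python. The summary above does not say how the sampler decides how many tokens to reveal per step, so the confidence threshold used here is an assumption rather than the paper's rule. The names `toy_model`, `adaptive_unmask`, `MASK`, and `threshold` are all hypothetical, and `toy_model` is a random stand-in for a real masked diffusion language model.

```python
import math
import random

MASK = -1  # hypothetical id for a still-masked position


def toy_model(tokens, vocab_size, rng):
    """Stand-in for a masked diffusion LM (assumption, not a real model).

    Returns one probability distribution over the vocabulary per position.
    """
    probs = []
    for _ in tokens:
        logits = [rng.gauss(0.0, 2.0) for _ in range(vocab_size)]
        top = max(logits)
        exps = [math.exp(x - top) for x in logits]  # numerically stable softmax
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


def adaptive_unmask(length, vocab_size, threshold=0.6, seed=0):
    """Reveal a variable number of tokens per step.

    Assumed rule: unmask every masked position whose top-1 probability is at
    least `threshold`; if none qualifies, unmask only the most confident one.
    """
    rng = random.Random(seed)
    tokens = [MASK] * length
    steps = 0
    while MASK in tokens:
        probs = toy_model(tokens, vocab_size, rng)
        # Collect (confidence, position, token) for each masked position.
        candidates = []
        for pos, tok in enumerate(tokens):
            if tok == MASK:
                best = max(range(vocab_size), key=lambda v: probs[pos][v])
                candidates.append((probs[pos][best], pos, best))
        chosen = [c for c in candidates if c[0] >= threshold]
        if not chosen:
            chosen = [max(candidates)]  # guarantees progress on every step
        for _, pos, tok in chosen:
            tokens[pos] = tok
        steps += 1
    return tokens, steps


if __name__ == "__main__":
    tokens, steps = adaptive_unmask(length=16, vocab_size=8)
    print(f"filled {len(tokens)} positions in {steps} model calls")
```

Under this assumed rule, each step costs one model call and reveals at least one token, so the number of calls never exceeds the sequence length and drops whenever several positions are confident at once. A one-token-per-step sampler would always need exactly `length` calls, which is where a speedup of this kind would come from.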