We propose the Multi-Reward Optimization (MRO) approach, which enhances token correlation during the denoising process in diffusion language models, improving reasoning performance and sampling efficiency.