3 papers across 2 sessions
We propose a unified latent diffusion framework for simultaneous music generation, source imputation, and query-driven arbitrary source extraction.
LeVo generates high-quality songs that closely follow instructions by pairing parallel mixed and dual-track token prediction with three-stage training and DPO-based multi-preference alignment.
This paper presents SongBloom, the first unified autoregressive diffusion model for long-form song generation, achieving state-of-the-art performance compared to both commercial and non-commercial methods.