We propose a foundation model for unified speech generation with masked generative pre-training.