Associate Professor, The Chinese University of Hong Kong, Shenzhen
2 papers at NeurIPS 2025
A novel speech tokenizer with an end-to-end diffusion autoencoder and text-aware decoding, operating at 6.25 Hz and 0.0875 kbps
A foundation model for unified speech generation, trained with masked generative pre-training