2 papers across 2 sessions
A novel speech tokenizer with an end-to-end diffusion autoencoder and text-aware decoding, operating at 6.25 Hz and 0.0875 kbps
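The two quoted figures can be related with simple arithmetic. The sketch below is illustrative only: it assumes 1 kbps = 1000 bits/s, and the function name `bits_per_token` is hypothetical. The equivalent codebook size additionally assumes a single fixed-length code per token, which the description does not state.

```python
def bits_per_token(bitrate_kbps: float, token_rate_hz: float) -> float:
    """Bits carried by each token, assuming 1 kbps = 1000 bits/s."""
    return bitrate_kbps * 1000.0 / token_rate_hz


# Figures quoted in the description above.
TOKEN_RATE_HZ = 6.25
BITRATE_KBPS = 0.0875

bits = bits_per_token(BITRATE_KBPS, TOKEN_RATE_HZ)  # roughly 14 bits per token
token_duration_ms = 1000.0 / TOKEN_RATE_HZ          # milliseconds of speech per token

# Assumption: if each token were one fixed-length code from a single codebook,
# the codebook would need 2**bits entries. The source does not confirm this design.
equivalent_codebook_size = 2 ** round(bits)

print(f"{bits:.2f} bits/token, {token_duration_ms:.0f} ms/token, "
      f"{equivalent_codebook_size} equivalent codes")
```

Under these assumptions each token covers 160 ms of speech and carries about 14 bits.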