1 paper across 1 session
We propose BlockDecoder, a novel ASR decoder architecture that separates textual context building from audio-text integration. It achieves a ~2x speed-up over traditional decoders with no performance degradation across datasets, languages, and tasks.