Full Professor, Hong Kong University of Science and Technology
1 paper at NeurIPS 2025
In this paper, we propose a novel Cascade Adaptive Self-Speculative Decoding (CAS-Spec) algorithm, which constructs speculative draft models by leveraging dynamically switchable inference acceleration (DSIA) strategies.
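
For readers unfamiliar with the draft-then-verify idea behind speculative decoding, the following is a minimal sketch of the generic greedy variant, not the CAS-Spec algorithm itself. The toy functions `target_next` and `draft_next` are hypothetical stand-ins: in a self-speculative setting the draft would presumably be a cheaper configuration of the same model, which is only imitated here by a perturbed copy of the target rule.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def target_next(ctx: List[Token]) -> Token:
    # Hypothetical "full" model: a deterministic toy rule.
    return sum(ctx) % 7


def draft_next(ctx: List[Token]) -> Token:
    # Hypothetical cheap draft: agrees with the target except when
    # the context length is a multiple of 4.
    t = sum(ctx) % 7
    return t if len(ctx) % 4 else (t + 1) % 7


def greedy_decode(prompt: List[Token], target: NextToken, max_new: int) -> List[Token]:
    # Baseline: one target call per generated token.
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(
    prompt: List[Token], target: NextToken, draft: NextToken, max_new: int, k: int = 3
) -> Tuple[List[Token], int]:
    out = list(prompt)
    rounds = 0  # verification rounds (one batched target pass each in a real system)
    while len(out) - len(prompt) < max_new:
        # 1) Draft k tokens autoregressively with the cheap model.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2) Verify the proposal against the target, left to right.
        rounds += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # replace the first mismatch and stop
                break
        else:
            # All drafts accepted: the target pass yields one extra token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[: len(prompt) + max_new], rounds


if __name__ == "__main__":
    prompt = [1, 2, 3]
    spec, rounds = speculative_decode(prompt, target_next, draft_next, max_new=12)
    print(spec == greedy_decode(prompt, target_next, 12), rounds)
```

Because every emitted token is either confirmed or supplied by the target, the output matches plain greedy decoding; the potential saving comes from needing fewer verification rounds than generated tokens when the draft is often right.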