Associate Professor, Shanghai Jiao Tong University
1 paper at NeurIPS 2025
In this paper, we propose a novel Cascade Adaptive Self-Speculative Decoding (CAS-Spec) algorithm that constructs speculative draft models by leveraging dynamically switchable inference acceleration (DSIA) strategies.