In this paper, we propose a novel Cascade Adaptive Self-Speculative Decoding (CAS-Spec) algorithm that constructs speculative draft models by leveraging dynamically switchable inference acceleration (DSIA) strategies.
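To make the draft-then-verify idea behind speculative decoding concrete, here is a minimal sketch of the generic greedy variant. It is not the CAS-Spec algorithm: it has no cascade and no adaptive strategy selection. The names `target_next`, `draft_next`, `speculative_decode`, and `greedy_decode` are hypothetical, and the toy arithmetic "models" only stand in for a full model and a cheaper, lossy variant of the same model (loosely, what an acceleration strategy might produce).

```python
from typing import Callable, List

# A "model" here maps a token context to its greedy next token.
Model = Callable[[List[int]], int]

VOCAB = 50  # toy vocabulary size (assumption for illustration)


def target_next(ctx: List[int]) -> int:
    """Hypothetical full model: a deterministic toy rule, not a real LLM."""
    return (sum(ctx) * 31 + len(ctx) * 7) % VOCAB


def draft_next(ctx: List[int]) -> int:
    """Hypothetical cheap draft: matches the target except when the
    context length is a multiple of 4, imitating a faster but lossy
    variant of the same model."""
    tok = target_next(ctx)
    return tok if len(ctx) % 4 else (tok + 1) % VOCAB


def greedy_decode(prompt: List[int], target: Model, num_new: int) -> List[int]:
    """Reference: plain greedy decoding with the target model only."""
    out = list(prompt)
    for _ in range(num_new):
        out.append(target(out))
    return out


def speculative_decode(prompt: List[int], target: Model, draft: Model,
                       num_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes up to k tokens,
    the target checks them, and the longest agreeing prefix is kept."""
    out = list(prompt)
    goal = len(prompt) + num_new
    while len(out) < goal:
        # Draft phase: propose a short continuation autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # Verify phase: a real system would score all proposed positions
        # in one batched target pass; this sketch calls it per position.
        for tok in proposal:
            expected = target(out)
            out.append(expected)  # always keep the target's own token
            if expected != tok:
                break  # first mismatch: discard the remaining draft tokens
    return out
```

Because every emitted token is the target's own greedy choice, the speculative output matches target-only greedy decoding; the draft affects only how many tokens are confirmed per verification round.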