MS student, National Yang Ming Chiao Tung University
1 paper at NeurIPS 2025
We propose a lossless, training-free speculative decoding method to accelerate LLMs that require offloading on a single memory-limited consumer GPU.