PhD student, Institute of Software, Chinese Academy of Sciences
1 paper at NeurIPS 2025
This paper presents a layer-parallel speculation strategy that improves the efficiency of multi-GPU utilization during the drafting stage of speculative decoding.