This paper presents a layer-parallel speculation strategy that makes more efficient use of multiple GPUs during the drafting stage of speculative decoding.