3 papers across 2 sessions
This paper presents a layer-parallel speculation strategy that improves multi-GPU utilization during the drafting stage of speculative decoding.