1 paper across 1 session
Enhancing cost efficiency in LLM serving through an edge-assisted speculative decoding framework.