2 papers across 1 session
We propose an edge-assisted speculative decoding framework that enhances cost efficiency in LLM serving.
We introduce ParallelPrompt, the first benchmark and dataset for studying intra-query parallelism in real-world LLM queries, enabling reproducible evaluation of structured execution strategies.