2 papers across 1 session
We propose an edge-assisted speculative decoding framework that enhances cost efficiency in LLM serving.
We introduce ParallelPrompt, the first benchmark and dataset for studying intra-query parallelism in real-world LLM queries, enabling reproducible evaluation of structured execution strategies.