3 papers across 2 sessions
Colocating latency-sensitive online and throughput-oriented offline LLM requests within a single inference engine.
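
A rough sketch of the colocation idea, not the paper's actual scheduler (the class names, queue structure, and priority policy here are all assumptions): a serving loop drains the latency-sensitive online queue first each iteration and backfills the remaining batch slots from the offline queue.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    req_id: int
    prompt: str
    online: bool  # True = latency-sensitive; False = best-effort batch job

@dataclass
class ColocationScheduler:
    """Toy scheduler: online requests get batch slots first; offline
    requests backfill whatever capacity is left each iteration."""
    max_batch: int = 8
    online_q: deque = field(default_factory=deque)
    offline_q: deque = field(default_factory=deque)

    def submit(self, req: Request) -> None:
        (self.online_q if req.online else self.offline_q).append(req)

    def next_batch(self) -> list:
        batch = []
        # Serve interactive traffic first, up to the batch limit ...
        while self.online_q and len(batch) < self.max_batch:
            batch.append(self.online_q.popleft())
        # ... then backfill idle slots with offline work.
        while self.offline_q and len(batch) < self.max_batch:
            batch.append(self.offline_q.popleft())
        return batch

sched = ColocationScheduler(max_batch=4)
for i in range(2):
    sched.submit(Request(i, f"chat-{i}", online=True))
for i in range(2, 8):
    sched.submit(Request(i, f"batch-{i}", online=False))
print([r.req_id for r in sched.next_batch()])  # [0, 1, 2, 3]
```

The appeal of colocation is that offline work soaks up GPU capacity the online traffic leaves idle, without delaying interactive requests.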
Polar Sparsity scales contextual sparsity to large batches by exploiting stable attention head sparsity and using efficient GPU kernels, achieving up to 2.2× speedups with minimal accuracy loss.
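
A minimal sketch of head-level contextual sparsity, assuming a hypothetical router supplies per-head importance scores (the function, shapes, and selection policy are illustrative, not Polar Sparsity's kernels): because the set of active heads is assumed stable across the batch, one shared top-k selection serves every request, and skipped heads are never computed.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sparse_head_attention(x, wq, wk, wv, head_scores, k_active):
    """Attention computed only for the top-k_active heads; skipped heads
    contribute zeros. x: (batch, seq, d_model); wq/wk/wv: (n_heads,
    d_model, d_head); head_scores: (n_heads,) from a hypothetical router."""
    n_heads, d_model, d_head = wq.shape
    batch, seq, _ = x.shape
    # One shared active set for the whole batch: head sparsity is assumed
    # stable across requests, so the active set does not grow with batch size.
    active = np.argsort(head_scores)[-k_active:]
    out = np.zeros((batch, seq, n_heads * d_head))
    for h in active:  # a real GPU kernel would fuse this gather
        q, k, v = x @ wq[h], x @ wk[h], x @ wv[h]
        attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))
        out[:, :, h * d_head:(h + 1) * d_head] = attn @ v
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 32))                    # batch=2, seq=5
wq, wk, wv = (0.1 * rng.standard_normal((8, 32, 4)) for _ in range(3))
print(sparse_head_attention(x, wq, wk, wv, rng.standard_normal(8), 3).shape)
```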
Enhancing cost efficiency in LLM serving through an edge-assisted speculative decoding framework.
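
For context, a toy sketch of the draft-and-verify loop behind speculative decoding, in its greedy variant (the paper's framework and edge/cloud split are not reproduced here; `draft_next` and `target_next` are hypothetical callables mapping a token list to the next token): an edge-side draft model proposes `gamma` tokens, and the server-side target model keeps the longest agreeing prefix plus one corrected token of its own.

```python
import random

def speculative_step(draft_next, target_next, context, gamma=4):
    """One greedy speculative-decoding round: the draft model proposes
    gamma tokens; the target model accepts the longest agreeing prefix
    and appends one corrected token of its own."""
    ctx = list(context)
    proposal = []
    for _ in range(gamma):          # cheap edge-side drafting
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)
    ctx, accepted = list(context), []
    for tok in proposal:            # server-side verification
        if target_next(ctx) != tok:
            break
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target_next(ctx))  # target's own next token
    return accepted

# Toy stand-ins: the draft agrees with the target 80% of the time.
def target_next(ctx):
    return (sum(ctx) + len(ctx)) % 10

def draft_next(ctx):
    return target_next(ctx) if random.random() < 0.8 else random.randrange(10)

random.seed(0)
print(speculative_step(draft_next, target_next, context=[1, 2, 3]))
```

In a real deployment the target model verifies all `gamma` draft tokens in a single batched forward pass, which is where the latency and cost savings come from; the per-token verification loop above is only for clarity.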