3 papers across 2 sessions
Our work, Mustafar, unlocks 70% sparsity in KV cache pruning by exploiting unstructured sparsity patterns, supported by a custom attention kernel, boosting the inference efficiency of LLMs.
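To give a flavor of what unstructured KV cache pruning at a target sparsity looks like, here is a minimal magnitude-based sketch in NumPy. This is an illustrative assumption, not Mustafar's actual algorithm or kernel: it simply zeroes the smallest-magnitude 70% of entries in a toy key tensor, with no structural constraint on which entries are dropped (that per-element freedom is what "unstructured" means).

```python
import numpy as np

def prune_unstructured(kv: np.ndarray, sparsity: float = 0.7) -> np.ndarray:
    """Zero out the smallest-magnitude entries of `kv` so that roughly
    `sparsity` fraction of the tensor becomes zero (unstructured pruning)."""
    k = int(sparsity * kv.size)  # number of entries to prune
    if k == 0:
        return kv.copy()
    # Threshold = k-th smallest absolute value; entries at or below it are pruned.
    thresh = np.partition(np.abs(kv).ravel(), k - 1)[k - 1]
    return np.where(np.abs(kv) <= thresh, 0.0, kv)

rng = np.random.default_rng(0)
keys = rng.standard_normal((4, 16))   # toy KV cache slice: 4 tokens x 16 dims
pruned = prune_unstructured(keys, 0.7)
print((pruned == 0).mean())           # fraction of zeros, roughly 0.7
```

The efficiency win in practice comes from storing and computing attention over the surviving nonzeros with a sparse layout and a custom kernel; this dense sketch only shows the pruning criterion.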