Our work, Mustafar, unlocks 70% sparsity in KV cache pruning by leveraging unstructured sparsity patterns, supported by a custom attention kernel, thereby boosting the inference efficiency of LLMs.
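As a rough illustration of what unstructured (per-element) KV cache pruning means, the sketch below zeroes out the smallest-magnitude entries of a cache tensor up to a target sparsity. This is a minimal, generic magnitude-pruning example in PyTorch, not Mustafar's actual pruning criterion or its custom attention kernel; the function name `prune_kv_unstructured` and its arguments are our own for illustration.

```python
import torch

def prune_kv_unstructured(kv: torch.Tensor, sparsity: float = 0.7) -> torch.Tensor:
    """Zero the smallest-magnitude entries of a K or V cache tensor.

    kv: cache tensor, e.g. [batch, heads, seq_len, head_dim].
    sparsity: fraction of entries to zero (0.7 keeps the top 30% by magnitude).
    """
    flat = kv.abs().flatten()
    keep = int(flat.numel() * (1.0 - sparsity))   # number of entries to keep
    if keep == 0:
        return torch.zeros_like(kv)
    # Threshold = smallest magnitude among the kept entries.
    threshold = torch.topk(flat, keep, largest=True).values.min()
    mask = kv.abs() >= threshold                  # unstructured: per-element mask
    return kv * mask

# Example: prune a toy value cache to ~70% sparsity.
v_cache = torch.randn(1, 8, 128, 64)
v_pruned = prune_kv_unstructured(v_cache, sparsity=0.7)
print(f"actual sparsity: {(v_pruned == 0).float().mean():.2f}")
```

Unlike structured (e.g. token- or channel-level) pruning, the mask here is free-form over individual elements, which is why a dedicated sparse attention kernel is needed to turn the zeros into actual speed and memory savings.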