2 papers across 1 session
Inference-time hyper-scaling boosts Transformer LLM reasoning accuracy at equal compute or memory cost by compressing the key–value (KV) cache with Dynamic Memory Sparsification (DMS); the freed budget supports longer or more parallel reasoning chains.
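A minimal sketch of the underlying idea, eviction-based KV cache compression: DMS itself learns eviction decisions via a short retrofit training phase, so the norm-based importance heuristic below is a stand-in, and `compress_kv_cache`, `keep_ratio`, and `window` are illustrative names not taken from the paper.

```python
import torch

def compress_kv_cache(keys, values, keep_ratio=0.25, window=16):
    """Evict low-importance KV entries, always keeping a recent window.

    keys, values: (seq_len, d_head) tensors for one attention head.
    Importance is approximated by key L2 norm here; DMS instead learns
    per-token eviction decisions, with recent tokens kept briefly
    before eviction.
    """
    seq_len = keys.size(0)
    n_keep = min(seq_len, max(window, int(seq_len * keep_ratio)))
    # Protect the most recent `window` tokens from eviction.
    scores = keys.norm(dim=-1)
    scores[-window:] = float("inf")
    # Keep the top-scoring entries, preserving their original order.
    keep_idx = scores.topk(n_keep).indices.sort().values
    return keys[keep_idx], values[keep_idx]

K, V = torch.randn(128, 64), torch.randn(128, 64)
K_c, V_c = compress_kv_cache(K, V, keep_ratio=0.25)
# With a 4x smaller cache, the same memory budget accommodates roughly
# 4x longer generations or more parallel chains ("hyper-scaling").
```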