PhD student, University of Edinburgh
1 paper at NeurIPS 2025
Inference-time hyper-scaling compresses the Transformer key–value (KV) cache with Dynamic Memory Sparsification (DMS), boosting LLM reasoning accuracy at equivalent compute and memory budgets.