2 papers across 1 session
Inference-time hyper-scaling boosts Transformer LLM reasoning accuracy at equal compute or memory cost by compressing the key–value (KV) cache with Dynamic Memory Sparsification (DMS); the freed budget supports longer or more parallel reasoning chains.
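A minimal sketch of the underlying idea, eviction-based KV cache compression: DMS itself learns eviction decisions via a short retrofit training phase, so the norm-based importance heuristic below is a stand-in, and `compress_kv_cache`, `keep_ratio`, and `window` are illustrative names not taken from the paper.

```python
import torch

def compress_kv_cache(keys, values, keep_ratio=0.25, window=16):
    """Evict low-importance KV entries, always keeping a recent window.

    keys, values: (seq_len, d_head) tensors for one attention head.
    Importance is approximated by key L2 norm here; DMS instead learns
    per-token eviction decisions, with recent tokens kept briefly
    before eviction.
    """
    seq_len = keys.size(0)
    n_keep = min(seq_len, max(window, int(seq_len * keep_ratio)))
    # Protect the most recent `window` tokens from eviction.
    scores = keys.norm(dim=-1)
    scores[-window:] = float("inf")
    # Keep the top-scoring entries, preserving their original order.
    keep_idx = scores.topk(n_keep).indices.sort().values
    return keys[keep_idx], values[keep_idx]

K, V = torch.randn(128, 64), torch.randn(128, 64)
K_c, V_c = compress_kv_cache(K, V, keep_ratio=0.25)
# With a 4x smaller cache, the same memory budget accommodates roughly
# 4x longer generations or more parallel chains ("hyper-scaling").
```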