A novel method for compressing the attention Key-Value (KV) cache along the temporal dimension, substantially reducing inference-time GPU memory usage and improving decoding speed.