Keqi Deng

PhD student, University of Cambridge

1 paper at NeurIPS 2025

OpenReview · Semantic Scholar · Google Scholar

Poster Session 4

Thursday, December 4, 2025 · 4:30 PM → 7:30 PM

Exhibit Hall C, D, E

Multi-head Temporal Latent Attention

#3509 · Keqi Deng, Phil Woodland

A novel method for compressing the attention Key-Value cache along the temporal dimension, greatly reducing inference-time GPU memory usage and improving decoding speed.