2 papers across 2 sessions
Use lightweight low-rank projections of (q, K) to help index the offloaded KV cache
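A minimal sketch of the idea, with assumed names and a random projection standing in for whatever low-rank map is actually used: keep a small low-rank key index on device, score it against the projected query to approximate q·Kᵀ, and fetch only the top-k entries from the offloaded KV cache.

```python
import numpy as np

# Hypothetical sketch: score offloaded KV-cache entries cheaply via
# low-rank projections of q and K, then fetch only the top-k entries.
# The rank r, projection P, and sizes below are illustrative assumptions.

rng = np.random.default_rng(0)
d, r, n_cached, k = 64, 8, 1000, 32   # head dim, low rank, cache size, fetch budget

# Low-rank projection (in practice this could be learned or SVD-based).
P = rng.standard_normal((d, r)) / np.sqrt(r)

K_cache = rng.standard_normal((n_cached, d))   # full keys, conceptually offloaded
K_index = K_cache @ P                          # small on-device index (n_cached x r)

q = rng.standard_normal(d)
scores = K_index @ (P.T @ q)                   # approximate q @ K_cache.T via low rank
top_idx = np.argsort(scores)[-k:]              # cache entries worth fetching

fetched = K_cache[top_idx]                     # simulate loading from offload
```

The on-device index costs n_cached × r floats instead of n_cached × d, and each lookup does O(n_cached · r) work before touching the offloaded store.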
We propose an efficient, training-free prompt compression method that retains key information in long inputs using the evaluator heads we identified in transformer-based LLMs.