Assistant Professor, Sungkyunkwan University
1 paper at NeurIPS 2025
Reasoning Path Compression accelerates inference of reasoning LLMs by periodically compressing the KV cache of generated tokens, exploiting their semantic sparsity.
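The idea in the summary above can be sketched roughly as follows. This is an illustrative toy, not the paper's actual algorithm: the function name `compress_kv_cache`, the stand-in importance scores, and the `keep_ratio` parameter are all assumptions for demonstration.

```python
# Toy sketch of periodic KV-cache compression (hypothetical, not the RPC code).
# After a fixed interval of generated tokens, keep only the highest-scoring
# fraction of cached (key, value) entries -- exploiting the observation that
# reasoning traces are semantically sparse, so most entries can be dropped.

def compress_kv_cache(cache, scores, keep_ratio=0.5):
    """Keep the top `keep_ratio` fraction of entries, preserving token order."""
    k = max(1, int(len(cache) * keep_ratio))
    # Rank cached entries by their (stand-in) importance score.
    top = sorted(range(len(cache)), key=lambda i: scores[i], reverse=True)[:k]
    top.sort()  # restore original token order after selection
    return [cache[i] for i in top], [scores[i] for i in top]

# Eight cached entries with made-up importance scores.
cache = [("k%d" % i, "v%d" % i) for i in range(8)]
scores = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4]
compressed, kept_scores = compress_kv_cache(cache, scores, keep_ratio=0.5)
print(len(compressed))  # 4 entries survive this compression step
```

In a real decoder this selection would run every fixed number of decoding steps over the actual attention states, with importance derived from attention statistics rather than hand-set scores.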