2 papers across 2 sessions
Reasoning Path Compression accelerates inference of reasoning LLMs by periodically compressing KV cache of generated tokens exploiting their semantic sparsity.
A framework that dynamically adjusts computational resources for robot controllers based on real-time task difficulty, reducing computation time by 2.6-4.4× while maintaining success rates, using the Stochastic Interpolant (SI) framework.