3 papers across 3 sessions
Proposes a learned eviction algorithm that predicts the probability that a conversation will continue and uses that prediction to guide LLM prefix cache eviction.
A theoretical analysis of scheduling algorithms for latency-constrained LLM queries served with RadixAttention, together with a novel scheduling algorithm.