1 paper across 1 session
We propose the first efficient, training-free online routing algorithm for high-volume LLM serving under token budget constraints, achieving significant improvements in both routing performance and cost efficiency.