PhD student, University of Wisconsin–Madison
1 paper at NeurIPS 2025
We propose the first efficient, training-free online routing algorithm for high-volume LLM serving under token budget constraints, improving both routing performance and cost efficiency.