Researcher, Lionrock AI Lab
1 paper at NeurIPS 2025
Colocating online and offline LLM inference requests within a single inference engine.