Poster Session 4 · Thursday, December 4, 2025 4:30 PM → 7:30 PM
#4000 Spotlight

Cost-Aware Contrastive Routing for LLMs

NeurIPS · Project Page · OpenReview

Abstract

We study cost-aware routing for large language models across diverse and dynamic pools of models. Existing approaches often overlook prompt-specific context, rely on expensive model profiling, assume a fixed set of experts, or use inefficient trial-and-error strategies.
We introduce Cost-Spectrum Contrastive Routing (CSCR), a lightweight framework that maps both prompts and models into a shared embedding space to enable fast, cost-sensitive selection.
CSCR uses compact, fast-to-compute logit footprints for open-source models and perplexity fingerprints for black-box APIs. A contrastive encoder is trained to favor the cheapest accurate expert within adaptive cost bands.
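The abstract does not define how the contrastive positives or the "adaptive cost bands" are built. The sketch below is one possible reading under stated assumptions, not the authors' implementation. `cost_bands` splits the experts into equal-size groups by cost rank, which is an assumed stand-in for the paper's adaptive bands. The positives are the accurate experts in the cheapest band that contains any accurate expert, so the cheapest accurate expert is always among them. The loss is a multi-positive InfoNCE over cosine similarities with temperature `tau`. All function names and the default `num_bands` and `tau` values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cost_bands(costs, num_bands):
    """ASSUMPTION: bands are equal-size groups of experts ranked by cost (0 = cheapest)."""
    order = sorted(range(len(costs)), key=lambda i: costs[i])
    bands = [0] * len(costs)
    for rank, i in enumerate(order):
        bands[i] = rank * num_bands // len(costs)
    return bands


def band_positives(correct, costs, num_bands):
    """Accurate experts that fall in the cheapest band containing any accurate expert."""
    bands = cost_bands(costs, num_bands)
    accurate = [i for i, ok in enumerate(correct) if ok]
    if not accurate:
        return []
    best = min(bands[i] for i in accurate)
    return [i for i in accurate if bands[i] == best]


def logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def contrastive_loss(prompt_emb, expert_embs, correct, costs, num_bands=3, tau=0.1):
    """Multi-positive InfoNCE: pull the prompt toward cheap accurate experts."""
    positives = band_positives(correct, costs, num_bands)
    if not positives:
        return None  # no expert answered correctly, so there is nothing to pull toward
    logits = [cosine(prompt_emb, e) / tau for e in expert_embs]
    # -log( sum_{p in positives} exp(l_p) / sum_all exp(l) )
    return logsumexp(logits) - logsumexp([logits[i] for i in positives])
```

With three experts of cost 1, 2 and 10 where only the last two answer correctly, a prompt embedding aligned with the cost-2 expert gets a much lower loss than one aligned with the wrong, cheapest expert. A training loop would average this loss over prompts and backpropagate through whatever encoder produces `prompt_emb`, which is outside this sketch.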
At inference time, routing reduces to a single k-NN lookup in a FAISS index, requiring no retraining when the expert pool changes and enabling microsecond latency. Across multiple benchmarks, CSCR consistently outperforms baselines, improving the accuracy–cost tradeoff by up to 25%, while generalizing robustly to unseen LLMs and out-of-distribution prompts.
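The lookup step can be illustrated without FAISS. The class below is a brute-force, standard-library stand-in for an exact inner-product index. In FAISS the comparable exact index is `IndexFlatIP`, used through its `add` and `search` methods. Vectors are L2-normalized, so the inner product equals cosine similarity. The sketch assumes that the index stores one embedding per expert and that the router simply takes the nearest expert (k = 1). The abstract does not say how CSCR combines k > 1 neighbors. `ExpertIndex` and `route` are hypothetical names.

```python
import math


class ExpertIndex:
    """Brute-force stand-in for an exact inner-product index over expert embeddings."""

    def __init__(self):
        self.names = []
        self.vectors = []

    @staticmethod
    def _normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            raise ValueError("cannot normalize a zero vector")
        return [x / norm for x in v]

    def add(self, name, embedding):
        """Register an expert; the prompt encoder is untouched."""
        self.names.append(name)
        self.vectors.append(self._normalize(embedding))

    def remove(self, name):
        """Drop an expert from the pool."""
        i = self.names.index(name)
        del self.names[i]
        del self.vectors[i]

    def search(self, query, k=1):
        """Return the k (name, cosine score) pairs with the highest similarity."""
        q = self._normalize(query)
        scored = [(sum(a * b for a, b in zip(q, v)), n) for n, v in zip(self.names, self.vectors)]
        scored.sort(key=lambda t: t[0], reverse=True)
        return [(n, s) for s, n in scored[:k]]


def route(index, prompt_embedding):
    """ASSUMPTION: send the prompt to the single nearest expert."""
    return index.search(prompt_embedding, k=1)[0][0]
```

Adding or removing an expert only changes the contents of the index. This is what allows the pool to change without retraining. A brute-force scan costs O(N·d) per query for N experts of dimension d. FAISS performs the same exact search with optimized kernels and also offers approximate indexes for larger pools.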