3 papers across 3 sessions
We propose a new algorithm that guarantees a minimum user satisfaction rate when routing queries across a zoo of language models while minimizing operating cost, making it practical for inference endpoint services.
We introduce a lightweight, post-hoc routing framework, with provable guarantees, that safely delegates between language models with competing objectives.