We propose a cascaded multi-LLM framework that combines deferral and abstention policies with online learning to balance accuracy, cost, and abstention. The framework delegates question-answering (QA) tasks across models and humans to improve efficiency.
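To make the deferral and abstention idea concrete, here is a minimal Python sketch of one possible cascade. It is an illustration under stated assumptions, not the paper's method: the names `Stage`, `Result`, and `run_cascade`, the fixed confidence thresholds, the per-call costs, and the toy models are all hypothetical, and the online-learning component is omitted (thresholds are set by hand rather than learned).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Assumption: a model is any callable mapping a question to
# (answer, confidence in [0, 1]). Real LLM calls would replace the toys below.
Model = Callable[[str], Tuple[str, float]]


@dataclass
class Stage:
    name: str
    model: Model
    cost: float           # hypothetical per-call cost
    accept_at: float      # answer directly if confidence >= accept_at
    abstain_below: float  # abstain (hand to a human) if confidence < abstain_below


@dataclass
class Result:
    answer: Optional[str]  # None means the cascade abstained
    decided_by: str        # stage name, or "human" on abstention
    total_cost: float      # summed cost of every stage that was called


def run_cascade(question: str, stages: List[Stage]) -> Result:
    """Try stages in order: accept, abstain, or defer to the next stage."""
    total = 0.0
    for i, stage in enumerate(stages):
        answer, conf = stage.model(question)
        total += stage.cost
        if conf >= stage.accept_at:
            return Result(answer, stage.name, total)
        is_last = i == len(stages) - 1
        if conf < stage.abstain_below or is_last:
            # Too uncertain, or no stage left to defer to: delegate to a human.
            return Result(None, "human", total)
        # Otherwise defer: fall through to the next (costlier) stage.
    return Result(None, "human", total)  # empty cascade


# Toy stand-ins for a small and a large model (purely illustrative).
def small_model(q: str) -> Tuple[str, float]:
    return ("4", 0.95) if q == "2+2?" else ("unsure", 0.5)


def large_model(q: str) -> Tuple[str, float]:
    return ("Paris", 0.9) if "France" in q else ("unsure", 0.1)


STAGES = [
    Stage("small", small_model, cost=1.0, accept_at=0.9, abstain_below=0.2),
    Stage("large", large_model, cost=10.0, accept_at=0.8, abstain_below=0.3),
]

if __name__ == "__main__":
    for q in ["2+2?", "Capital of France?", "Meaning of life?"]:
        print(q, run_cascade(q, STAGES))
```

In this sketch an easy question is answered by the cheap model, a harder one is deferred to the larger model, and a question neither model is confident about is handed to a human. An online-learning variant could, for example, adjust `accept_at` and `abstain_below` from feedback on past decisions; that part is left out here.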