1 paper across 1 session
We propose Router-R1, an RL-based framework that interleaves multi-round reasoning with dynamic LLM selection, supports zero-shot integration of new models, and optimizes performance-cost trade-offs.