4 papers across 3 sessions
We propose Ensemble++, a scalable framework that achieves the low regret of Thompson Sampling using a tiny, computationally efficient ensemble, making it practical for large-scale models.
This paper introduces CoRT, a post-training framework for teaching large reasoning models (LRMs) to leverage a Code Interpreter (CI) effectively and efficiently.
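To illustrate the general idea behind the Ensemble++ entry (approximating Thompson Sampling with a small ensemble), here is a minimal sketch of plain ensemble sampling on a K-armed Gaussian bandit. It is not the Ensemble++ architecture from the paper. The bandit setting, the function name `ensemble_sampling`, its parameters, and their defaults are all hypothetical choices for illustration. Each member keeps its own randomly drawn prior and its own perturbed copy of the observed rewards. Picking one member uniformly at random and acting greedily with respect to it stands in for drawing one sample from the posterior.

```python
import random


def ensemble_sampling(true_means, horizon, ensemble_size=4,
                      noise_sd=1.0, prior_sd=1.0, seed=0):
    """Generic ensemble sampling on a K-armed Gaussian bandit (hypothetical sketch).

    Illustrative only; this is not the Ensemble++ method.
    Returns (cumulative regret, pull counts per arm).
    """
    rng = random.Random(seed)
    k = len(true_means)
    prior_var, noise_var = prior_sd ** 2, noise_sd ** 2
    # Randomized prior: member m starts from its own draw of N(0, prior_sd^2) per arm.
    prior = [[rng.gauss(0.0, prior_sd) for _ in range(k)]
             for _ in range(ensemble_size)]
    # Per-member sums of perturbed rewards; pull counts are shared by all members.
    sums = [[0.0] * k for _ in range(ensemble_size)]
    counts = [0] * k
    best = max(true_means)
    regret = 0.0

    for _ in range(horizon):
        m = rng.randrange(ensemble_size)  # one member plays the role of one posterior draw
        # Member m's conjugate Gaussian posterior mean for every arm.
        estimates = [
            (prior[m][a] / prior_var + sums[m][a] / noise_var)
            / (1.0 / prior_var + counts[a] / noise_var)
            for a in range(k)
        ]
        arm = max(range(k), key=lambda a: estimates[a])
        reward = true_means[arm] + rng.gauss(0.0, noise_sd)
        # Every member records the reward plus its own independent perturbation.
        for j in range(ensemble_size):
            sums[j][arm] += reward + rng.gauss(0.0, noise_sd)
        counts[arm] += 1
        regret += best - true_means[arm]
    return regret, counts


if __name__ == "__main__":
    total_regret, pulls = ensemble_sampling([0.0, 0.5, 1.0], horizon=2000)
    print(f"cumulative regret={total_regret:.1f}, pulls per arm={pulls}")
```

The perturbations keep the members disagreeing in proportion to the remaining uncertainty, which is what lets a uniform pick among them mimic posterior sampling. How small the ensemble can be while still matching Thompson Sampling's regret is exactly the question the Ensemble++ paper addresses. This sketch makes no claim on that point.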