4 papers across 2 sessions
Adaptive Branching MCTS, a novel inference-time framework for LLMs, generalizes repeated sampling with multi-turn exploration and exploitation.
Neurons in the brain use timing and synchronization to compute, so we built a model that does the same.
We introduce a new class of Reinforcement Learned Teachers, trained to provide effective reasoning traces for downstream distillation, which yield better data for distillation and cold-starting than reasoning LMs that are orders of magnitude larger.
We introduce ALE-bench, a new benchmark for evaluating AI systems on score-based algorithmic programming contests.