4 papers across 2 sessions
Adaptive Branching MCTS (AB-MCTS), a novel inference-time framework for LLMs, generalizes repeated sampling by balancing multi-turn exploration and exploitation.
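A minimal sketch of the adaptive-branching idea, assuming Thompson sampling over per-child Beta posteriors to choose between going wider (a GEN arm that samples a fresh response) and going deeper (refining an existing child). The `generate` and `score` stubs, the Beta pseudo-counts, and the backup rule are illustrative assumptions, not the paper's exact algorithm.

```python
import random
from dataclasses import dataclass, field

# Hypothetical stand-ins for an LLM call and an answer scorer.
def generate(prompt: str) -> str:
    return prompt + " -> draft"          # placeholder LLM sample

def score(answer: str) -> float:
    return random.random()               # placeholder reward in [0, 1]

@dataclass
class Node:
    answer: str
    wins: float = 1.0                    # Beta posterior pseudo-counts
    losses: float = 1.0
    children: list["Node"] = field(default_factory=list)

def ab_mcts_step(root: Node, prompt: str) -> float:
    """One simulation: walk down the tree, at each node Thompson-sampling
    between 'go wider' (a GEN arm that spawns a new child) and 'go deeper'
    (descending into an existing child), then back up the reward."""
    path, node = [root], root
    while True:
        # One Beta arm per existing child, plus one GEN arm for a new child.
        gen_draw = random.betavariate(1.0, 1.0)
        child_draws = [random.betavariate(c.wins, c.losses) for c in node.children]
        if not node.children or gen_draw >= max(child_draws):
            child = Node(answer=generate(node.answer or prompt))
            node.children.append(child)
            path.append(child)
            break                        # expansion ends the descent
        node = node.children[child_draws.index(max(child_draws))]
        path.append(node)
    reward = score(path[-1].answer)
    for n in path:                       # propagate the reward up the path
        n.wins += reward
        n.losses += 1.0 - reward
    return reward

root = Node(answer="")
for _ in range(32):
    ab_mcts_step(root, "Solve: 12 * 17 = ?")
```

With more simulations, arms whose subtrees keep scoring well attract further refinement, while a string of poor rewards makes the GEN arm win and widens the search instead.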
We exactly characterize the expressive power of transformers with padding tokens as $\mathsf{TC}^0$, and we give a corresponding characterization for transformers that combine looping with padding.
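A schematic rendering of the two characterizations; the uniformity conditions and the exact loop-to-depth correspondence below are assumptions for illustration, not verbatim statements from the paper.

```latex
% Schematic only: uniformity conditions and the precise looping bound
% are assumptions; see the paper for the exact statements.
\begin{align*}
  \{\, L : L \text{ recognized by a transformer with polynomial padding} \,\}
    &= \mathsf{TC}^0, \\
  \{\, L : L \text{ recognized with polynomial padding and } O(\log^k n) \text{ loops} \,\}
    &= \mathsf{TC}^k.
\end{align*}
```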
We show that limiting a model's confidence during training can improve test-time scaling in mathematical reasoning.
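A minimal sketch of one standard way to limit confidence during training: an entropy bonus (a confidence penalty) added to the cross-entropy loss. The penalty form and the coefficient `beta` are assumptions for illustration; the paper's actual regularizer may differ.

```python
import torch
import torch.nn.functional as F

def confidence_limited_loss(logits: torch.Tensor,
                            targets: torch.Tensor,
                            beta: float = 0.1) -> torch.Tensor:
    """Cross-entropy plus a penalty on low-entropy (overconfident) outputs.

    The entropy-bonus form and `beta` are illustrative assumptions,
    not necessarily the paper's exact regularizer.
    """
    ce = F.cross_entropy(logits, targets)
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    return ce - beta * entropy  # subtracting entropy discourages overconfidence

# Toy usage: a batch of 4 examples over a 10-way vocabulary.
logits = torch.randn(4, 10, requires_grad=True)
targets = torch.randint(0, 10, (4,))
loss = confidence_limited_loss(logits, targets)
loss.backward()
```

Keeping the output distribution from collapsing to near-certainty preserves diversity across samples, which is what test-time scaling via repeated sampling relies on.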