The paper proposes a principled reward-design framework for training LLMs to use tools via reinforcement learning (RL), and reports substantial gains in both task performance and generalization over supervised fine-tuning (SFT) and baseline models.
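The summary does not spell out the reward itself. As a rough illustration only, the sketch below assumes one common way to decompose a tool-use reward: a format term that checks whether the model output parses into structured tool calls, plus a correctness term that gives partial credit for matching tool names, parameter names, and parameter values against a reference. The JSON output format, the equal weighting, and every function name (`parse_tool_calls`, `correctness_reward`, `tool_reward`) are hypothetical; this is not the paper's formulation.

```python
import json
from typing import Any, Optional


def parse_tool_calls(output: str) -> Optional[list]:
    """Parse output as a JSON list of {"name": str, "parameters": dict} objects.

    Hypothetical output format. Returns None if the output is malformed.
    """
    try:
        calls = json.loads(output)
    except json.JSONDecodeError:
        return None
    if not isinstance(calls, list):
        return None
    for call in calls:
        if not (isinstance(call, dict)
                and isinstance(call.get("name"), str)
                and isinstance(call.get("parameters"), dict)):
            return None
    return calls


def jaccard(a: set, b: set) -> float:
    """Set overlap in [0, 1]; two empty sets count as a perfect match."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def correctness_reward(pred: list, gold: list) -> float:
    """Partial-credit match in [0, 1], averaging name, key, and value scores."""
    name_score = jaccard({c["name"] for c in pred}, {c["name"] for c in gold})
    if not gold:
        return name_score  # nothing further to compare
    # Index predicted calls by tool name (a later duplicate overwrites an earlier one).
    pred_by_name: dict[str, dict[str, Any]] = {c["name"]: c["parameters"] for c in pred}
    key_scores, value_scores = [], []
    for call in gold:
        p = pred_by_name.get(call["name"], {})
        g = call["parameters"]
        key_scores.append(jaccard(set(p), set(g)))
        if g:
            # A value counts only if the key is present and the value matches exactly.
            hits = sum(1 for k, v in g.items() if k in p and p[k] == v)
            value_scores.append(hits / len(g))
        else:
            value_scores.append(1.0)
    key_score = sum(key_scores) / len(key_scores)
    value_score = sum(value_scores) / len(value_scores)
    return (name_score + key_score + value_score) / 3


def tool_reward(output: str, gold: list) -> float:
    """Total reward in [0, 2]: 1 for well-formed output plus correctness credit."""
    calls = parse_tool_calls(output)
    if calls is None:
        return 0.0  # malformed output earns neither format nor correctness credit
    return 1.0 + correctness_reward(calls, gold)


if __name__ == "__main__":
    gold = [{"name": "get_weather", "parameters": {"city": "Paris", "unit": "C"}}]
    print(tool_reward(json.dumps(gold), gold))  # → 2.0
```

The point of the partial credit is that a call to the right tool with one wrong argument scores higher than a call to the wrong tool, which presumably gives an RL policy a denser learning signal than a binary exact-match reward would.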