Postdoc, University of Edinburgh
1 paper at NeurIPS 2025
The paper proposes a principled reward-design framework for training LLMs to use tools via reinforcement learning, yielding significant gains in performance and generalization over SFT and baseline models.