3 papers across 2 sessions
The paper proposes a principled reward design framework for training LLMs on tool use via reinforcement learning, yielding significant gains in performance and generalization over SFT and baseline models.
We propose RF-Agent, an automated RL reward function design framework via language agent tree search.
We construct a Progress Reward Model, with a convergence guarantee, for reinforcement learning via large language models.