5 papers across 3 sessions
We propose RF-Agent, a framework that automates RL reward function design via language agent tree search.
An agentic pipeline for multi-turn synthetic data generation that produces high-quality training data for AI agents.