2 papers across 2 sessions
This paper presents ZeroTIR and reveals agent-level RL scaling laws that link training steps, code-call frequency, response length, and accuracy; ZeroTIR surpasses ZeroRL and SFT baselines on challenging math benchmarks.