1 paper across 1 session
This paper presents ZeroTIR, revealing agent‑level RL scaling laws that tie training steps, code‑call frequency, response length, and accuracy, and surpassing ZeroRL and SFT baselines on challenging math benchmarks.