Assistant Professor, East China Normal University
2 papers at NeurIPS 2025
This paper presents ZeroTIR and reveals agent-level RL scaling laws that link training steps, code-call frequency, response length, and accuracy; ZeroTIR surpasses ZeroRL and SFT baselines on challenging math benchmarks.
We propose a unified conformal prediction framework for infinite-horizon policy evaluation that seamlessly accommodates both on-policy and off-policy scenarios.