The AGI Company - NeurIPS 2025


2 papers across 1 session

Poster Session 2

Wednesday, December 3, 2025 · 4:30 PM → 7:30 PM

Thinking vs. Doing: Improving Agent Reasoning by Scaling Test-Time Interaction

#515 · Junhong Shen, Hao Bai, Lunjun Zhang, Yifei Zhou, Amrith Setlur, Peter Tong, Diego Caples, Nan Jiang, Tong Zhang, Ameet Talwalkar, Aviral Kumar

We propose to scale the number of interaction steps for agents as a new axis of test-time scaling and develop a curriculum-based online RL algorithm for training agents to scale interaction.

REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites

#3602 · Div Garg, Diego Caples, Andis Draguns, Nikil Ravi, Pranav Putta, Naman Garg, Prannay Hebbar, Youngchul Joo, Jindong Gu, Charles London, Christian Schroeder de Witt, Sumeet Motwani

Ultra-realistic benchmark environments and an evaluation framework for web agents.