2 papers across 1 session
We propose scaling the number of interaction steps an agent takes as a new axis of test-time scaling, and we develop a curriculum-based online RL algorithm that trains agents to scale interaction.
Ultra-realistic benchmark environments and an evaluation framework for web agents