1 paper across 1 session
Ultra-realistic benchmark environments and evaluation framework for web agents