3 papers across 3 sessions
This paper investigates which kind of R1-Zero-like training is best suited to grounding tasks in GUI agents.
We train RL agents directly from high-level specifications, without reward functions or domain-specific oracles.