3 papers across 3 sessions
This paper investigates what kind of R1-Zero-like training is suitable for grounding tasks in GUI agents.
We propose GUI-Actor, a VLM-based, coordinate-free GUI grounding method with an attention-based action head and verifier, achieving state-of-the-art results and strong generalization.