1 paper across 1 session
We propose GUI-Actor, a VLM-based, coordinate-free GUI grounding method with an attention-based action head and verifier, achieving state-of-the-art results and strong generalization.