Principal Researcher, Research, Microsoft
3 papers at NeurIPS 2025
A hybrid architecture with linear pre-filling complexity and up to 10x higher decoding throughput.
We propose GUI-Actor, a VLM-based, coordinate-free GUI grounding method with an attention-based action head and verifier, achieving state-of-the-art results and strong generalization.
We need only one training example for RLVR on LLMs to achieve significant improvement on math tasks.