4 papers across 3 sessions
We present Time-R1, a reinforcement-learning-based post-training framework that achieves state-of-the-art performance among large vision-language models on temporal video grounding.
AgMMU is a challenging real-world benchmark for evaluating and advancing vision-language models (VLMs) in the knowledge-intensive domain of agriculture.
We introduce GUI Exploration Lab, a flexible simulator for GUI agent navigation. Experiments show that a staged supervised fine-tuning (SFT) + reinforcement learning (RL) approach, especially multi-turn RL, significantly boosts navigation and exploration capabilities.