We present Time-R1, a reinforcement learning-based post-training framework that achieves state-of-the-art performance among large vision-language models on temporal video grounding.