TempSamp-R1 is a reinforcement fine-tuning framework that improves the temporal grounding capabilities of multimodal large language models (MLLMs) by integrating off-policy supervision, soft advantage shaping, and hybrid Chain-of-Thought training.