1 paper across 1 session
Finding that today's large multimodal models (LMMs) struggle to grasp the arrow of time in video, we propose ArrowRL to enhance their temporal perception and AoTBench to evaluate it rigorously.