1 paper across 1 session
We propose a causal framework for video temporal grounding that mitigates confounding biases and improves robustness to linguistic variations and irrelevant queries.