2 papers across 1 session
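This paper introduces CERES, a method for egocentric referring video object segmentation that applies causal intervention to both modalities: backdoor adjustment to counter language bias, and front-door adjustment with vision-depth fusion to counter visual bias. The authors report that this dual-modal intervention makes the model more robust and achieves state-of-the-art results.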
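The two adjustments named above are standard identification formulas from causal inference. Backdoor adjustment estimates P(y | do(x)) = sum_z P(y | x, z) P(z) when a confounder z is observed. Front-door adjustment estimates P(y | do(x)) = sum_m P(m | x) sum_x' P(y | x', m) P(x') through a mediator m when the confounder is hidden. The sketch below only illustrates these generic formulas on a hypothetical toy model with binary variables (U hidden confounder, X cause, M mediator, Y outcome). All probabilities are made-up assumptions. It is not the CERES implementation, which applies the adjustments to learned language and visual features.

```python
from itertools import product

# Hypothetical toy structural causal model (all numbers are illustrative assumptions):
# U (hidden confounder) -> X, U -> Y, and X -> M -> Y.
P_U = {0: 0.5, 1: 0.5}                                   # P(U=u)
P_X_given_U = {0: 0.2, 1: 0.8}                           # P(X=1 | U=u)
P_M_given_X = {0: 0.1, 1: 0.9}                           # P(M=1 | X=x)
P_Y_given_MU = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}  # P(Y=1 | M=m, U=u)

NAMES = ("u", "x", "m", "y")


def bern(p1, v):
    """Probability that a Bernoulli(p1) variable takes value v."""
    return p1 if v == 1 else 1.0 - p1


# Full joint distribution P(u, x, m, y) implied by the model above.
JOINT = {
    (u, x, m, y): P_U[u]
    * bern(P_X_given_U[u], x)
    * bern(P_M_given_X[x], m)
    * bern(P_Y_given_MU[(m, u)], y)
    for u, x, m, y in product((0, 1), repeat=4)
}


def prob(**fixed):
    """Marginal probability that the named variables take the given values."""
    return sum(
        p for key, p in JOINT.items()
        if all(key[NAMES.index(k)] == v for k, v in fixed.items())
    )


def naive(x, y=1):
    """Plain conditional P(y | x); biased because U affects both X and Y."""
    return prob(y=y, x=x) / prob(x=x)


def backdoor(x, y=1):
    """Backdoor adjustment: sum_u P(y | x, u) P(u). Requires observing U."""
    return sum(prob(y=y, x=x, u=u) / prob(x=x, u=u) * prob(u=u) for u in (0, 1))


def frontdoor(x, y=1):
    """Front-door adjustment: sum_m P(m | x) sum_x' P(y | x', m) P(x').
    Uses only X, M, Y; the confounder U is never read."""
    total = 0.0
    for m in (0, 1):
        p_m_given_x = prob(m=m, x=x) / prob(x=x)
        inner = sum(
            prob(y=y, x=xp, m=m) / prob(x=xp, m=m) * prob(x=xp) for xp in (0, 1)
        )
        total += p_m_given_x * inner
    return total


if __name__ == "__main__":
    for x in (0, 1):
        print(f"x={x}: naive={naive(x):.4f} backdoor={backdoor(x):.4f} "
              f"frontdoor={frontdoor(x):.4f}")
```

In this toy model the front-door estimate matches the backdoor estimate without ever touching U, while the naive conditional overstates the effect of X because the confounder pushes X and Y in the same direction. That is the property a front-door path exploits when the visual confounder cannot be measured directly.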