3 papers across 3 sessions
We present a vision-centric token compression in LLM, inspired by human selective reading strategy.
We propose OmniGaze, a semi-supervised learning framework, which utilizes large-scale unlabeled data and reward-driven pseudo-labeling strategy to effectively generalize gaze estimation in the wild.