Senior Lecturer, Monash University
2 papers at NeurIPS 2025
We propose a training-free test-time adaptation method that significantly improves zero-shot skeleton action recognition by using a cache model at inference time.
We introduce TriSense, a triple-modality MLLM that achieves comprehensive understanding of video moments by adaptively integrating visual, audio, and speech information. To support it, we construct a new dataset, TriSense-2M.