Full Professor, Beijing University of Posts and Telecommunications
4 papers at NeurIPS 2025
This paper reveals that LVLMs in video anomaly detection rely on pre-trained statistical shortcuts instead of scene-aware reasoning.
We introduce a calibration-free 3D pose estimator that learns a discrete prior via VQ-VAE and integrates it through a proposed discrete-continuous attention mechanism for robust, accurate prediction.
This paper explores rule-based reinforcement learning (RL) in MLLM post-training for perception policy learning.