Undergraduate student, Sun Yat-sen University
4 papers at NeurIPS 2025
Vision-language models (VLMs) lack deep causal reasoning. CF-VLM uses counterfactuals and a novel training approach to strengthen causal reasoning. It surpasses state-of-the-art methods in reasoning and generalization, reduces hallucinations, and supports real-world VLM use.
We propose GAM-Agent, a game-theoretic multi-agent framework where visual and logic agents debate via structured communication and uncertainty control, boosting VLM performance, robustness, and interpretability. It is modular, scalable, and general.
Tri-MARF, a novel tri-modal multi-agent framework, integrates 2D images, text, and 3D point clouds with specialized agents to enhance 3D object annotation, achieving superior accuracy, retrieval performance, and throughput.