Intern, University of North Carolina at Chapel Hill
3 papers at NeurIPS 2025
We propose SRPO, a reflection-aware RL method that significantly improves multimodal LLM reasoning by explicitly teaching self-reflection, outperforming state-of-the-art models on multiple benchmarks.
We introduce a semi-supervised learning paradigm that unifies view-wise co-training, meta-learned supervision, and adversarial perturbation through a structured triadic game.
ReAgent-V enables reward-driven, multi-agent video understanding with dynamic reflection and frame selection.