Principal Researcher, Microsoft
3 papers at NeurIPS 2025
We propose Point-RFT, a multimodal framework that uses visually grounded Chain-of-Thought reasoning with two-stage finetuning. It exhibits superior generalization capability and shows potential in complex real-world scenarios.