PhD student, Hong Kong Polytechnic University
2 papers at NeurIPS 2025
We propose Point-RFT, a multimodal framework that uses visually grounded Chain-of-Thought reasoning with two-stage finetuning and exhibits superior generalization capability and potential in complex real-world scenarios.