PhD student, Nanyang Technological University
2 papers at NeurIPS 2025
We introduce an RL framework that unifies the training of answer generation and verification in a single model.
We introduce a novel object selection mechanism that allows sim-trained policies to adapt rapidly to real-world visual perturbations.