Postdoc, Princeton University
1 paper at NeurIPS 2025
ViGoRL is a vision-language model trained with reinforcement learning to ground each reasoning step in specific image coordinates. This grounding improves performance on spatial and web-based reasoning tasks by helping the model attend to the relevant parts of the image and visually verify its reasoning steps.