Researcher, Robotics and AI Institute
1 paper at NeurIPS 2025
We introduce ROVER, a recursive framework that improves the video reasoning accuracy and efficiency of vision-language models in embodied settings.