Assistant Professor, New York University
3 papers at NeurIPS 2025
We generalize CLIP training to worldwide web scale, improving zero-shot ImageNet classification by +0.8% over the English-only counterpart (no compromise) and setting SoTA on zero-shot multilingual benchmarks: 57.4% on CVQA and 50.2% on Babel-ImageNet.
A framework, built on Stochastic Interpolants (SI), that dynamically adjusts the computational budget of robot controllers based on real-time task difficulty, reducing computation time by 2.6–4.4× while maintaining success rates.