PhD student, University of California, Berkeley
1 paper at NeurIPS 2025
We develop a CLIP model that is state-of-the-art (SotA) on zero-shot recognition for both images and video. Using its strong, general features, we further create SotA encoders for language and spatial tasks.