Researcher, Physical Intelligence
3 papers at NeurIPS 2025
We introduce MJ-Bench-Video, a large-scale video preference dataset for comprehensively evaluating reward models for text-to-video generation, along with a MoE-based video reward model.
A novel benchmark using a comprehensive preference dataset to evaluate multimodal judges across multiple key perspectives.
SutureBot: A benchmark and dataset for evaluating goal-conditioned VLAs on precision suturing.