Principal Researcher, Apple
3 papers at NeurIPS 2025
We propose a text-aligned visual representation to unify both visual understanding and generation within a single MLLM.
This work investigates whether MLLMs can be used as a unified evaluator for AI-generated videos and proposes a benchmark to assess this ability.
Real-time interactive streaming video generation at 24 fps, up to a minute long.