Principal Researcher, Apple
3 papers at NeurIPS 2025
We introduce UniGen, a unified multimodal large language model capable of image understanding and generation.
3D object detection (with or without known camera pose) can be viewed as mapping and localization with objects represented as 3D oriented boxes.
We present StreamBridge, a framework that transforms offline Video-LLMs into streaming models.