Professor, Electrical Engineering & Computer Science Department
4 papers at NeurIPS 2025
We introduce a diffusion-based video model that predicts egocentric futures from full-body 3D motion, enabling realistic and controllable first-person simulation.
We present a new multi-modal dataset and an initial benchmark model for Geo-spatial Artificial Intelligence.
The way we rasterize images into 1D sequences to feed into long-sequence models is suboptimal! We show that orderings other than row-major can work better, and we provide an RL method to learn the optimal ordering.
We introduce REVERSE, the first framework to integrate generation adjustment with online post-hoc verification within a single VLM architecture. REVERSE detects, backtracks, and corrects hallucinations during the decoding process.