Postdoc, University of California, Berkeley
2 papers at NeurIPS 2025
We introduce a diffusion-based video model that predicts egocentric futures from full-body 3D motion, enabling realistic and controllable first-person simulation.
The way we rasterize images into 1D sequences to feed into long-sequence models is suboptimal! We show that orders other than row-major can be better, and provide an RL method to learn the optimal ordering.
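For intuition, here is a minimal sketch (illustrative only, not the paper's method — the paper learns orderings with RL rather than using fixed ones) of how the same image can be flattened into a 1D sequence under different scan orders:

```python
import numpy as np

# A tiny 2x3 "image" of token ids.
img = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# Standard row-major raster order.
row_major = img.flatten(order="C")        # [0, 1, 2, 3, 4, 5]

# Column-major order: scan down columns instead of across rows.
col_major = img.flatten(order="F")        # [0, 3, 1, 4, 2, 5]

# "Snake" (boustrophedon) order: reverse every other row so
# consecutive tokens in the sequence stay spatially adjacent.
snake = img.copy()
snake[1::2] = snake[1::2, ::-1]
snake = snake.flatten(order="C")          # [0, 1, 2, 5, 4, 3]
```

Each ordering induces a different notion of "distance" between pixels in the 1D sequence, which is why the choice of scan order can matter for long-sequence models.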