Associate Professor, Carnegie Mellon University
4 papers at NeurIPS 2025
PartCrafter is a structured 3D generative model that jointly generates multiple parts and objects from a single RGB image in one shot.
We show that diffusion language models are substantially more sample-efficient than standard autoregressive language models, because they can learn from many different token orderings.
Long-term feed-forward 3D point tracking in persistent 3D point maps.
ViGoRL is a vision-language model trained with reinforcement learning to ground each reasoning step in image coordinates, improving performance on spatial and web-based reasoning tasks through better attention and visual verification.