Postdoc, EPFL - EPF Lausanne
3 papers at NeurIPS 2025
Based on a newly discovered "free lunch" in voxel labels, VoxDet reformulates 3D semantic scene completion as dense object detection using a VoxNT trick.
Training- and GPU-free Spatial Prompting for Multimodal Large Language Models
We introduce IR3D-Bench, a benchmark that challenges vision-language models to demonstrate real scene understanding by using tools to recreate the 3D structure of a scene from an image, rather than merely describing it.