2 papers across 1 session
A Planning Representation and Paradigm Investigation of Vision-Language-Action Models
We introduce PhyBlock, a progressive benchmark evaluating large vision-language models on physical understanding and spatial planning via robotic 3D block assembly tasks.