2 papers across 2 sessions
Our structured dataset allows us to analyze how model vision compares to human perception and to determine whether VLMs use visual reasoning algorithms similar to those humans use.
A unified online framework for open-world 3D object extraction.