Poster Session 4 · Thursday, December 4, 2025 4:30 PM → 7:30 PM
#1505
BeliefMapNav: 3D Voxel-Based Belief Map for Zero-Shot Object Navigation
Abstract
Zero-shot object navigation (ZSON) allows robots to find target objects in unfamiliar environments using natural language instructions, without relying on pre-built maps or task-specific training. Recent general-purpose models, such as large language models (LLMs) and vision-language models (VLMs), equip agents with semantic reasoning abilities to estimate target object locations in a zero-shot manner. However, these models often greedily select the next goal without maintaining a global understanding of the environment and are fundamentally limited in the spatial reasoning necessary for effective navigation.
To overcome these limitations, we propose a novel 3D voxel-based belief map that estimates the prior distribution of the target's presence within a voxelized 3D space. This approach lets agents combine semantic priors from LLMs and visual embeddings with hierarchical spatial structure and real-time observations, building a comprehensive global 3D posterior belief over the target's location.
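The prior-to-posterior step described above can be illustrated with a generic Bayesian update over a voxel grid. This is a minimal sketch under stated assumptions, not the paper's implementation: the function name `update_belief`, the per-voxel likelihood array, and the treatment of unobserved voxels (likelihood 1, i.e. the observation says nothing about them) are all hypothetical choices made for illustration.

```python
import numpy as np

def update_belief(prior, likelihood_seen, observed_mask):
    """Generic Bayes update of a per-voxel target-presence belief.

    prior           : (X, Y, Z) array of non-negative values summing to 1.
    likelihood_seen : (X, Y, Z) array, assumed P(observation | target in voxel);
                      only read where observed_mask is True.
    observed_mask   : (X, Y, Z) bool array, True for voxels seen this step.
    """
    # Assumption: unobserved voxels get likelihood 1 (uninformative).
    likelihood = np.where(observed_mask, likelihood_seen, 1.0)
    posterior = prior * likelihood
    total = posterior.sum()
    if total <= 0:
        # Degenerate case: every voxel was ruled out; fall back to the prior.
        return prior.copy()
    return posterior / total

# Uniform prior over a 2 x 2 x 1 grid.
prior = np.full((2, 2, 1), 0.25)
# The agent looks at voxel (0, 0, 0) and does not see the target there.
observed = np.zeros((2, 2, 1), dtype=bool)
observed[0, 0, 0] = True
likelihood = np.zeros((2, 2, 1))  # likelihood 0 for the observed, empty voxel
posterior = update_belief(prior, likelihood, observed)
```

In this toy case the probability mass removed from the observed voxel is redistributed evenly over the three unobserved voxels.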
Building on this 3D voxel map, we introduce BeliefMapNav, an efficient navigation system with two key advantages:
- grounding LLM semantic reasoning in the hierarchical 3D semantic voxel space for precise target position estimation,
- integrating sequential path planning to enable efficient global navigation decisions.
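To show why planning over a sequence of goals can differ from greedily picking the most likely one, here is a small sketch. It is an assumption-laden illustration, not BeliefMapNav's planner: the candidate positions, their belief values, the straight-line distance metric, and the helper names `expected_cost` and `best_visit_order` are all hypothetical. The sketch enumerates visiting orders and picks the one with the lowest expected travel distance before the target is found.

```python
import math
from itertools import permutations

def expected_cost(start, order, positions, probs):
    """Expected distance travelled until the target is found, visiting in `order`.

    probs[i] is the assumed probability that the target is at positions[i].
    """
    cost, travelled, here = 0.0, 0.0, start
    for i in order:
        travelled += math.dist(here, positions[i])
        cost += probs[i] * travelled  # target found here with probability probs[i]
        here = positions[i]
    return cost

def best_visit_order(start, positions, probs):
    """Brute-force search over all orders; only feasible for a few candidates."""
    indices = range(len(positions))
    return min(permutations(indices),
               key=lambda order: expected_cost(start, order, positions, probs))

start = (0.0, 0.0)
positions = [(1.0, 0.0), (-10.0, 0.0)]  # a near candidate and a far candidate
probs = [0.1, 0.9]                      # the far candidate is much more likely
order = best_visit_order(start, positions, probs)
```

With these numbers the cheapest plan checks the nearby, low-probability candidate first (expected cost 10.9 versus 11.1), whereas a greedy rule would head straight for the most likely candidate.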
Experiments on the HM3D and HSSD benchmarks show that BeliefMapNav achieves state-of-the-art (SOTA) Success Rate (SR) and Success weighted by Path Length (SPL), including a notable 9.7 SPL improvement over the prior method with the best SR, validating its effectiveness and efficiency.
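For readers unfamiliar with the metric, SPL is commonly defined as the mean over episodes of S_i * l_i / max(p_i, l_i), where S_i indicates success, l_i is the shortest-path length to the goal, and p_i is the length of the path the agent actually took. The sketch below computes it under that standard definition; the function name `spl` and the sample episodes are illustrative, not data from the paper. The result lies in [0, 1]; papers often report it multiplied by 100.

```python
def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length over a set of episodes.

    Assumes every shortest-path length is positive.
    """
    if not successes:
        raise ValueError("at least one episode is required")
    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if success:
            # A path at least as short as the optimum scores 1; longer paths score less.
            total += shortest / max(taken, shortest)
    return total / len(successes)

# Three hypothetical episodes: optimal success, success with a 2x detour, failure.
score = spl([True, True, False], [5.0, 5.0, 5.0], [5.0, 10.0, 3.0])
```

Here the per-episode terms are 1.0, 0.5, and 0.0, so the mean is 0.5.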