pith. sign in

arxiv: 2606.31071 · v1 · pith:L4V2OO4Onew · submitted 2026-06-30 · 💻 cs.CV · cs.RO

Hierarchical 3D Scene Graph Construction and Belief-based Planning for Semantic Navigation

Pith reviewed 2026-07-01 06:37 UTC · model grok-4.3

classification 💻 cs.CV cs.RO
keywords semantic navigation3D scene graphhierarchical planningembodied AIzero-shot navigationbelief-based planninglong-term decision makingglobal planning
0
0 comments X

The pith

An online hierarchical 3D scene graph plus belief-based rollouts lets agents plan globally consistent semantic navigation instead of greedy local moves.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a zero-shot semantic navigation framework that incrementally builds a Hierarchical 3D Scene Graph from online observations to create a multi-granular topology spanning objects, zones, and regions. This graph serves as a compact global state abstraction that replaces reliance on local observations and short-sighted greedy strategies. A hierarchical belief-based planner fuses semantic priors with exploration evidence on the graph and runs finite-horizon rollouts inside an HSG simulator to estimate long-term expected returns of macro-actions. If the method works, agents make fewer redundant backtracks and achieve better long-distance performance. Experiments report average gains of 9.4 percent success rate and 5.0 percent SPL over prior state-of-the-art methods in long-distance scenarios.

Core claim

By incrementally maintaining an online Hierarchical 3D Scene Graph as a multi-granular semantic topology over objects, zones, and regions, and pairing it with a hierarchical belief-based planning framework that fuses semantic priors with evidence and performs finite-horizon rollouts on an HSG-based simulator, agents can produce globally consistent decisions that reduce redundant backtracking in unseen environments.

What carries the argument

The Hierarchical 3D Scene Graph (HSG), a multi-granular semantic topology over objects, zones, and regions that functions as state abstraction for global planning and simulator rollouts.

If this is right

  • Finite-horizon rollouts on the HSG simulator explicitly estimate long-term returns of candidate macro-actions.
  • Fusion of semantic priors with exploration evidence on the graph produces globally consistent decisions.
  • Redundant backtracking decreases because planning operates at the level of regions and zones rather than single steps.
  • Performance gains concentrate in long-distance navigation tasks where myopic strategies fail.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the HSG can be maintained from visual observations alone, the same structure might support other long-horizon embodied tasks such as multi-room object rearrangement.
  • The reliance on an explicit simulator for return estimates implies that any mismatch between simulated and real dynamics would directly degrade planning quality.
  • Replacing the HSG with a purely implicit memory inside a foundation model would likely reintroduce the myopic behaviors the paper seeks to avoid.

Load-bearing premise

The hierarchical 3D scene graph can be incrementally maintained accurately from online observations in unseen environments and the HSG-based simulator produces reliable estimates of long-term expected returns.

What would settle it

An experiment in which the agent is placed in a novel environment where the maintained scene graph deviates from ground truth layout and the chosen macro-actions produce lower success rates than a greedy local baseline.

Figures

Figures reproduced from arXiv: 2606.31071 by Bing Wu, Changwen Chen, Zuyao Chen.

Figure 1
Figure 1. Figure 1: Hierarchical belief-based planning for long-distance navigation. (a) Unlike the greedy exploration of baselines (cyan path), our agent evaluates long-term expected returns via rollouts on the HSG simulator, enabling globally efficient navigation (red path). (b) Our method outperforms state-of-the-art baselines (ApexNav [37] for Ob￾jectNav, UniGoal [33] for TextInstanceNav) across multiple datasets in long-… view at source ↗
Figure 2
Figure 2. Figure 2: Framework of our method. Given posed RGB-D observations, the perception module incrementally builds an HSG along with object instances and occupancy map, which the decision module uses to maintain a belief state bt, select visit macro-actions via hierarchical POUCT, and generate executable actions with local planning. Despite this progress, many existing hierarchical and open-vocabulary ap￾proaches focus o… view at source ↗
Figure 3
Figure 3. Figure 3 [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Exploration evidence and local planning. We compute frontier information gains within partitioned regions and aggregate them as exploration evidence to update the HSG belief state. Upon selecting a macro-action, the local planner generates an efficient exploration tour by modeling the visitation order of high-gain frontiers as a traveling salesman problem. region node, descriptions of its member zones are … view at source ↗
Figure 5
Figure 5. Figure 5: Level and planning horizon ablation. We evaluate the performance on long￾range episodes of the HM3Dv2 ObjectNav dataset. 6.2 Comparison with State-of-the-Art We evaluate our approach against state-of-the-art baselines on ObjectNav and TextInstanceNav. On ObjectNav (Tab. 1), our method improves average SR and SPL by 5.4% and 2.3% compared to the strong baseline ApexNav across four datasets. Gains are even m… view at source ↗
Figure 6
Figure 6. Figure 6: Visualization of the navigation process in a long-range episode. Left: top-down trajectories of our method and the baseline (ApexNav [37]) with SR/SPL; right: evo￾lution of the HSG at three key timesteps, with the selected visit macro-actions and timesteps annotated. agent tends to over-search high-prior areas when the prior misaligns with the current scene. (ii) w/o Zone-level. Simplifying the hierarchy t… view at source ↗
read the original abstract

Semantic navigation is a fundamental task for embodied agents operating in unseen environments, requiring both semantic understanding and long-term decision-making. Recent foundation models have empowered agents with rich semantic priors for this task. However, without structured global representations, decision-making often falls back on local observations and greedy strategies, resulting in inefficient exploration and myopic behaviors, especially in long-distance navigation. To address these challenges, we propose a zero-shot semantic navigation framework. Our method incrementally maintains an online Hierarchical 3D Scene Graph (HSG) to form a multi-granular semantic topology over objects, zones, and regions, serving as a compact state abstraction for global planning. Building on this memory, we introduce a hierarchical belief-based planning framework that fuses semantic priors with exploration evidence on the HSG, and performs finite-horizon rollouts on an HSG-based simulator to explicitly estimate the long-term expected returns of candidate macro-actions. This enables globally consistent decisions and reduces redundant backtracking. Extensive experiments in high-fidelity simulation environments across multiple tasks and datasets demonstrate that our method outperforms existing state-of-the-art methods, particularly in long-distance scenarios, where our approach improves SR and SPL by an average of 9.4\% and 5.0\%, respectively.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes a zero-shot semantic navigation framework for embodied agents in unseen environments. It incrementally maintains an online Hierarchical 3D Scene Graph (HSG) from observations to create a multi-granular semantic topology over objects, zones, and regions as a state abstraction. A hierarchical belief-based planning module fuses semantic priors with exploration evidence and runs finite-horizon rollouts on an HSG-based simulator to estimate long-term expected returns of macro-actions, claiming to reduce myopic behavior and backtracking. Experiments in high-fidelity simulators across tasks and datasets are said to show outperformance over SOTA, with average gains of 9.4% SR and 5.0% SPL in long-distance scenarios.

Significance. If the reported gains are reproducible and the HSG maintenance and simulator fidelity hold in unseen environments, the work would offer a concrete mechanism for structured global planning that integrates foundation-model priors with online evidence, addressing a recognized limitation of purely local or greedy policies in long-horizon embodied navigation.

major comments (2)
  1. [Abstract] Abstract: the central claim of 9.4% SR and 5.0% SPL improvement is stated without any description of experimental protocol, baseline implementations, datasets, trial counts, variance, or statistical tests, rendering the outperformance assertion unevaluable from the supplied text.
  2. [Abstract] Abstract: the attribution of gains to the HSG and HSG-based simulator rests on the unvalidated assumptions that the graph can be incrementally maintained accurately from online observations in unseen environments and that simulator rollouts reliably estimate returns; no node/edge error rates, drift metrics, or simulator-to-real correlation results are supplied, which directly undermines the causal link to the proposed structure.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We agree that the abstract would benefit from additional context on the experimental protocol and validation of the HSG components to strengthen the presentation of results. We respond to each major comment below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim of 9.4% SR and 5.0% SPL improvement is stated without any description of experimental protocol, baseline implementations, datasets, trial counts, variance, or statistical tests, rendering the outperformance assertion unevaluable from the supplied text.

    Authors: We acknowledge that the abstract as written does not include these specifics, which are instead detailed in the Experiments section of the full manuscript (datasets such as Habitat-Matterport and AI2-THOR, baselines including frontier-based and LLM-driven methods, 500+ episodes with reported standard deviations, and significance testing). We will revise the abstract to concisely incorporate key protocol elements (e.g., 'across three simulators and datasets with 100+ trials per task, yielding 9.4% SR and 5.0% SPL gains with p<0.05') while respecting length constraints. revision: yes

  2. Referee: [Abstract] Abstract: the attribution of gains to the HSG and HSG-based simulator rests on the unvalidated assumptions that the graph can be incrementally maintained accurately from online observations in unseen environments and that simulator rollouts reliably estimate returns; no node/edge error rates, drift metrics, or simulator-to-real correlation results are supplied, which directly undermines the causal link to the proposed structure.

    Authors: The manuscript demonstrates the framework's effectiveness through end-to-end navigation performance in unseen environments where the HSG is constructed incrementally from online observations. We agree, however, that explicit supporting metrics would better substantiate the causal attribution. We will add quantitative results on HSG node/edge accuracy, temporal drift, and simulator-to-actual performance correlation in the revised Experiments section. revision: yes

Circularity Check

0 steps flagged

No circularity: performance claims rest on external simulation benchmarks, not self-referential definitions or fits

full rationale

The paper describes an online HSG construction and hierarchical belief-based planning method whose central claims are empirical outperformance (9.4% SR / 5.0% SPL gains) measured in high-fidelity simulators on multiple tasks and datasets. No equations, parameter fits, or derivation steps are supplied that reduce the reported metrics to quantities defined by the method itself. The HSG maintenance and simulator are presented as components whose accuracy is assumed for the planning to work, but this is an unvalidated modeling assumption rather than a circular reduction of the result to its inputs. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

Ledger populated from abstract only; full paper would likely surface additional free parameters such as rollout horizon length, belief fusion weights, or graph update thresholds.

axioms (2)
  • domain assumption Foundation models supply rich semantic priors usable for navigation
    Abstract states that recent foundation models have empowered agents with these priors.
  • domain assumption Finite-horizon rollouts on an HSG simulator yield usable estimates of long-term returns
    Central mechanism of the belief-based planning component.
invented entities (1)
  • Hierarchical 3D Scene Graph (HSG) no independent evidence
    purpose: Compact multi-granular state abstraction over objects, zones, and regions for global planning
    Core new structure introduced to replace local-observation decision making.

pith-pipeline@v0.9.1-grok · 5752 in / 1270 out tokens · 38287 ms · 2026-07-01T06:37:14.387503+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

44 extracted references · 12 canonical work pages · 5 internal anchors

  1. [1]

    On Evaluation of Embodied Navigation Agents

    Anderson, P., Chang, A., Chaplot, D.S., Dosovitskiy, A., Gupta, S., Koltun, V., Kosecka, J., Malik, J., Mottaghi, R., Savva, M., et al.: On evaluation of embodied navigation agents. arXiv preprint arXiv:1807.06757 (2018)

  2. [2]

    In: Proceedings of the IEEE/CVF international conference on computer vision

    Armeni, I., He, Z.Y., Gwak, J., Zamir, A.R., Fischer, M., Malik, J., Savarese, S.: 3D scene graph: A structure for unified semantics, 3D space, and camera. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 5664–5673 (2019)

  3. [3]

    In: Proceedings of the IEEE/CVF International Conference on Computer Vision

    Cao, Y., Zhang, J., Yu, Z., Liu, S., Qin, Z., Zou, Q., Du, B., Xu, K.: CogNav: Cognitive process modeling for object goal navigation with LLMs. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9550–9560 (2025) 16 B. Wu et al

  4. [4]

    In: 2017 International Conference on 3D Vision (3DV)

    Chang, A., Dai, A., Funkhouser, T., Halber, M., Nießner, M., Savva, M., Song, S., Zeng, A., Zhang, Y.: Matterport3D: Learning from RGB-D data in indoor environments. In: 2017 International Conference on 3D Vision (3DV). pp. 667–

  5. [5]

    IEEE Computer Society (2017)

  6. [6]

    arXiv preprint arXiv:2311.06430 (2023)

    Chang, M., Gervet, T., Khanna, M., Yenamandra, S., Shah, D., Min, S.Y., Shah, K., Paxton, C., Gupta, S., Batra, D., et al.: GOAT: Go to any thing. arXiv preprint arXiv:2311.06430 (2023)

  7. [7]

    In: Advances in Neural Information Processing Systems

    Chaplot, D.S., Gandhi, D.P., Gupta, A., Salakhutdinov, R.R.: Object goal naviga- tion using goal-oriented semantic exploration. In: Advances in Neural Information Processing Systems. vol. 33, pp. 4247–4258 (2020)

  8. [8]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    Gadre, S.Y., Wortsman, M., Ilharco, G., Schmidt, L., Song, S.: CoWs on pas- ture: Baselines and benchmarks for language-driven zero-shot object navigation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 23171–23181 (2023)

  9. [9]

    In: 2024 IEEE Inter- national Conference on Robotics and Automation (ICRA)

    Gu, Q., Kuwajerwala, A., Morin, S., Jatavallabhula, K.M., Sen, B., Agarwal, A., Rivera, C., Paul, W., Ellis, K., Chellappa, R., et al.: ConceptGraphs: Open- vocabulary 3D scene graphs for perception and planning. In: 2024 IEEE Inter- national Conference on Robotics and Automation (ICRA). pp. 5021–5028. IEEE (2024)

  10. [10]

    The International Journal of Robotics Research43(10), 1457– 1505 (2024)

    Hughes, N., Chang, Y., Hu, S., Talak, R., Abdulhai, R., Strader, J., Carlone, L.: Foundations of spatial perception for robotics: Hierarchical representations and real-time systems. The International Journal of Robotics Research43(10), 1457– 1505 (2024)

  11. [11]

    In: 2016 IEEE international confer- ence on robotics and automation (ICRA)

    Isler, S., Sabzevari, R., Delmerico, J., Scaramuzza, D.: An information gain formu- lation for active volumetric 3D reconstruction. In: 2016 IEEE international confer- ence on robotics and automation (ICRA). pp. 3477–3484. IEEE (2016)

  12. [12]

    Robotics and Autonomous Systems57(2), 123–128 (2009)

    Kalra, N., Ferguson, D., Stentz, A.: Incremental reconstruction of generalized Voronoi diagrams on grids. Robotics and Autonomous Systems57(2), 123–128 (2009)

  13. [13]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    Khanna, M., Mao, Y., Jiang, H., Haresh, S., Shacklett, B., Batra, D., Clegg, A., Undersander, E., Chang, A.X., Savva, M.: Habitat synthetic scenes dataset (HSSD- 200):Ananalysisof3DscenescaleandrealismtradeoffsforObjectGoalnavigation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 16384–16393 (2024)

  14. [14]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    Koch, S., Vaskevicius, N., Colosi, M., Hermosilla, P., Ropinski, T.: Open3DSG: Open-vocabulary 3D scene graphs from point clouds with queryable objects and open-set relationships. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 14183–14193 (2024)

  15. [15]

    Openfm- nav: Towards open-set zero-shot object navigation via vision-language foundation models.arXiv preprint arXiv:2402.10670, 2024

    Kuang, Y., Lin, H., Jiang, M.: OpenFMNav: Towards open-set zero-shot object navigation via vision-language foundation models. arXiv preprint arXiv:2402.10670 (2024)

  16. [16]

    In: Conference on Robot Learning

    Long, Y., Cai, W., Wang, H., Zhan, G., Dong, H.: InstructNav: Zero-shot system for generic instruction navigation in unexplored environment. In: Conference on Robot Learning. pp. 2049–2060. PMLR (2025)

  17. [17]

    In: Advances in Neural Information Processing Systems

    Majumdar, A., Aggarwal, G., Devnani, B., Hoffman, J., Batra, D.: ZSON: Zero- shot object-goal navigation using multimodal goal embeddings. In: Advances in Neural Information Processing Systems. vol. 35, pp. 32340–32352 (2022)

  18. [18]

    Habitat 3.0: A co-habitat for humans, avatars and robots.arXiv preprint arXiv:2310.13724, 2023

    Puig, X., Undersander, E., Szot, A., Cote, M.D., Yang, T.Y., Partsey, R., Desai, R., Clegg, A.W., Hlavac, M., Min, S.Y., et al.: Habitat 3.0: A co-habitat for humans, avatars and robots. arXiv preprint arXiv:2310.13724 (2023) Hierarchical Scene Graphs for Semantic Navigation 17

  19. [19]

    Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI

    Ramakrishnan, S.K., Gokaslan, A., Wijmans, E., Maksymets, O., Clegg, A., Turner, J., Undersander, E., Galuba, W., Westbury, A., Chang, A.X., et al.: Habitat-Matterport 3D dataset (HM3D): 1000 large-scale 3D environments for embodied AI. arXiv preprint arXiv:2109.08238 (2021)

  20. [20]

    In: Conference on Robot Learning

    Rana, K., Haviland, J., Garg, S., Abou-Chakra, J., Reid, I., Suenderhauf, N.: Say- Plan: Grounding large language models using 3D scene graphs for scalable robot task planning. In: Conference on Robot Learning. pp. 23–72. PMLR (2023)

  21. [21]

    The International Journal of Robotics Research40(12–14), 1510–1546 (2021)

    Rosinol,A.,Violette,A.,Abate,M.,Hughes,N.,Chang,Y.,Shi,J.,Gupta,A.,Car- lone, L.: Kimera: From SLAM to spatial perception with 3D dynamic scene graphs. The International Journal of Robotics Research40(12–14), 1510–1546 (2021)

  22. [22]

    proceedings of the National Academy of Sciences93(4), 1591–1595 (1996)

    Sethian, J.A.: A fast marching level set method for monotonically advancing fronts. proceedings of the National Academy of Sciences93(4), 1591–1595 (1996)

  23. [23]

    In: Advances in neural information processing systems

    Silver, D., Veness, J.: Monte-carlo planning in large POMDPs. In: Advances in neural information processing systems. vol. 23 (2010)

  24. [24]

    In: European Conference on Computer Vision

    Sun, X., Liu, L., Zhi, H., Qiu, R., Liang, J.: Prioritized semantic learning for zero-shot instance navigation. In: European Conference on Computer Vision. pp. 161–178. Springer (2024)

  25. [25]

    Yoloe: Real-time seeing anything,

    Wang, A., Liu, L., Chen, H., Lin, Z., Han, J., Ding, G.: YOLOE: Real-time seeing anything. arXiv preprint arXiv:2503.07465 (2025)

  26. [26]

    arXiv preprint arXiv:2601.08665 (2026)

    Wang, S., Luo, Y., Chen, X., Luo, A., Li, D., Liu, C., Chen, S., Zhang, Y., Yu, J.: VLingNav: Embodied navigation with adaptive reasoning and visual-assisted linguistic memory. arXiv preprint arXiv:2601.08665 (2026)

  27. [27]

    ImagineNav++: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination

    Wang, T., Zhao, X., Cai, W., Sun, C.: ImagineNav++: Prompting vision- language models as embodied navigator through scene imagination. arXiv preprint arXiv:2512.17435 (2025)

  28. [28]

    In: Pro- ceedings of Robotics: Science and Systems

    Werby, A., Huang, C., Büchner, M., Valada, A., Burgard, W.: Hierarchical Open- Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation. In: Pro- ceedings of Robotics: Science and Systems. Delft, Netherlands (July 2024)

  29. [29]

    In: International Conference on Machine Learning

    Wu, P., Mu, Y., Wu, B., Hou, Y., Ma, J., Zhang, S., Liu, C.: VoroNav: Voronoi- based zero-shot object navigation with large language model. In: International Conference on Machine Learning. pp. 53757–53775. PMLR (2024)

  30. [30]

    In: Proceed- ings of the 41st International Conference on Machine Learning

    Xie, J., Zhang, K., Chen, J., Zhu, T., Lou, R., Tian, Y., Xiao, Y., Su, Y.: Trav- elPlanner: A benchmark for real-world planning with language agents. In: Proceed- ings of the 41st International Conference on Machine Learning. pp. 54590–54613 (2024)

  31. [31]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    Yadav, K., Ramrakhya, R., Ramakrishnan, S.K., Gervet, T., Turner, J., Gokaslan, A., Maestre, N., Chang, A.X., Batra, D., Savva, M., et al.: Habitat-Matterport 3D semantics dataset. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4927–4936 (2023)

  32. [32]

    Qwen3 Technical Report

    Yang, A., Li, A., Yang, B., Zhang, B., Hui, B., Zheng, B., Yu, B., Gao, C., Huang, C., Lv, C., et al.: Qwen3 technical report. arXiv preprint arXiv:2505.09388 (2025)

  33. [33]

    In: Advances in neural information processing systems

    Yin,H.,Xu,X.,Wu,Z.,Zhou,J.,Lu,J.:SG-Nav:Online3Dscenegraphprompting for LLM-based zero-shot object navigation. In: Advances in neural information processing systems. vol. 37, pp. 5285–5307 (2024)

  34. [34]

    In: Proceedings of the Computer Vision and Pattern Recognition Conference

    Yin, H., Xu, X., Zhao, L., Wang, Z., Zhou, J., Lu, J.: UniGoal: Towards universal zero-shot goal-oriented navigation. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 19057–19066 (2025)

  35. [35]

    In: 2024 IEEE International Con- ference on Robotics and Automation (ICRA)

    Yokoyama, N., Ha, S., Batra, D., Wang, J., Bucher, B.: VLFM: Vision-language frontier maps for zero-shot semantic navigation. In: 2024 IEEE International Con- ference on Robotics and Automation (ICRA). pp. 42–48. IEEE (2024) 18 B. Wu et al

  36. [36]

    In: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

    Yu, B., Kasaei, H., Cao, M.: L3MVN: Leveraging large language models for vi- sual target navigation. In: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 3554–3560. IEEE (2023)

  37. [37]

    Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks

    Zhang, J., Wang, K., Wang, S., Li, M., Liu, H., Wei, S., Wang, Z., Zhang, Z., Wang, H.: Uni-NaVid: A video-based Vision-Language-Action model for unifying embodied navigation tasks. arXiv preprint arXiv:2412.06224 (2024)

  38. [38]

    IEEE Robotics and Automation Letters (2025)

    Zhang, M., Du, Y., Wu, C., Zhou, J., Qi, Z., Ma, J., Zhou, B.: ApexNav: An adaptive exploration strategy for zero-shot object navigation with target-centric semantic fusion. IEEE Robotics and Automation Letters (2025)

  39. [39]

    In: Proceedings of the IEEE/CVF Interna- tional Conference on Computer Vision

    Zhang, S., Yu, X., Song, X., Wang, Y., Jiang, S.: Function-centric bayesian network for zero-shot object goal navigation. In: Proceedings of the IEEE/CVF Interna- tional Conference on Computer Vision. pp. 19535–19545 (2025)

  40. [40]

    arXiv preprint arXiv:2410.09874 (2024)

    Zhao, X., Cai, W., Tang, L., Wang, T.: ImagineNav: Prompting vision- language models as embodied navigator through scene imagination. arXiv preprint arXiv:2410.09874 (2024)

  41. [41]

    IEEE Robotics and Automa- tion Letters6(2), 779–786 (2021)

    Zhou, B., Zhang, Y., Chen, X., Shen, S.: FUEL: Fast UAV exploration using incre- mental frontier structure and hierarchical planning. IEEE Robotics and Automa- tion Letters6(2), 779–786 (2021)

  42. [42]

    In: International Conference on Machine Learning

    Zhou, K., Zheng, K., Pryor, C., Shen, Y., Jin, H., Getoor, L., Wang, X.E.: ESC: Exploration with soft commonsense constraints for zero-shot object navigation. In: International Conference on Machine Learning. pp. 42829–42842. PMLR (2023)

  43. [43]

    arXiv preprint arXiv:2506.06487 (2025)

    Zhou, Z., Hu, Y., Zhang, L., Li, Z., Chen, S.: BeliefMapNav: 3D voxel-based belief map for zero-shot object navigation. arXiv preprint arXiv:2506.06487 (2025)

  44. [44]

    In: 2006 IEEE International Conference on Robotics and Automation (ICRA)

    Zivkovic, Z., Bakker, B., Krose, B.: Hierarchical map building and planning based on graph partitioning. In: 2006 IEEE International Conference on Robotics and Automation (ICRA). pp. 803–809 (2006)