TrajRAG: Retrieving Geometric-Semantic Experience for Zero-Shot Object Navigation
Pith reviewed 2026-05-10 15:59 UTC · model grok-4.3
The pith
A retrieval system stores past navigation paths in compact geometric-semantic form and retrieves similar ones to guide large models toward objects in new scenes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TrajRAG incrementally stores episodic observations as topo-polar trajectories that compactly encode spatial layouts and semantic contexts. Hierarchical chunking groups similar trajectories into summaries that support coarse-to-fine retrieval. At inference time candidate frontiers spawn multiple trajectory hypotheses that query the memory for relevant past experiences; retrieved trajectories then steer large-model reasoning for waypoint selection, after which the new episode is folded back into the store.
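A minimal sketch of what such a topo-polar record could look like. The step format, semantic labels, and redundancy rule below are illustrative assumptions, not the paper's published encoding:

```python
from dataclasses import dataclass
from math import hypot, atan2

@dataclass
class TopoPolarNode:
    r: float      # distance from the previous node (metres)
    theta: float  # bearing from the previous node (radians)
    label: str    # semantic context observed there, e.g. "kitchen"

def encode(path, labels, min_step=0.25):
    """Convert an (x, y) path into topo-polar nodes, dropping
    near-duplicate steps that share a label (redundancy removal)."""
    nodes = []
    for (x0, y0), (x1, y1), lab in zip(path, path[1:], labels[1:]):
        r = hypot(x1 - x0, y1 - y0)
        if nodes and lab == nodes[-1].label and r < min_step:
            continue  # redundant observation: same place, same semantics
        nodes.append(TopoPolarNode(r, atan2(y1 - y0, x1 - x0), lab))
    return nodes

# A short hallway-to-kitchen path; the tiny second step is collapsed away
path = [(0.0, 0.0), (1.0, 0.0), (1.1, 0.0), (1.1, 1.0)]
labels = ["hall", "hall", "hall", "kitchen"]
traj = encode(path, labels)
```

The point of the sketch is only that a polar step plus a label is enough to compare trajectories without storing raw observations.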
What carries the argument
The topological-polar trajectory representation that encodes spatial layouts and semantic contexts while removing redundancies, together with hierarchical chunking that organizes similar trajectories into unified summaries for retrieval.
If this is right
- Zero-shot ObjectNav success rates rise on MP3D, HM3D-v1, and HM3D-v2 when relevant past trajectories are retrieved.
- Episodic observations become reusable lifelong memory instead of being discarded after each episode.
- Large-model waypoint selection receives concrete geometric-semantic examples rather than relying solely on pretrained commonsense.
- New scenes benefit from transfer of structured trajectory patterns without any task-specific fine-tuning.
Where Pith is reading between the lines
- The same structured memory could be queried during exploration or mapping to avoid previously visited dead-ends.
- Over repeated deployments the system might implicitly learn common building archetypes that transfer across different houses.
- Replacing or augmenting the large model with direct trajectory lookup could lower compute cost while preserving performance.
- Extending the representation to include action outcomes or failure cases would allow the agent to learn avoidance strategies.
Load-bearing premise
The topo-polar format and chunking summaries retain enough geometric and semantic detail from raw observations to support useful transfer to unseen environments.
What would settle it
An experiment in which TrajRAG returns trajectories that are geometrically or semantically dissimilar to the current scene, resulting in lower or equal success rates compared with a baseline that ignores the memory.
Original abstract
Existing zero-shot Object Goal Navigation (ObjectNav) methods often exploit commonsense knowledge from large language or vision-language models to guide navigation. However, such knowledge arises from internet-scale text rather than embodied 3D experience, and episodic observations collected during navigation are typically discarded, preventing the accumulation of lifelong experience. To this end, we propose Trajectory RAG (TrajRAG), a retrieval-augmented generation framework that enhances large-model reasoning by retrieving geometric-semantic experiences. TrajRAG incrementally accumulates episodic observations from past navigation episodes. To structure these observations, we propose a topological-polar (topo-polar) trajectory representation that compactly encodes spatial layouts and semantic contexts, effectively removing redundancies in raw episodic observations. A hierarchical chunking structure further organizes similar topo-polar trajectories into unified summaries, enabling coarse-to-fine retrieval. During navigation, candidate frontiers generate multiple trajectory hypotheses that query TrajRAG for similar past trajectories, guiding large-model reasoning for waypoint selection. New experiences are continually consolidated into TrajRAG, enabling the accumulation of lifelong navigation experience. Experiments on MP3D, HM3D-v1, and HM3D-v2 show that TrajRAG effectively retrieves relevant geometric-semantic experiences and improves zero-shot ObjectNav performance.
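The inference loop the abstract describes can be caricatured in a few lines. Everything here is a stand-in: `featurize`, the flat nearest-neighbour `retrieve`, and the scoring rule are illustrative assumptions, since the paper's actual hierarchical retrieval and LLM prompting are not specified in this excerpt.

```python
import math

def featurize(traj):
    # Toy summary of a list of (r, theta) polar steps: (path length, net turn)
    return (sum(r for r, _ in traj), sum(t for _, t in traj))

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return num / den if den else 0.0

def retrieve(memory, hypothesis, k=2):
    # Flat nearest-neighbour lookup; TrajRAG's real retrieval is hierarchical
    q = featurize(hypothesis)
    return sorted(memory, key=lambda m: -cosine(featurize(m["traj"]), q))[:k]

def select_waypoint(frontiers, hypotheses, memory):
    # Pick the frontier whose hypothesis best matches past experience;
    # the real system feeds retrieved trajectories into an LLM prompt instead
    best, best_score = None, -1.0
    for frontier, hyp in zip(frontiers, hypotheses):
        score = max(cosine(featurize(m["traj"]), featurize(hyp))
                    for m in retrieve(memory, hyp))
        if score > best_score:
            best, best_score = frontier, score
    return best

memory = [{"traj": [(1.0, 0.0), (1.0, 0.0)]},   # past straight corridor run
          {"traj": [(0.1, 1.0)]}]               # past short turning run
frontiers = ["A", "B"]
hypotheses = [[(1.0, 0.0), (1.0, 0.1)],         # near-straight, like memory 1
              [(5.0, 2.0)]]                     # long sweeping turn
choice = select_waypoint(frontiers, hypotheses, memory)
```

Consolidation would then append the completed episode's trajectory to `memory`, which is the lifelong-accumulation step the abstract emphasizes.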
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes TrajRAG, a retrieval-augmented generation framework for zero-shot Object Goal Navigation that incrementally accumulates episodic observations, encodes them via a topological-polar (topo-polar) trajectory representation to remove redundancies, organizes similar trajectories with hierarchical chunking for coarse-to-fine retrieval, and queries this store during navigation to guide large-model reasoning for waypoint selection from candidate frontiers. New experiences are consolidated to enable lifelong accumulation. Experiments on MP3D, HM3D-v1, and HM3D-v2 are stated to demonstrate effective retrieval of geometric-semantic experiences and improved ObjectNav performance.
Significance. If the quantitative claims hold and the encoding preserves necessary spatial-semantic information, the work offers a practical mechanism for lifelong embodied experience reuse in navigation, moving beyond static LLM commonsense priors. The hierarchical retrieval design is a clear engineering strength for scaling experience. However, significance is limited by the absence of verifiable performance deltas, baselines, or preservation metrics, leaving the core transfer assumption untested.
Major comments (2)
- [Abstract] The central claim that TrajRAG 'improves zero-shot ObjectNav performance' is asserted without quantitative results, baselines, ablation details, success rates, or error analysis, rendering the empirical contribution unverifiable from the provided text.
- [Abstract] Topo-polar representation: the claim that the encoding 'compactly encodes spatial layouts and semantic contexts, effectively removing redundancies' lacks any reconstruction-error or information-preservation metric, or an ablation over metric details (distances, obstacle densities, local geometry); this directly undermines the transfer assumption to novel MP3D/HM3D layouts.
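One concrete form such a preservation check could take: decode a polar step sequence back into Cartesian coordinates and measure endpoint drift against the raw path. The (r, theta) step format here is an assumption for illustration, not the paper's published encoding.

```python
from math import cos, sin, hypot, pi

def decode(steps, start=(0.0, 0.0)):
    """Replay (r, theta) polar steps from a start point back into (x, y)."""
    x, y = start
    points = [start]
    for r, theta in steps:
        x, y = x + r * cos(theta), y + r * sin(theta)
        points.append((x, y))
    return points

# Raw path went 2 m east then 1 m north; the encoded steps should land there
raw_end = (2.0, 1.0)
steps = [(2.0, 0.0), (1.0, pi / 2)]
end = decode(steps)[-1]
endpoint_error = hypot(end[0] - raw_end[0], end[1] - raw_end[1])
```

Reporting this kind of drift across many stored trajectories would directly address the referee's information-preservation concern.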
Minor comments (1)
- [Abstract] The hierarchical chunking process is mentioned, but the similarity criteria, chunk size, and summarization method are unspecified, hindering reproducibility.
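To make the referee's point concrete, here is one plausible stand-in for the unspecified chunking scheme: greedily group trajectory feature vectors whose cosine similarity to a chunk's running centroid exceeds a threshold, with the centroid doubling as the chunk summary. The criterion and threshold are assumptions, not the paper's method.

```python
import math

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return num / den if den else 0.0

def chunk(features, threshold=0.95):
    """Greedily assign each feature to the first chunk whose centroid it
    matches above `threshold`; the centroid serves as the chunk summary."""
    chunks = []
    for f in features:
        for c in chunks:
            if cosine(f, c["centroid"]) >= threshold:
                c["members"].append(f)
                n = len(c["members"])
                c["centroid"] = [sum(col) / n for col in zip(*c["members"])]
                break
        else:
            chunks.append({"members": [f], "centroid": list(f)})
    return chunks

feats = [(1.0, 0.0), (0.98, 0.05), (0.0, 1.0)]  # two alike, one different
summaries = chunk(feats)
```

Specifying even this much (similarity measure, threshold, summary statistic) would answer the reproducibility concern.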
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We agree that the abstract can be improved to better convey the empirical results and the rationale behind the topo-polar representation. We address each major comment below and will revise the abstract accordingly.
Point-by-point responses
- Referee: [Abstract] The central claim that TrajRAG 'improves zero-shot ObjectNav performance' is asserted without quantitative results, baselines, ablation details, success rates, or error analysis, rendering the empirical contribution unverifiable from the provided text.
Authors: We agree that the abstract should include key quantitative highlights to make the performance gains immediately verifiable. The full manuscript reports detailed results in Section 4, including success rates, SPL, and comparisons against baselines on MP3D and HM3D. In the revised abstract we will add a concise summary of these improvements (e.g., relative gains in success rate) while keeping the length appropriate. revision: yes
- Referee: [Abstract] Topo-polar representation: the claim that the encoding 'compactly encodes spatial layouts and semantic contexts, effectively removing redundancies' lacks any reconstruction-error or information-preservation metric, or an ablation over metric details (distances, obstacle densities, local geometry); this directly undermines the transfer assumption to novel MP3D/HM3D layouts.
Authors: The topo-polar representation is constructed to retain topological connectivity and polar spatial-semantic relations while discarding redundant observations; its design rationale and implementation are detailed in Section 3.1. We do not compute reconstruction error because the representation is not intended for full scene reconstruction but for experience retrieval in navigation. Its information preservation is instead validated empirically through high retrieval accuracy and the resulting navigation performance gains shown in our experiments and ablations (Section 4, Figures 3-4, Table 2). We will revise the abstract to briefly articulate this design choice and reference the supporting empirical evidence. revision: partial
Circularity Check
No circularity: engineering framework with no derivations or fitted predictions
Full rationale
The paper describes TrajRAG as an incremental retrieval-augmented framework built from proposed design choices (topo-polar trajectory encoding and hierarchical chunking) whose value is assessed solely through empirical experiments on MP3D and HM3D. No equations, closed-form derivations, parameter-fitting steps, or first-principles claims appear that could reduce to their own inputs by construction. Self-citations, if present, are not invoked as load-bearing uniqueness theorems or ansatzes that substitute for independent justification. The central performance claim therefore rests on external dataset results rather than any self-referential reduction.