arXiv: 2605.09869 · v1 · submitted 2026-05-11 · cs.RO · cs.CV


ConsistNav: Closing the Action Consistency Gap in Zero-Shot Object Navigation with Semantic Executive Control

Defeng Gu, Haosen Wang, Kai Li, Liaoyuan Fan, Lutao Jiang, Tingbang Liang, Wenjian Hou, Yibin Wen, Yinqiang Zhang, Yizhou Zhao, Zhenyang Li, Zongqi He


Pith reviewed 2026-05-12 05:09 UTC · model grok-4.3


keywords: zero-shot object navigation · semantic executive · action consistency · persistent memory · finite-state control · embodied navigation · robot navigation


The pith

A semantic executive with persistent memory and guarded phases closes the action consistency gap in zero-shot object navigation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper observes that zero-shot navigation agents reinterpret semantic evidence at every step, which causes oscillation between exploration and pursuit, or abandonment of the target near success. ConsistNav counters this by adding a training-free semantic executive on top of existing detectors and planners. The executive stages pursuit through finite-state phases, accumulates cross-frame evidence into stable hypotheses, and applies stability controls to suppress bad actions. The paper reports that this setup improves success rate by 11.4 percent and SPL by 7.9 percent over the controlled baseline on MP3D while remaining compatible with any open-vocabulary detector. A sympathetic reader would care because it turns inconsistent per-step decisions into coherent episode-long behavior without retraining or altering core perception modules.

Core claim

ConsistNav builds a semantic executive around three coordinated modules: a Finite-State Executive Controller that advances target pursuit through guarded semantic phases, a Persistent Candidate Memory that aggregates cross-frame evidence into stable object hypotheses, and Stability-Aware Action Control that suppresses rotational stagnation, ineffective pursuit, and unverified stopping. The design leaves the detector and low-level planner unchanged and instead decides when semantic evidence is allowed to influence navigation. Experiments on HM3D and MP3D show state-of-the-art results among compared zero-shot ObjectNav methods.
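As a rough illustration of what "guarded semantic phases" could look like in code, the sketch below advances a small state machine only when a guard on the current evidence holds. Everything in it is an assumption made for illustration: the four phase names, the `Observation` fields, and the thresholds are hypothetical, and the paper's own controller has seven states (per Figure 2) whose names and guards are not reproduced on this page.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Phase(Enum):
    # Hypothetical phase names; not the paper's seven states.
    SEARCH = auto()
    PURSUE = auto()
    VERIFY = auto()
    DONE = auto()


@dataclass
class Observation:
    # Illustrative per-step inputs an executive might read.
    consistency: float  # aggregated cross-frame score in [0, 1]
    distance_m: float   # estimated distance to the best candidate


def step(phase: Phase, obs: Observation,
         commit_thr: float = 0.6, drop_thr: float = 0.3,
         near_m: float = 1.0) -> Phase:
    """Advance one phase; each transition fires only if its guard holds."""
    if phase is Phase.SEARCH:
        return Phase.PURSUE if obs.consistency >= commit_thr else Phase.SEARCH
    if phase is Phase.PURSUE:
        if obs.consistency < drop_thr:
            return Phase.SEARCH  # hypothesis invalidated, back to search
        return Phase.VERIFY if obs.distance_m <= near_m else Phase.PURSUE
    if phase is Phase.VERIFY:
        if obs.consistency >= commit_thr:
            return Phase.DONE    # stop only after close-range confirmation
        return Phase.SEARCH if obs.consistency < drop_thr else Phase.VERIFY
    return Phase.DONE


def run(observations):
    """Replay a sequence of observations and return the visited phases."""
    phase = Phase.SEARCH
    trace = []
    for obs in observations:
        phase = step(phase, obs)
        trace.append(phase)
    return trace
```

The gap between `commit_thr` and `drop_thr` acts as hysteresis: once the agent is in `PURSUE`, a score that dips to 0.5 neither commits further nor abandons the target, which is one simple way to avoid the per-step flip-flopping the paper describes.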

What carries the argument

The semantic executive, a training-free controller with three modules (Finite-State Executive Controller, Persistent Candidate Memory, and Stability-Aware Action Control) that manages when and how semantic evidence drives navigation decisions across an episode.

If this is right

Agents maintain persistent target hypotheses instead of oscillating between exploration and pursuit.
Success rate rises 11.4 percent and SPL rises 7.9 percent over the controlled baseline on MP3D.
The framework works with any open-vocabulary detector and low-level planner without modification.
Phase transitions and stability controls reduce premature abandonment near the target.
Real-world robot experiments confirm robustness of the executive mechanism.
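For readers unfamiliar with the two metrics quoted above, the following sketch computes success rate (SR) and Success weighted by Path Length (SPL) using the standard definition from "On Evaluation of Embodied Navigation Agents": each successful episode contributes the ratio of the shortest-path length to the longer of the agent's path and the shortest path. The episode tuples are made-up numbers, not results from the paper.

```python
from typing import Sequence, Tuple

# (success, shortest_path_m, agent_path_m); values below are illustrative only.
Episode = Tuple[bool, float, float]


def success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes that ended in success."""
    return sum(1.0 for success, _, _ in episodes if success) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """Mean of S_i * l_i / max(p_i, l_i); assumes every shortest path l_i > 0."""
    total = 0.0
    for success, shortest, taken in episodes:
        if success:
            total += shortest / max(taken, shortest)
    return total / len(episodes)


episodes = [
    (True, 5.0, 5.0),    # optimal path: contributes 1.0
    (True, 4.0, 8.0),    # twice as long as needed: contributes 0.5
    (False, 6.0, 10.0),  # failure: contributes 0.0
    (True, 3.0, 4.0),    # contributes 0.75
]
print(success_rate(episodes), spl(episodes))
```

On these four toy episodes SR is 0.75 and SPL is 0.5625. Because SPL penalizes detours, an executive that removes oscillation can raise SPL even on episodes that were already successful.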

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same executive structure could be applied to other embodied tasks that require consistent commitment over long horizons.
Stronger detectors would likely amplify gains, but the consistency layer itself addresses a separate failure mode.
Control logic layered above perception may prove more scalable than retraining perception models for every new task variant.
Testing the approach in environments with moving objects would reveal whether memory persistence still holds when evidence changes.

Load-bearing premise

Semantic evidence from open-vocabulary detectors remains reliable enough across frames for the memory module to form stable hypotheses and for the controller to make correct phase transitions without being misled by systematic false positives.
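To make this premise concrete, here is a minimal sketch of a cross-frame candidate memory. It is not the paper's Persistent Candidate Memory; the class name, the nearest-neighbor merge rule, the running-mean score, and all thresholds are assumptions chosen to keep the example small.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    x: float
    y: float
    hits: int = 0       # number of frames that supported this candidate
    score: float = 0.0  # running mean of detector confidence


class CandidateMemory:
    """Hypothetical memory that merges detections by map-frame distance."""

    def __init__(self, merge_radius_m: float = 0.5, min_hits: int = 3,
                 min_score: float = 0.5):
        self.merge_radius_m = merge_radius_m
        self.min_hits = min_hits
        self.min_score = min_score
        self.candidates: List[Candidate] = []

    def update(self, detections: List[Tuple[float, float, float]]) -> None:
        """Fold one frame of (x, y, confidence) detections into memory."""
        for x, y, conf in detections:
            match = next(
                (c for c in self.candidates
                 if math.hypot(c.x - x, c.y - y) <= self.merge_radius_m),
                None,
            )
            if match is None:
                match = Candidate(x, y)
                self.candidates.append(match)
            match.hits += 1
            # Incremental mean: a single-frame spike cannot dominate the score.
            match.score += (conf - match.score) / match.hits

    def stable(self) -> List[Candidate]:
        """Candidates with enough repeated, sufficiently confident support."""
        return [c for c in self.candidates
                if c.hits >= self.min_hits and c.score >= self.min_score]


memory = CandidateMemory()
frames = [
    [(2.0, 1.0, 0.9)],
    [(2.1, 1.0, 0.7), (7.0, 3.0, 0.95)],  # second detection appears only once
    [(2.0, 1.1, 0.8)],
]
for frame in frames:
    memory.update(frame)
stable_candidates = memory.stable()
```

After the three frames, only the candidate near (2.0, 1.0) is stable; the one-off detection at (7.0, 3.0) is filtered out despite its high confidence. The same gate would also admit a false positive that the detector repeats in three or more frames, which is exactly the case the premise above depends on not happening too often.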

What would settle it

Deploying ConsistNav in an environment where detector false positives create persistent wrong hypotheses that the stability module cannot override, resulting in lower success rates than the baseline.

Figures

Figures reproduced from arXiv: 2605.09869 by Defeng Gu, Haosen Wang, Kai Li, Liaoyuan Fan, Lutao Jiang, Tingbang Liang, Wenjian Hou, Yibin Wen, Yinqiang Zhang, Yizhou Zhao, Zhenyang Li, Zongqi He.

**Figure 1.** ConsistNav pipeline. (1) Perception converts RGB-D and target cues through VLM scoring into value maps; (2A, 2B) planning maintains candidates and selects frontier/candidate subgoals; (3) execution outputs LEFT, FORWARD, RIGHT, and STOP actions through the FSE controller. Thus, C_t stores accumulated evidence, q_t gates planning, and a_t remains in the standard ObjectNav action space.

**Figure 2.** Candidate Memory and FSE Controller. Left: Candidate Memory builds and stores the semantic candidate map. Right: seven-state FSE transitions, with black/green for commitment/success, gray/yellow for invalidation/recovery, and blue for returning to search.

**Figure 3.** Simulation results on HM3Dv2. Qualitative comparison of ConsistNav, VLFM, and ApexNav. Each column shows one episode; green/blue paths denote reference/agent trajectories, and green/black frames denote success/failure.

**Figure 4.** Failure-cause comparison. Outcome statistics for the Non-executive method and ConsistNav on HM3Dv1, HM3Dv2, and MP3D, covering verified success and five residual failure modes.

**Figure 5.** Real-world deployment comparison. Visual comparison of the Non-executive baseline and ConsistNav on four target tasks using the AgileX LIMO platform. The results illustrate that ConsistNav maintains target hypotheses, verifies close-range evidence, and stops reliably under real sensor and timing conditions.

Original abstract

Zero-shot object navigation has advanced rapidly with open-vocabulary detectors, image-text models, and language-guided exploration. However, even after current methods detect a plausible target hypothesis, the agent may still oscillate between exploration and pursuit, or abandon the object near success. We identify this failure mode as an action consistency gap: semantic evidence is repeatedly reinterpreted at each step without persistent commitment across the episode. We introduce ConsistNav, a training-free zero-shot ObjectNav framework built around a semantic executive composed of three coordinated modules: Finite-State Executive Controller stages target pursuit through guarded semantic phases; Persistent Candidate Memory accumulates cross-frame target evidence into stable object hypotheses; and Stability-Aware Action Control suppresses rotational stagnation, ineffective pursuit, and unverified stopping. This design changes neither the detector nor the low-level planner; instead, it controls when semantic evidence should influence navigation and when it should be suppressed or revisited. We conduct extensive experiments on HM3D and MP3D, where ConsistNav achieves state-of-the-art results among compared zero-shot ObjectNav methods and improves SR by 11.4% and SPL by 7.9% over the controlled baseline on MP3D. Ablation studies and real-world deployment experiments further demonstrate the effectiveness and robustness of the proposed executive mechanism.
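The abstract names three behaviors that Stability-Aware Action Control suppresses: rotational stagnation, ineffective pursuit, and unverified stopping. The sketch below shows one plausible shape for such a filter sitting between a planner and the action interface. The class, its window size, the progress threshold, and the choice of FORWARD as the fallback action are all hypothetical; the paper's actual rules are not given on this page.

```python
from collections import deque
from typing import Deque

TURNS = {"LEFT", "RIGHT"}


class StabilityFilter:
    """Hypothetical post-filter over planner actions (not the paper's rules)."""

    def __init__(self, window: int = 6, min_progress_m: float = 0.2):
        self.window = window
        self.min_progress_m = min_progress_m
        self.actions: Deque[str] = deque(maxlen=window)
        self.distances: Deque[float] = deque(maxlen=window)

    def filter(self, proposed: str, goal_distance_m: float,
               stop_verified: bool) -> str:
        """Return the action to execute, overriding unstable proposals."""
        # Unverified stopping: refuse STOP until the executive confirms it.
        # FORWARD is a placeholder fallback; a real system would replan.
        if proposed == "STOP" and not stop_verified:
            proposed = "FORWARD"
        # Rotational stagnation: a full window of turns and another turn asked.
        if (len(self.actions) == self.window
                and all(a in TURNS for a in self.actions)
                and proposed in TURNS):
            proposed = "FORWARD"
        self.actions.append(proposed)
        self.distances.append(goal_distance_m)
        return proposed

    def pursuit_is_ineffective(self) -> bool:
        """True when distance to the goal barely shrank over a full window."""
        if len(self.distances) < self.window:
            return False
        return self.distances[0] - self.distances[-1] < self.min_progress_m
```

A caller would pass each planner action through `filter` and consult `pursuit_is_ineffective` when deciding whether to drop the current candidate and return to search. Keeping the filter stateless with respect to the detector is what lets a layer like this remain detector-agnostic.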

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

ConsistNav adds a lightweight semantic executive with persistent memory and phase control to reduce oscillation in zero-shot object navigation, delivering clear benchmark gains while depending on detector stability.


The main takeaway is that this paper puts a training-free executive layer on top of existing zero-shot ObjectNav pipelines to stop the agent from flipping between exploration and pursuit or quitting too early. The three modules—finite-state controller for guarded phases, persistent candidate memory for cross-frame evidence, and stability-aware action control—work together to decide when semantic signals should drive behavior and when they should be ignored or revisited, without touching the detector or low-level planner.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes ConsistNav, a training-free zero-shot ObjectNav framework that addresses the action consistency gap via a semantic executive composed of a Finite-State Executive Controller for guarded phase transitions, Persistent Candidate Memory for accumulating cross-frame target evidence, and Stability-Aware Action Control for suppressing stagnation and unverified stopping. It reports state-of-the-art results among compared zero-shot methods on HM3D and MP3D, including improvements of 11.4% in Success Rate and 7.9% in SPL over the controlled baseline on MP3D, along with ablations and real-world deployment.

Significance. If the empirical results hold under detailed scrutiny, the work provides a modular, detector- and planner-agnostic mechanism for enforcing persistent semantic commitment in navigation, which could meaningfully reduce oscillation and premature abandonment in practical zero-shot settings. The training-free nature and real-world validation strengthen its potential applicability in robotics.

major comments (2)

Experimental Evaluation: The central claim of 11.4% SR and 7.9% SPL gains on MP3D (and SOTA status) is presented without quantitative details on baseline implementations, statistical variance across runs, number of episodes evaluated, or exact hyperparameter settings for the controlled baseline and compared methods, rendering the improvements difficult to reproduce or assess for significance.
Persistent Candidate Memory and Finite-State Executive Controller: The accumulation of cross-frame semantic evidence into stable hypotheses and the guarded phase transitions assume open-vocabulary detector outputs remain sufficiently reliable to avoid locking onto false positives or misses (common in HM3D/MP3D due to occlusions and viewpoint changes). No independent semantic verification, confidence thresholding, or backtracking mechanism is described beyond Stability-Aware Action Control's focus on rotational stagnation and unverified stopping, which could allow systematic error propagation into the executive state.
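The first major comment asks for a way to judge whether the reported gains are statistically meaningful. One common option, shown here only as an assumption about how such a check could be done and not as anything the paper or the authors' response describes, is a paired bootstrap over per-episode outcomes. The function name and defaults are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def paired_bootstrap_sr_gain(baseline: Sequence[int], method: Sequence[int],
                             n_resamples: int = 2000, alpha: float = 0.05,
                             seed: int = 0) -> Tuple[float, float, float]:
    """Return (gain, lower, upper) for the success-rate difference.

    Inputs are per-episode 0/1 outcomes on the same episodes, in the same
    order, so each resample keeps baseline and method results paired.
    """
    if len(baseline) != len(method) or not baseline:
        raise ValueError("need two equally sized, non-empty outcome lists")
    n = len(baseline)
    rng = random.Random(seed)
    gains: List[float] = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        gains.append(sum(method[i] - baseline[i] for i in idx) / n)
    gains.sort()
    # Percentile interval taken from the sorted resampled gains.
    lower = gains[int((alpha / 2) * n_resamples)]
    upper = gains[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    point = (sum(method) - sum(baseline)) / n
    return point, lower, upper
```

If the lower bound of the interval stays above zero, the gain is unlikely to be an artifact of which episodes happened to be sampled. Variance across random seeds, which the referee also requests, is a separate source of uncertainty and would need repeated runs rather than resampling.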

minor comments (1)

Abstract and Experiments: Explicitly define the 'controlled baseline' and list all compared zero-shot methods with their key implementation references to allow direct comparison of the reported gains.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of reproducibility and robustness that we will address in the revision. Below we respond point-by-point to the major comments.


Referee: Experimental Evaluation: The central claim of 11.4% SR and 7.9% SPL gains on MP3D (and SOTA status) is presented without quantitative details on baseline implementations, statistical variance across runs, number of episodes evaluated, or exact hyperparameter settings for the controlled baseline and compared methods, rendering the improvements difficult to reproduce or assess for significance.

Authors: We agree that the experimental section would benefit from greater specificity to support reproducibility. In the revised manuscript we will add: explicit descriptions of baseline re-implementations (including any adaptations made to the controlled baseline), the precise evaluation protocol with episode counts per dataset, all relevant hyperparameter values in a dedicated table or appendix, and statistical variance (means and standard deviations) computed over multiple random seeds or runs. These additions will allow readers to more readily verify the reported gains and assess their significance.
revision: yes

Referee: Persistent Candidate Memory and Finite-State Executive Controller: The accumulation of cross-frame semantic evidence into stable hypotheses and the guarded phase transitions assume open-vocabulary detector outputs remain sufficiently reliable to avoid locking onto false positives or misses (common in HM3D/MP3D due to occlusions and viewpoint changes). No independent semantic verification, confidence thresholding, or backtracking mechanism is described beyond Stability-Aware Action Control's focus on rotational stagnation and unverified stopping, which could allow systematic error propagation into the executive state.

Authors: We acknowledge the valid concern regarding potential propagation of detector errors. The Persistent Candidate Memory accumulates detections over multiple frames precisely to filter transient false positives and misses caused by occlusions or viewpoint variation, while the guarded transitions of the Finite-State Executive Controller limit rapid state changes based on single unreliable observations. Stability-Aware Action Control further reduces the risk of unverified stopping. That said, the framework does not introduce separate confidence thresholding or explicit backtracking beyond these mechanisms, as the goal is to remain training-free and detector-agnostic. In the revision we will add a limitations subsection that discusses failure modes arising from persistent detector errors, supported by qualitative examples drawn from the existing experiments, and note possible future extensions.
revision: partial

Circularity Check

0 steps flagged

No circularity: empirical framework with external benchmarks


The paper introduces a training-free zero-shot navigation framework with three modules (Finite-State Executive Controller, Persistent Candidate Memory, Stability-Aware Action Control) that coordinate semantic phases and action stability. No equations, fitted parameters, or first-principles derivations appear; the central claims are empirical improvements (SR +11.4%, SPL +7.9% on MP3D) measured against external baselines on public datasets HM3D/MP3D. Ablations and real-world tests provide independent verification. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps, and results do not reduce to quantities defined by the method itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework treats open-vocabulary detectors and low-level planners as reliable black boxes whose outputs can be selectively trusted or suppressed; assumes standard simulation benchmarks reflect real-world navigation challenges.

axioms (1)

Domain assumption: Open-vocabulary detectors and image-text models supply usable semantic evidence that can be accumulated and phased without systematic bias.
Invoked throughout the executive design as the basis for memory and control decisions.

invented entities (1)

Semantic Executive (no independent evidence)
Purpose: coordinates finite-state control, persistent memory, and stability-aware actions to enforce consistency.
Core novel component introduced to close the identified gap.


