ObjectNav revisited: On evaluation of embodied agents navigating to objects

· 2020 · arXiv 2006.13171

Canonical reference. 83% of citing Pith papers cite this work as background.

33 Pith papers citing it

Background 83% of classified citations

citation-role summary

background 5 baseline 1

citation-polarity summary

background 5 baseline 1

representative citing papers

Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?

cs.CV · 2026-05-31 · accept · novelty 8.0

Introduces the TVR active viewpoint-matching task and TVRBench indoor simulation benchmark, where foundation models start at low single-digit success rates but reach 51.4% after visual-action SFT and multi-turn GRPO post-training.

When Robots Do the Chores: A Benchmark and Agent for Long-Horizon Household Task Execution

cs.AI · 2026-05-14 · unverdicted · novelty 8.0 · 2 refs

LongAct benchmark evaluates long-horizon household task execution from free-form instructions; HoloMind agent raises performance but top VLMs still reach only 59% goal completion and 16% full-task success.

SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning

cs.AI · 2026-05-10 · accept · novelty 8.0 · 2 refs

SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.

Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI

cs.CV · 2021-09-16 · accept · novelty 8.0

HM3D offers 1000 building-scale 3D environments that are larger and higher-fidelity than existing datasets, enabling better-performing embodied AI agents for tasks like PointGoal navigation.

LIME: Learning Intent-aware Camera Motion from Egocentric Video

cs.RO · 2026-07-02 · unverdicted · novelty 7.0

LIME formulates language-conditioned camera motion as predicting SE(3) target poses from RGB and intent text, using mined multi-intent supervision from egocentric video and a flow-matching pose head.

POINav: Benchmarking and Enhancing Final-Meters Arrival in Real-World Vision-Language Navigation

cs.RO · 2026-05-27 · unverdicted · novelty 7.0

POINav-Bench provides the first high-fidelity real-world benchmark for POI-goal VLN using 3DGS reconstructions of 126k m² with 163 POIs, supported by a Brain-Action framework and 70K real signage-entrance dataset.

IntentionNav: A Benchmark for Intent-Driven Object Navigation from Implicit Human Instruction

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

IntentionNav is a new benchmark showing that VLMs infer intended targets from implicit instructions in 48% of cases but achieve only 25% terminal success and 5.5% grounded success in active navigation.

Action-guided generation of 3D functionality segmentation data

cs.CV · 2025-11-28 · unverdicted · novelty 7.0

SynthFun3D generates synthetic 3D functionality segmentation data from action descriptions via object retrieval and scene arrangement, yielding consistent gains of +2.2 mAP, +6.3 mAR, and +5.7 mIoU when augmenting real data for VLM training.

SAGE-Nav: Leveraging LLM Planning and Alignment Fusion for Hierarchical Scene Graph-Guided Navigation

cs.RO · 2026-06-24 · unverdicted · novelty 6.0

SAGE-Nav decouples LLM global planning from reactive control via hierarchical scene graphs and alignment fusion, reporting SOTA results on i-THOR and RoboTHOR with improved efficiency and zero-shot generalization.

SurveilNav: Collaborative Object Goal Navigation with Robot and Surveillance System

cs.RO · 2026-06-23 · unverdicted · novelty 6.0

SurveilNav integrates robot local perception with multi-view surveillance for improved collaborative object goal navigation and reports SOTA results on HM3D.

NavWAM: A Navigation World Action Model for Goal-Conditioned Visual Navigation

cs.RO · 2026-06-11 · unverdicted · novelty 6.0

NavWAM is a diffusion-transformer policy that jointly learns future observation prediction, goal-progress values, and action chunks in a shared latent sequence for goal-conditioned visual navigation.

Semantic Evidence Regulation via Relational Bias for Zero-Shot Object Navigation

cs.RO · 2026-06-09 · conditional · novelty 6.0

DB-Nav/SER-Nav improves zero-shot object navigation by reranking frontier goals using activation from object co-occurrence and inhibition from similar distractors and failed visits.

Uni-LaViRA: Language-Vision-Robot Actions Translation for Unified Embodied Navigation

cs.RO · 2026-05-26 · unverdicted · novelty 6.0

A zero-shot unified agent for VLN-CE, ObjectNav, EQA and Aerial-VLN on wheeled, quadruped, humanoid and UAV platforms that translates language and vision inputs into actions via MLLMs plus TDM and SCB mechanisms, matching trained foundation models on multiple benchmarks.

ProCompNav: Proactive Instance Navigation with Comparative Judgment for Ambiguous User Queries

cs.AI · 2026-05-07 · unverdicted · novelty 6.0 · 3 refs

ProCompNav builds a candidate pool from ambiguous queries then uses pool-splitting binary questions for disambiguation, improving success rate and shortening responses on CoIN-Bench and TextNav.

An Efficient Beam Search Algorithm for Active Perception in Mobile Robotics

cs.RO · 2026-04-25 · unverdicted · novelty 6.0

Node-wise beam search with expected gain and RRAG graph construction outperforms prior active perception methods by at least 20% on representative tasks.

ESCAPE: Episodic Spatial Memory and Adaptive Execution Policy for Long-Horizon Mobile Manipulation

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

ESCAPE combines spatio-temporal fusion mapping for depth-free 3D memory with a memory-driven grounding module and adaptive execution policy to reach 65.09% success on ALFRED test-seen long-horizon mobile manipulation tasks.

Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting

cs.RO · 2026-04-14 · unverdicted · novelty 6.0

Habitat-GS integrates 3D Gaussian Splatting scene rendering and Gaussian avatars into Habitat-Sim, yielding agents with stronger cross-domain generalization and effective human-aware navigation.

Visually-grounded Humanoid Agents

cs.CV · 2026-04-09 · unverdicted · novelty 6.0

A coupled world-agent framework uses 3D Gaussian reconstruction and first-person RGB-D perception with iterative planning to enable goal-directed, collision-avoiding humanoid behavior in novel reconstructed scenes.

HiRO-Nav: Hybrid ReasOning Enables Efficient Embodied Navigation

cs.AI · 2026-04-09 · unverdicted · novelty 6.0

HiRO-Nav adaptively triggers reasoning only on high-entropy actions via a hybrid training pipeline and shows better success-token trade-offs than always-reason or never-reason baselines on the CHORES-S benchmark.

ReMemNav: A Rethinking and Memory-Augmented Framework for Zero-Shot Object Navigation

cs.RO · 2026-03-25 · conditional · novelty 6.0

ReMemNav improves zero-shot object navigation success and efficiency by integrating episodic memory and rethinking with VLMs, achieving SR/SPL gains of 1.7%/7.0% on HM3D v0.1, 18.2%/11.1% on HM3D v0.2, and 8.7%/7.9% on MP3D.

MerNav: A Highly Generalizable Memory-Execute-Review Framework for Zero-Shot Object Goal Navigation

cs.CV · 2026-02-05 · unverdicted · novelty 6.0

MerNav's Memory-Execute-Review framework improves success rates in zero-shot object goal navigation by 5-8% over baselines on four datasets while outperforming both training-free and supervised methods on key benchmarks.

C-NAV: Towards Self-Evolving Continual Object Navigation in Open World

cs.RO · 2025-10-23 · unverdicted · novelty 6.0

C-Nav is a continual visual navigation framework with dual-path anti-forgetting via feature distillation and replay plus adaptive sampling that outperforms baselines on a new continual object navigation benchmark while using less memory.

Personalized Embodied Navigation for Portable Object Finding

cs.RO · 2024-03-14 · unverdicted · novelty 6.0

Transit-Aware Planning (TAP) enriches navigation policies with object transit data on Dynamic Object Maps, raising success rates by 21.1% in MP3D simulation and 18.3% in real-world tests for finding non-stationary targets.

EvolveNav: Proactive Preflection and Self-Evolving Memory for Zero-Shot Object Goal Navigation

cs.AI · 2026-06-16 · unverdicted · novelty 5.0

EvolveNav adds an agentic rule memory with UCB retrieval and a memory-guided preflection module to enable continuous improvement in zero-shot object goal navigation, reporting a 10.1% success rate gain over baselines.

citing papers explorer

Showing 33 of 33 citing papers.

Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration? cs.CV · 2026-05-31 · accept · none · ref 42
Introduces the TVR active viewpoint-matching task and TVRBench indoor simulation benchmark, where foundation models start at low single-digit success rates but reach 51.4% after visual-action SFT and multi-turn GRPO post-training.
When Robots Do the Chores: A Benchmark and Agent for Long-Horizon Household Task Execution cs.AI · 2026-05-14 · unverdicted · none · ref 12 · 2 links
LongAct benchmark evaluates long-horizon household task execution from free-form instructions; HoloMind agent raises performance but top VLMs still reach only 59% goal completion and 16% full-task success.
SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning cs.AI · 2026-05-10 · accept · none · ref 9 · 2 links
SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.
Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI cs.CV · 2021-09-16 · accept · none · ref 34
HM3D offers 1000 building-scale 3D environments that are larger and higher-fidelity than existing datasets, enabling better-performing embodied AI agents for tasks like PointGoal navigation.
LIME: Learning Intent-aware Camera Motion from Egocentric Video cs.RO · 2026-07-02 · unverdicted · none · ref 7
LIME formulates language-conditioned camera motion as predicting SE(3) target poses from RGB and intent text, using mined multi-intent supervision from egocentric video and a flow-matching pose head.
POINav: Benchmarking and Enhancing Final-Meters Arrival in Real-World Vision-Language Navigation cs.RO · 2026-05-27 · unverdicted · none · ref 4
POINav-Bench provides the first high-fidelity real-world benchmark for POI-goal VLN using 3DGS reconstructions of 126k m² with 163 POIs, supported by a Brain-Action framework and 70K real signage-entrance dataset.
IntentionNav: A Benchmark for Intent-Driven Object Navigation from Implicit Human Instruction cs.CV · 2026-05-22 · unverdicted · none · ref 4
IntentionNav is a new benchmark showing that VLMs infer intended targets from implicit instructions in 48% of cases but achieve only 25% terminal success and 5.5% grounded success in active navigation.
Action-guided generation of 3D functionality segmentation data cs.CV · 2025-11-28 · unverdicted · none · ref 5
SynthFun3D generates synthetic 3D functionality segmentation data from action descriptions via object retrieval and scene arrangement, yielding consistent gains of +2.2 mAP, +6.3 mAR, and +5.7 mIoU when augmenting real data for VLM training.
SAGE-Nav: Leveraging LLM Planning and Alignment Fusion for Hierarchical Scene Graph-Guided Navigation cs.RO · 2026-06-24 · unverdicted · none · ref 2
SAGE-Nav decouples LLM global planning from reactive control via hierarchical scene graphs and alignment fusion, reporting SOTA results on i-THOR and RoboTHOR with improved efficiency and zero-shot generalization.
SurveilNav: Collaborative Object Goal Navigation with Robot and Surveillance System cs.RO · 2026-06-23 · unverdicted · none · ref 13
SurveilNav integrates robot local perception with multi-view surveillance for improved collaborative object goal navigation and reports SOTA results on HM3D.
NavWAM: A Navigation World Action Model for Goal-Conditioned Visual Navigation cs.RO · 2026-06-11 · unverdicted · none · ref 16
NavWAM is a diffusion-transformer policy that jointly learns future observation prediction, goal-progress values, and action chunks in a shared latent sequence for goal-conditioned visual navigation.
Semantic Evidence Regulation via Relational Bias for Zero-Shot Object Navigation cs.RO · 2026-06-09 · conditional · none · ref 52
DB-Nav/SER-Nav improves zero-shot object navigation by reranking frontier goals using activation from object co-occurrence and inhibition from similar distractors and failed visits.
Uni-LaViRA: Language-Vision-Robot Actions Translation for Unified Embodied Navigation cs.RO · 2026-05-26 · unverdicted · none · ref 4
A zero-shot unified agent for VLN-CE, ObjectNav, EQA and Aerial-VLN on wheeled, quadruped, humanoid and UAV platforms that translates language and vision inputs into actions via MLLMs plus TDM and SCB mechanisms, matching trained foundation models on multiple benchmarks.
ProCompNav: Proactive Instance Navigation with Comparative Judgment for Ambiguous User Queries cs.AI · 2026-05-07 · unverdicted · none · ref 14 · 3 links
ProCompNav builds a candidate pool from ambiguous queries then uses pool-splitting binary questions for disambiguation, improving success rate and shortening responses on CoIN-Bench and TextNav.
An Efficient Beam Search Algorithm for Active Perception in Mobile Robotics cs.RO · 2026-04-25 · unverdicted · none · ref 1
Node-wise beam search with expected gain and RRAG graph construction outperforms prior active perception methods by at least 20% on representative tasks.
ESCAPE: Episodic Spatial Memory and Adaptive Execution Policy for Long-Horizon Mobile Manipulation cs.CV · 2026-04-15 · unverdicted · none · ref 1
ESCAPE combines spatio-temporal fusion mapping for depth-free 3D memory with a memory-driven grounding module and adaptive execution policy to reach 65.09% success on ALFRED test-seen long-horizon mobile manipulation tasks.
Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting cs.RO · 2026-04-14 · unverdicted · none · ref 3
Habitat-GS integrates 3D Gaussian Splatting scene rendering and Gaussian avatars into Habitat-Sim, yielding agents with stronger cross-domain generalization and effective human-aware navigation.
Visually-grounded Humanoid Agents cs.CV · 2026-04-09 · unverdicted · none · ref 6
A coupled world-agent framework uses 3D Gaussian reconstruction and first-person RGB-D perception with iterative planning to enable goal-directed, collision-avoiding humanoid behavior in novel reconstructed scenes.
HiRO-Nav: Hybrid ReasOning Enables Efficient Embodied Navigation cs.AI · 2026-04-09 · unverdicted · none · ref 2
HiRO-Nav adaptively triggers reasoning only on high-entropy actions via a hybrid training pipeline and shows better success-token trade-offs than always-reason or never-reason baselines on the CHORES-S benchmark.
ReMemNav: A Rethinking and Memory-Augmented Framework for Zero-Shot Object Navigation cs.RO · 2026-03-25 · conditional · none · ref 1
ReMemNav improves zero-shot object navigation success and efficiency by integrating episodic memory and rethinking with VLMs, achieving SR/SPL gains of 1.7%/7.0% on HM3D v0.1, 18.2%/11.1% on HM3D v0.2, and 8.7%/7.9% on MP3D.
MerNav: A Highly Generalizable Memory-Execute-Review Framework for Zero-Shot Object Goal Navigation cs.CV · 2026-02-05 · unverdicted · none · ref 5
MerNav's Memory-Execute-Review framework improves success rates in zero-shot object goal navigation by 5-8% over baselines on four datasets while outperforming both training-free and supervised methods on key benchmarks.
C-NAV: Towards Self-Evolving Continual Object Navigation in Open World cs.RO · 2025-10-23 · unverdicted · none · ref 2
C-Nav is a continual visual navigation framework with dual-path anti-forgetting via feature distillation and replay plus adaptive sampling that outperforms baselines on a new continual object navigation benchmark while using less memory.
Personalized Embodied Navigation for Portable Object Finding cs.RO · 2024-03-14 · unverdicted · none · ref 4
Transit-Aware Planning (TAP) enriches navigation policies with object transit data on Dynamic Object Maps, raising success rates by 21.1% in MP3D simulation and 18.3% in real-world tests for finding non-stationary targets.
EvolveNav: Proactive Preflection and Self-Evolving Memory for Zero-Shot Object Goal Navigation cs.AI · 2026-06-16 · unverdicted · none · ref 3
EvolveNav adds an agentic rule memory with UCB retrieval and a memory-guided preflection module to enable continuous improvement in zero-shot object goal navigation, reporting a 10.1% success rate gain over baselines.
Qwen-RobotNav Technical Report: A Scalable Navigation Model Designed for an Agentic Navigation System cs.RO · 2026-06-16 · unverdicted · none · ref 1 · 2 links
Qwen-RobotNav provides a parameterized navigation model trained on 15.6M samples with vision-language co-training that achieves SOTA results on benchmarks and zero-shot transfer to real robots.
IntentNav: Learning Spatial-Visual Object Navigation from Human Demonstrations cs.RO · 2026-06-06 · unverdicted · none · ref 1
IntentNav is a spatial-visual imitation framework that infers human search intent via frontier labeling to train VLM policies for object navigation, reporting SOTA on MP3D and HM3D benchmarks with zero-shot transfer to wheeled, quadruped, and humanoid robots.
STEM: Semantic Target Search and Exploration using MAVs in Cluttered Environments cs.RO · 2026-05-30 · unverdicted · none · ref 37
STEM develops a semantically-guided combinatorial planner and active perception pipeline that propagates object priorities to frontier voxels, enabling MAVs to find targets faster than baselines in simulation and real-world tests.
TravExplorer: Cross-Floor Embodied Exploration via Traversability-Aware 3-D Planning cs.RO · 2026-05-19 · unverdicted · none · ref 38
TravExplorer couples zero-shot semantic guidance with traversability-aware 3-D planning to enable cross-floor object navigation in unseen indoor environments.
CLUE: Adaptively Prioritized Contextual Cues by Leveraging a Unified Semantic Map for Effective Zero-Shot Object-Goal Navigation cs.RO · 2026-05-19 · unverdicted · none · ref 1
CLUE adaptively weights room-type and object-co-location cues from an LLM to construct a unified semantic value map that improves success rate and efficiency in zero-shot object-goal navigation.
MiniVLA-Nav v1: A Multi-Scene Simulation Dataset for Language-Conditioned Robot Navigation cs.RO · 2026-05-01 · unverdicted · none · ref 11
MiniVLA-Nav v1 provides 1,174 episodes of language-instructed robot navigation in photorealistic simulations with RGB, depth, segmentation, and expert action data.
OpenFrontier: General Navigation with Visual-Language Grounded Frontiers cs.RO · 2026-03-05 · unverdicted · none · ref 1
OpenFrontier treats navigation as sparse visual-frontier subgoal selection guided by vision-language priors, claiming strong zero-shot and real-robot performance without task-specific training.
Ask When It Pays: Cost-Aware Open-Ended Interaction for Instance Goal Navigation cs.CV · 2026-06-02 · unverdicted · none · ref 7
Proposes cost-aware question selection for ambiguous object navigation via information-gain analysis on corpora, a cost-penalizing benchmark, and a zero-shot MLLM agent.
Agent AI: Surveying the Horizons of Multimodal Interaction cs.AI · 2024-01-07 · unverdicted · none · ref 8
The paper defines Agent AI as interactive multimodal systems that perceive grounded data and generate embodied actions, arguing this approach can mitigate hallucinations in foundation models.
