Canonical reference

A Survey of Robotic Navigation and Manipulation with Physics Simulators in the Era of Embodied AI

Lik Hang Kenny Wong, Xueyang Kang, Kaixin Bai, Jianwei Zhang · 2025 · cs.RO · arXiv 2505.01458

100% of citing Pith papers cite this work as background.

8 Pith papers citing it

abstract

Navigation and manipulation are core capabilities in Embodied AI, but training agents to perform them directly in the real world is costly, time-consuming, and unsafe. Therefore, sim-to-real transfer has emerged as a key approach, yet the sim-to-real gap persists. This survey examines how physics simulators address this gap by analyzing properties that have received limited attention in prior surveys. We also analyze their features for navigation and manipulation tasks, as well as their hardware requirements. Additionally, we offer a resource with benchmark datasets, metrics, simulation platforms, and methods to help researchers select suitable tools while accounting for hardware constraints.

citation-role summary

background 5

citation-polarity summary

background 5

representative citing papers

EvoMemNav: Efficient Self-Evolving Fine-Grained Memory for Zero-Shot Embodied Navigation

cs.CV · 2026-06-02 · unverdicted · novelty 6.0

EvoMemNav builds a Visual-Semantic Memory Graph keeping raw views, applies a budgeted coarse-to-fine policy, and uses reflection-driven updates to improve zero-shot navigation on GOAT-Bench and HM3D.

Plan in Sandbox, Navigate in Open Worlds: Learning Physics-Grounded Abstracted Experience for Embodied Navigation

cs.RO · 2026-05-11 · unverdicted · novelty 6.0

SAGE trains agents in physics-grounded semantic abstractions via RL with asymmetric clipping, achieving 53.21% LLM-Match Success on A-EQA (+9.7% over baseline) and showing encouraging transfer to a physical robot.

ClickSeg3D: Few-Click Interactive Segmentation via Semantic Embeddings

cs.CV · 2026-05-09 · unverdicted · novelty 6.0 · 2 refs

ClickSeg3D uses a point Transformer encoder and hierarchical mask decoder with semantic embeddings to enable single-pass multi-object 3D interactive segmentation from sparse points, reporting over 20% mIoU gains versus baselines and 8-10% cross-dataset improvements with one click per instance.

PhyMix: Towards Physically Consistent Single-Image 3D Indoor Scene Generation with Implicit-Explicit Optimization

cs.CV · 2026-04-11 · unverdicted · novelty 6.0

PhyMix unifies a new multi-aspect physics evaluator with implicit policy optimization and explicit test-time correction to produce single-image 3D indoor scenes that are both visually faithful and physically plausible.

MapTab: Are MLLMs Ready for Multi-Criteria Route Planning in Heterogeneous Graphs?

cs.LG · 2026-02-20 · conditional · novelty 6.0 · 2 refs

MapTab is a new multimodal benchmark with 328 images and nearly 200k queries that shows current MLLMs have substantial difficulty with multi-criteria route planning when visual and tabular information must be combined.

Towards Precise Intent-Aligned VLA Aerial Navigation via Expert-Guided GRPO

cs.RO · 2026-06-01 · unverdicted · novelty 4.0

EG-GRPO augments VLA aerial navigation with expert-guided group relative policy optimization and a faster simulation pipeline, claiming a 2.13x success rate and 60.9% better intent alignment relative to the SFT baseline.

Vision-and-Language Navigation for UAVs: Progress, Challenges, and a Research Roadmap

cs.RO · 2026-04-15 · unverdicted · novelty 4.0

A survey of UAV vision-and-language navigation that establishes a methodological taxonomy, reviews resources and challenges, and proposes a forward-looking research roadmap.

NVIDIA Isaac Sim: Enabling Scalable, GPU-Accelerated Simulation for Robotics

cs.RO · 2026-06-02 · unverdicted · novelty 2.0

A survey reviewing the architecture, usage patterns, and limitations of NVIDIA Isaac Sim across robotics domains.

citing papers explorer

Showing 8 of 8 citing papers.

EvoMemNav: Efficient Self-Evolving Fine-Grained Memory for Zero-Shot Embodied Navigation
cs.CV · 2026-06-02 · unverdicted · none · ref 41 · internal anchor

Plan in Sandbox, Navigate in Open Worlds: Learning Physics-Grounded Abstracted Experience for Embodied Navigation
cs.RO · 2026-05-11 · unverdicted · none · ref 41 · internal anchor

ClickSeg3D: Few-Click Interactive Segmentation via Semantic Embeddings
cs.CV · 2026-05-09 · unverdicted · none · ref 47 · 2 links · internal anchor

PhyMix: Towards Physically Consistent Single-Image 3D Indoor Scene Generation with Implicit-Explicit Optimization
cs.CV · 2026-04-11 · unverdicted · none · ref 38 · internal anchor

MapTab: Are MLLMs Ready for Multi-Criteria Route Planning in Heterogeneous Graphs?
cs.LG · 2026-02-20 · conditional · none · ref 79 · 2 links · internal anchor

Towards Precise Intent-Aligned VLA Aerial Navigation via Expert-Guided GRPO
cs.RO · 2026-06-01 · unverdicted · none · ref 10 · internal anchor

Vision-and-Language Navigation for UAVs: Progress, Challenges, and a Research Roadmap
cs.RO · 2026-04-15 · unverdicted · none · ref 34 · internal anchor

NVIDIA Isaac Sim: Enabling Scalable, GPU-Accelerated Simulation for Robotics
cs.RO · 2026-06-02 · unverdicted · none · ref 58 · internal anchor
