pith. machine review for the scientific record.

arxiv: 2605.02487 · v3 · submitted 2026-05-04 · 💻 cs.RO

Recognition: no theorem link

Visibility-Aware Mobile Grasping in Dynamic Environments

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:23 UTC · model grok-4.3

classification 💻 cs.RO
keywords mobile grasping · dynamic environments · active perception · behavior trees · whole-body planning · visibility constraints · unknown environments · mobile manipulator

The pith

A unified system integrates whole-body planning with active perception and behavior trees to let mobile robots grasp objects safely in unknown dynamic environments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that mobile grasping succeeds in spaces with moving obstacles and limited initial views when perception and motion are handled together rather than separately. A low-level iterative planner moves the robot body while sensing obstacle velocities to shrink uncertainty, and a high-level behavior-tree module creates fresh subgoals for exploration or when plans break. Readers would care because prior methods either freeze the robot or risk collisions once unseen objects cross the path, limiting real deployment in homes or factories. If the integration holds, robots finish fetch tasks with fewer stops and lower crash rates even as the scene changes. Experiments in hundreds of simulations plus physical tests on a Fetch manipulator report the resulting gains over earlier decoupled approaches.
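The perceive-plan-act coupling described above can be sketched in miniature. This is an illustrative toy, not the authors' implementation: every class and function name here is invented, the world is one-dimensional, and the behavior-tree module is reduced to a single callback that supplies a fresh subgoal when the current plan breaks.

```python
# Toy receding-horizon loop: sense obstacle velocities, replan one step,
# and fall back to a subgoal policy when the plan is blocked.
# All names are illustrative stand-ins, not the paper's API.
from dataclasses import dataclass, field

@dataclass
class Belief:
    robot: float            # 1-D robot position (stand-in for q_t)
    goal: float             # current subgoal (stand-in for g_t)
    obstacles: dict = field(default_factory=dict)  # id -> (pos, vel)

def sense(belief, world, horizon=2.0):
    """Update obstacle position/velocity estimates within sensor range."""
    for oid, (pos, vel) in world.items():
        if abs(pos - belief.robot) <= horizon:
            belief.obstacles[oid] = (pos, vel)

def plan_step(belief, step=0.5, margin=0.4, dt=1.0):
    """One replanning iteration: advance toward the subgoal unless a
    velocity-predicted obstacle position blocks the next step."""
    direction = 1.0 if belief.goal > belief.robot else -1.0
    candidate = belief.robot + direction * step
    for pos, vel in belief.obstacles.values():
        if abs((pos + vel * dt) - candidate) < margin:
            return None  # plan broken: caller must request a new subgoal
    return candidate

def run(belief, world, subgoal_policy, max_iters=20):
    """Iterate sense -> plan -> act until the subgoal is reached."""
    for _ in range(max_iters):
        sense(belief, world)
        nxt = plan_step(belief)
        if nxt is None:
            belief.goal = subgoal_policy(belief)  # behavior-tree stand-in
            continue
        belief.robot = nxt
        for oid in world:                         # world evolves each step
            p, v = world[oid]
            world[oid] = (p + v, v)
        if abs(belief.robot - belief.goal) < 0.25:
            return True
    return False
```

In this sketch an empty world lets the robot reach the goal, while a stationary obstacle on the path stalls progress if the subgoal policy never proposes a detour, mirroring how the paper's high-level module must resample goals when the low-level planner fails.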

Core claim

The central claim is that an iterative low-level whole-body planner paired with velocity-aware active perception navigates unknown dynamic environments safely, while a hierarchical behavior-tree planner adaptively supplies subgoals, producing success rates of 68.8% in unknown static scenes and 58.0% in unknown dynamic scenes together with improved collision avoidance.

What carries the argument

The iterative low-level whole-body planner coupled with velocity-aware active perception, which reduces uncertainty about moving obstacles while the robot advances toward a grasp configuration.

If this is right

  • The robot maintains progress toward the grasp even when new obstacles appear by updating plans from fresh velocity data.
  • Behavior trees let the system switch between exploration, grasping, and recovery modes without external intervention when failures occur at runtime.
  • Success and safety improve consistently across four hundred randomized simulation trials and carry over to physical deployment on a mobile manipulator.
  • The approach avoids the safety failures that arise when perception and motion are decoupled in changing environments.
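The mode switching in the second bullet is the classic behavior-tree fallback pattern: try the nominal branch, and fall through to exploration or recovery when it fails. A minimal sketch, with invented node names and a plain dictionary standing in for the robot's state (the paper's actual tree is richer):

```python
# Minimal behavior-tree composites and leaves illustrating
# explore / grasp / recover mode switching. Names are illustrative.
def selector(*children):
    """Fallback node: tick children until one succeeds."""
    def tick(state):
        return any(child(state) for child in children)
    return tick

def sequence(*children):
    """Sequence node: tick children until one fails."""
    def tick(state):
        return all(child(state) for child in children)
    return tick

def target_visible(state):
    return state.get("target_seen", False)

def explore(state):
    state["target_seen"] = True   # pretend exploration finds the target
    state["log"].append("explore")
    return True

def grasp(state):
    state["log"].append("grasp")
    return not state.get("grasp_fails", False)

def recover(state):
    state["log"].append("recover")
    state["grasp_fails"] = False  # clear the failure so the next tick retries
    return True

# Root: grasp when the target is visible; explore when it is not;
# fall back to recovery after a runtime grasp failure.
root = selector(
    sequence(target_visible, grasp),
    sequence(lambda st: not st.get("target_seen", False), explore),
    recover,
)
```

Ticking `root` repeatedly switches modes without external intervention: an unseen target routes to `explore`, a failed grasp routes to `recover`, and the cleared failure lets the next tick retry the grasp.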

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same coupling of velocity-aware sensing and adaptive subgoal generation could support related tasks such as object placement or drawer opening in populated spaces.
  • Limits would appear if obstacle speeds exceed what the sensor and replanner can track, suggesting targeted stress tests with faster or occluded movers.
  • Replacing explicit velocity estimation with learned motion prediction might further tighten reaction times in highly cluttered scenes.

Load-bearing premise

The planner and sensors can locate and dodge unobserved dynamic obstacles before they reach the robot, which requires adequate sensor range and replanning speed relative to obstacle motion.
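This premise reduces to a back-of-envelope feasibility check: an obstacle first observed at distance d is avoidable only if d exceeds the ground it covers during the sensing-plus-replanning latency, plus a safety margin. The latency and margin values below are invented for illustration; only the roughly 1.5 m threshold comes from the paper's Figure 15.

```python
# Back-of-envelope check of the load-bearing premise.
# t_sense, t_replan, and margin are illustrative assumptions,
# not parameters reported by the paper.
def min_reaction_distance(v_obstacle, t_sense, t_replan, margin):
    """Distance (m) an obstacle closes during the sense+replan
    latency, plus a safety margin."""
    return v_obstacle * (t_sense + t_replan) + margin

def avoidable(d_appear, v_obstacle, t_sense=0.2, t_replan=0.5, margin=0.5):
    """True if an obstacle appearing at d_appear (m) leaves the
    planner enough room to react."""
    return d_appear >= min_reaction_distance(v_obstacle, t_sense, t_replan, margin)
```

With a walking-speed obstacle (1.4 m/s), 0.7 s of combined latency, and a 0.5 m margin, the bound comes out near 1.48 m, in the same range as the 1.5 m minimum reaction distance the paper's Figure 15 reports.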

What would settle it

Run repeated trials with obstacles that enter the path from outside the current field of view at speeds exceeding the documented replanning rate and measure whether collision frequency rises sharply above the reported baseline.

Figures

Figures reproduced from arXiv: 2605.02487 by Anxing Xiao, David Hsu, Hanbo Zhang, Tianrun Hu.

Figure 1. Visibility-Aware Mobile Grasping. A mobile manipulator approaches a target while a chair appears in its path.
Figure 2. System Architecture Overview. The receding-horizon framework iteratively refines the belief state bt = (qt, Mt, gt) through four coupled components: a hierarchical subgoal policy generates intermediate goals gt; an active perception policy directs gaze while updating the map Mt; a whole-body motion policy computes collision-free trajectories ξt to gt; and localization and control executes the trajectory.
Figure 4. Given a map Mt and a target p as inputs, the base pose, torso height, and end-effector pose are sampled simultaneously through three distributions: (1) pgeo samples collision-free base poses qb from a 2D signed distance field updated in real time from the point cloud; (2) ptorso queries a pre-built torso height map to select the height h that maximizes …
Figure 5. State-Dependent Gaze Optimization. During planning (ξt = ∅), the importance field concentrates on the target region to enable grasp pose generation. During execution (ξt ≠ ∅), the field shifts to the swept volume V(ξt) (color-coded heatmap), weighted by velocity and temporal proximity to prioritize observation of collision-critical regions.
Figure 6. Evaluation Environments. Top: five of 20 simulation scenes (ReplicaCAD) with varied furniture layouts and target placements. Bottom: real-world deployment across five indoor locations (dining table, kitchen counter, workstation, coffee table, and sofa). Complete test cases and results are provided in Appendix I and Appendix K.
Figure 7. Qualitative results showing the execution sequence in simulation, from left to right: starting pose, observation stage, …
Figure 8. Success Rate Comparison. Ablation study results across unknown static and dynamic simulation environments. The full system achieves the highest success rates (68.8% and 58.0%), with subgoal generation providing the largest performance gain. Detailed failure breakdowns are provided in Appendix J.
Figure 10. Failure flow in unknown static (left) and dynamic (right) environments in the simulation benchmark.
Figure 11. Real robot evaluation results. Process flow and failure breakdown for static unknown (left) and dynamic (right) environments.
Figure 12. Simulation Test Scenarios. Top-down views of all 20 ReplicaCAD scenes used for evaluation. Green dots indicate robot starting positions; red dots indicate target object locations. Scenes vary in room layout, furniture density, and navigational complexity.
Figure 13. System-level comparison. Success rates for all six methods across unknown static and dynamic environments. The CapMap Placement design suffers from high collision rates due to its sequential architecture, while the Closed-Loop Replanning design improves over sequential designs but lacks the structured task management of the behavior tree.
Figure 16. Sampling completeness analysis. Empirical completeness of pre-grasp configuration sampling. Left: success rate as a function of the number of samples, showing rapid convergence to approximately 95%. Right: computational time scales approximately linearly with the number of samples.
Figure 15. Obstacle distance analysis. Success rate and collision rate as a function of obstacle appearance distance. The system requires at least 1.5 m of reaction distance to achieve near-optimal avoidance performance. Beyond this threshold, success rate plateaus as collision avoidance saturates and remaining failures are dominated by grasp and planning limitations.
Original abstract

This paper addresses the problem of mobile grasping in dynamic, unknown environments where a robot must operate under a limited field-of-view. The fundamental challenge is the inherent trade-off between "seeing" around to reduce environmental uncertainty and "moving" the body to achieve task progress in a high-dimensional configuration space, subject to visibility constraints. Previous approaches often assume known or static environments and decouple these objectives, failing to guarantee safety when unobserved dynamic obstacles intersect the robot's path during manipulation. In this paper, we propose a unified mobile grasping system comprising two core components: (1) an iterative low-level whole-body planner coupled with velocity-aware active perception to navigate dynamic environments safely; and (2) a hierarchical high-level planner based on behavior trees that adaptively generates subgoals to guide the robot through exploration and runtime failures. We provide experimental results across 400 randomized simulation scenarios and real-world deployment on a Fetch mobile manipulator. Results show that our system achieves a success rate of 68.8% and 58.0% in unknown static and dynamic environments, respectively, significantly boosting success rates by 22.8% and 18.0% over the NAM approach in both unknown static and dynamic environments, with improved collision safety.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a unified mobile grasping system for dynamic unknown environments with limited field-of-view. It integrates (1) an iterative low-level whole-body planner coupled with velocity-aware active perception for safe navigation and (2) a hierarchical high-level behavior-tree planner that generates adaptive subgoals. Experiments across 400 randomized simulation scenarios report success rates of 68.8% (unknown static) and 58.0% (unknown dynamic), with gains of 22.8% and 18.0% over the NAM baseline, plus real-world validation on a Fetch manipulator.

Significance. If the empirical improvements and collision-safety gains hold under broader conditions, the work could advance practical mobile manipulation in unstructured settings by explicitly addressing the visibility-motion trade-off. The scale of 400 randomized trials and the real-robot deployment are notable empirical strengths.

major comments (2)
  1. [§3 (low-level planner)] The central safety claim—that velocity-aware active perception plus iterative replanning can detect and avoid unobserved dynamic obstacles in time—is load-bearing for the 'safely' and 'improved collision safety' assertions, yet the manuscript supplies no quantitative bounds on obstacle speeds, sensor range, prediction horizon, or worst-case replanning latency. The 58.0% dynamic success rate therefore does not confirm that the assumption holds beyond the tested scenarios.
  2. [§4 (experiments)] The 18.0% improvement over the NAM baseline cannot be fully assessed because the baseline implementation, dynamic-obstacle motion models (speeds, trajectories, densities), and any statistical significance tests are not described. This weakens the cross-condition claims.
minor comments (2)
  1. [Abstract] The acronym 'NAM' is undefined; expand it on first use.
  2. [Results] Report standard deviations or confidence intervals alongside the success-rate percentages so readers can judge the reliability of the reported gains.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the safety analysis and experimental details. We address each major comment below and will make the indicated revisions to improve the manuscript.

Point-by-point responses
  1. Referee: [§3 (low-level planner)] The central safety claim—that velocity-aware active perception plus iterative replanning can detect and avoid unobserved dynamic obstacles in time—is load-bearing for the 'safely' and 'improved collision safety' assertions, yet the manuscript supplies no quantitative bounds on obstacle speeds, sensor range, prediction horizon, or worst-case replanning latency. The 58.0% dynamic success rate therefore does not confirm that the assumption holds beyond the tested scenarios.

    Authors: We agree that the manuscript would benefit from explicit quantitative bounds to support the safety claims. While the 400 randomized trials provide empirical evidence of improved collision safety in the tested dynamic scenarios, we acknowledge that bounds on obstacle speeds, sensor range, prediction horizon, and replanning latency were not stated. In the revision, we will add a new paragraph to §3.2 that supplies these bounds based on the system parameters (e.g., camera field-of-view and update rate, planner frequency) together with a worst-case timing analysis showing timely detection and avoidance for the obstacle velocities used in simulation. This will clarify the scope of the claims without overstating generality beyond the evaluated conditions. revision: yes

  2. Referee: [§4 (experiments)] The 18.0% improvement over the NAM baseline cannot be fully assessed because the baseline implementation, dynamic-obstacle motion models (speeds, trajectories, densities), and any statistical significance tests are not described. This weakens the cross-condition claims.

    Authors: We agree that fuller documentation of the baseline and experimental conditions is required for readers to assess the reported gains. In the revised manuscript we will expand §4 with (i) a precise description of how the NAM baseline was implemented, (ii) the exact motion models for dynamic obstacles (including speed ranges, trajectory generation, and density), and (iii) the statistical tests performed on the 400 trials per condition. These additions will allow direct evaluation of the 18.0% improvement and the static/dynamic comparisons. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical system proposal with measured outcomes

Full rationale

The paper describes a robotic grasping system with two planners and reports success rates from 400 simulation trials plus real-world tests on a Fetch manipulator. No equations, parameter fittings, or first-principles derivations appear in the provided text; the central claims are direct experimental measurements (68.8% and 58.0% success) compared against a baseline. No self-citations are invoked as load-bearing uniqueness theorems, and no predictions reduce by construction to fitted inputs or self-definitions. The work is self-contained as an engineering contribution whose validity rests on external benchmarks rather than internal reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The system rests on standard robotics modeling assumptions rather than new free parameters or invented entities.

axioms (2)
  • domain assumption Accurate kinematic and dynamic models of the mobile manipulator are available
    Required for whole-body planning to be feasible
  • domain assumption Sensor observations can be fused in real time to update an occupancy or visibility map
    Underpins the active-perception component

pith-pipeline@v0.9.0 · 5517 in / 1249 out tokens · 26548 ms · 2026-05-12T02:23:18.707179+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

77 extracted references · 77 canonical work pages · 7 internal anchors

  1. [1]

    arXiv preprint arXiv:2401.12202 (2024)

    P. Liu, Y . Orru, C. Paxton, N. M. M. Shafiullah, and L. Pinto, “Ok-robot: What really matters in integrating open-knowledge models for robotics,”arXiv preprint arXiv:2401.12202, 2024. [Online]. Available: https://arxiv.org/abs/2401.12202

  2. [2]

    Homerobot: Open-vocabulary mobile manipulation.arXiv preprint arXiv:2306.11565, 2024

    S. Yenamandra, A. Ramachandran, K. Yadav, A. Wang, M. Khanna, T. Gervet, T.-Y . Yang, V . Jain, A. W. Clegg, J. Turner, Z. Kira, M. Savva, A. Chang, D. S. Chaplot, D. Batra, R. Mottaghi, Y . Bisk, and C. Paxton, “Homerobot: Open-vocabulary mobile manipulation,” 2024. [Online]. Available: https://arxiv. org/abs/2306.11565

  3. [3]

    Combining nav- igation and manipulation costs for time-efficient robot placement in mobile manipulation tasks,

    F. Reister, M. Grotz, and T. Asfour, “Combining nav- igation and manipulation costs for time-efficient robot placement in mobile manipulation tasks,”IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 9913–9920, 2022

  4. [4]

    Robot placement based on reachability inversion,

    N. Vahrenkamp, T. Asfour, and R. Dillmann, “Robot placement based on reachability inversion,” in2013 IEEE International Conference on Robotics and Automation, 2013, pp. 1970–1975

  5. [5]

    Where should i look optimised gaze control for whole-body collision avoidance in dynamic environments,

    M. Finean, W. Merkt, and I. Havoutis, “Where should i look optimised gaze control for whole-body collision avoidance in dynamic environments,”IEEE Robotics and Automation Letters, vol. PP, pp. 1–1, 12 2021

  6. [6]

    Mo- tions in microseconds via vectorized sampling-based planning,

    W. Thomason, Z. Kingston, and L. E. Kavraki, “Mo- tions in microseconds via vectorized sampling-based planning,” in2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 8749–8756

  7. [7]

    Demonstrating mobile manipulation in the wild: A metrics-driven approach,

    M. Bajracharya, J. Borders, R. Cheng, D. Helmick, L. Kaul, D. Kruse, J. Leichty, J. Ma, C. Matl, F. Michel, C. Papazov, J. Petersen, K. Shankar, and M. Tjersland, “Demonstrating mobile manipulation in the wild: A metrics-driven approach,” inRobotics: Science and Systems XIX, ser. RSS2023. Robotics: Science and Systems Foundation, Jul. 2023. [Online]. Avai...

  8. [8]

    Go fetch: Mobile manipulation in unstructured environments,

    K. Blomqvist, M. Breyer, A. Cramariuc, J. F ¨orster, M. Grinvald, F. Tschopp, J. J. Chung, L. Ott, J. Nieto, and R. Siegwart, “Go fetch: Mobile manipulation in unstructured environments,” 2020. [Online]. Available: https://arxiv.org/abs/2004.00899

  9. [9]

    Uncertainty-aware arm-base coordinated object grasping with a mobile manipulation platform,

    D. Chen and G. v. Wichert, “Uncertainty-aware arm-base coordinated object grasping with a mobile manipulation platform,” inISR/Robotik 2014; 41st International Sym- posium on Robotics, 2014, pp. 1–6

  10. [10]

    Uncertainty aware mobile manipulator platform pose planning based on capability map,

    Y . Meng, Y . Chen, and Y . Lou, “Uncertainty aware mobile manipulator platform pose planning based on capability map,” in2021 IEEE International Conference on Real-time Computing and Robotics (RCAR), 2021, pp. 123–128

  11. [11]

    Demonstrating adaptive mobile manipulation in retail environments,

    M. Spahn, C. Pezzato, C. Salmi, R. Dekker, C. Wang, C. Pek, J. Kober, J. Alonso-Mora, C. Hern´andez Corbato, and M. Wisse, “Demonstrating adaptive mobile manipulation in retail environments,” inRobotics: Science and Systems (R:SS), 2024. [Online]. Available: https://www.roboticsproceedings.org/rss20/p047.html

  12. [12]

    Robi butler: Multimodal remote interaction with a household robot assistant,

    A. Xiao, N. Janaka, T. Hu, A. Gupta, K. Li, C. Yu, and D. Hsu, “Robi butler: Multimodal remote interaction with a household robot assistant,” 2025. [Online]. Available: https://arxiv.org/abs/2409.20548

  13. [13]

    Real-time sampling-based safe motion planning for robotic manipulators in dynamic environments,

    N. Covic, B. Lacevic, D. Osmankovic, and T. Uzunovic, “Real-time sampling-based safe motion planning for robotic manipulators in dynamic environments,”IEEE Transactions on Robotics, vol. 41, p. 5287–5306, 2025. [Online]. Available: http://dx.doi.org/10.1109/TRO.2025. 3598119

  14. [14]

    Simultaneous scene reconstruction and whole-body motion planning for safe operation in dynamic environments,

    M. N. Finean, W. Merkt, and I. Havoutis, “Simultaneous scene reconstruction and whole-body motion planning for safe operation in dynamic environments,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2021, pp. 3710–

  15. [15]

    Available: http://dx.doi.org/10.1109/ IROS51168.2021.9636860

    [Online]. Available: http://dx.doi.org/10.1109/ IROS51168.2021.9636860

  16. [16]

    The design of stretch: A compact, lightweight mobile manipulator for indoor human environments,

    M. Tzes, V . Vasilopoulos, Y . Kantaros, and G. J. Pappas, “Reactive informative planning for mobile manipulation tasks under sensing and environmental uncertainty,” in2022 International Conference on Robotics and Automation (ICRA). IEEE Press, 2022, p. 7320–7326. [Online]. Available: https://doi.org/10.1109/ICRA46639. 2022.9811642

  17. [17]

    Reactive planning for mobile manipulation tasks in unexplored semantic environments,

    V . Vasilopoulos, Y . Kantaros, G. J. Pappas, and D. E. Koditschek, “Reactive planning for mobile manipulation tasks in unexplored semantic environments,” in2021 IEEE International Conference on Robotics and Automa- tion (ICRA), 2021, pp. 6385–6392

  18. [18]

    Base placement optimization for coverage mobile manipulation tasks,

    H. Zhang, K. Mi, and Z. Zhang, “Base placement optimization for coverage mobile manipulation tasks,”

  19. [19]

    Available: https://arxiv.org/abs/2304

    [Online]. Available: https://arxiv.org/abs/2304. 08246

  20. [20]

    Symbolic state space optimization for long horizon mobile manipulation planning,

    X. Zhang, Y . Zhu, Y . Ding, Y . Jiang, Y . Zhu, P. Stone, and S. Zhang, “Symbolic state space optimization for long horizon mobile manipulation planning,” in2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023, pp. 866–872

  21. [21]

    B ∗: Efficient and optimal base placement for fixed-base manipulators,

    Z. Zhao, L. Cui, S. Xie, S. Zhang, Z. Han, L. Ruan, and Y . Zhu, “B ∗: Efficient and optimal base placement for fixed-base manipulators,”IEEE Robotics and Automation Letters, vol. 10, no. 10, pp. 10 634–10 641, Oct. 2025. [Online]. Available: http://dx.doi.org/10.1109/LRA.2025.3604741

  22. [22]

    Look before you sweep: Visibility-aware motion planning,

    G. Goretkin, L. P. Kaelbling, and T. Lozano-P´erez, “Look before you sweep: Visibility-aware motion planning,” inProceedings of the Workshop on the Algorithmic Foundations of Robotics (WAFR). Springer, 2018, pp. 1–

  23. [23]

    Available: https://arxiv.org/abs/1901.06109

    [Online]. Available: https://arxiv.org/abs/1901.06109

  24. [24]

    Perception-aware motion planning via multiobjective search on gpus,

    B. Ichter, B. Landry, E. Schmerling, and M. Pavone, “Perception-aware motion planning via multiobjective search on gpus,” 2017. [Online]. Available: https: //arxiv.org/abs/1705.02408

  25. [25]

    Look as you leap: Planning simultaneous motion and perception for high-dof robots,

    Q. Meng, E. Flores, C. Q.-P. na, P. Qian, Z. Kingston, S. K. Hamlin, V . Unhelkar, and L. E. Kavraki, “Look as you leap: Planning simultaneous motion and perception for high-dof robots,” 2025. [Online]. Available: https://arxiv.org/abs/2509.19610

  26. [26]

    Greedy but safe re- planning under kinodynamic constraints,

    K. E. Bekris and L. E. Kavraki, “Greedy but safe re- planning under kinodynamic constraints,” inProceedings 2007 IEEE International Conference on Robotics and Automation, 2007, pp. 704–710

  27. [27]

    Receding horizon

    A. Bircher, M. Kamel, K. Alexis, H. Oleynikova, and R. Siegwart, “Receding horizon ”next-best-view” planner for 3d exploration,” in2016 IEEE International Con- ference on Robotics and Automation (ICRA), 2016, pp. 1462–1468

  28. [28]

    Closed-loop next-best-view planning for target-driven grasping,

    M. Breyer, L. Ott, R. Siegwart, and J. J. Chung, “Closed-loop next-best-view planning for target-driven grasping,” 2022. [Online]. Available: https://arxiv.org/ abs/2207.10543

  29. [29]

    Multi-view picking: Next-best-view reaching for improved grasping in clutter,

    D. Morrison, P. Corke, and J. Leitner, “Multi-view picking: Next-best-view reaching for improved grasping in clutter,” in2019 International Conference on Robotics and Automation (ICRA). IEEE Press, 2019, p. 8762–8768. [Online]. Available: https://doi.org/10.1109/ ICRA.2019.8793805

  30. [30]

    Affordance-driven next-best-view planning for robotic grasping,

    X. Zhang, D. Wang, S. Han, W. Li, B. Zhao, Z. Wang, X. Duan, C. Fang, X. Li, and J. He, “Affordance-driven next-best-view planning for robotic grasping,” 2023. [Online]. Available: https://arxiv.org/abs/2309.09556

  31. [31]

    Hypothesis-based belief planning for dexterous grasping,

    C. Zito, V . Ortenzi, M. Adjigble, M. Kopicki, R. Stolkin, and J. L. Wyatt, “Hypothesis-based belief planning for dexterous grasping,” 2019. [Online]. Available: https://arxiv.org/abs/1903.05517

  32. [32]

    Active- perceptive motion generation for mobile manipulation,

    S. Jauhri, S. Lueth, and G. Chalvatzaki, “Active- perceptive motion generation for mobile manipulation,”

  33. [33]

    Available: https://arxiv.org/abs/2310

    [Online]. Available: https://arxiv.org/abs/2310. 00433

  34. [34]

    Map space belief prediction for manipulation-enhanced mapping,

    J. M. C. Marques, N. Dengler, T. Zaenker, J. Mucke, S. Wang, M. Bennewitz, and K. Hauser, “Map space belief prediction for manipulation-enhanced mapping,”

  35. [35]

    Available: https://arxiv.org/abs/2502

    [Online]. Available: https://arxiv.org/abs/2502. 20606

  36. [36]

    Neo: A novel expeditious optimisation algorithm for reactive motion control of manipulators,

    J. Haviland and P. Corke, “Neo: A novel expeditious optimisation algorithm for reactive motion control of manipulators,”IEEE Robotics and Automation Letters, vol. 6, no. 2, p. 1043–1050, Apr. 2021. [Online]. Available: http://dx.doi.org/10.1109/LRA.2021.3056060

  37. [37]

    Continuous-time gaussian process motion planning via probabilistic inference,

    M. Mukadam, J. Dong, X. Yan, F. Dellaert, and B. Boots, “Continuous-time gaussian process motion planning via probabilistic inference,”The International Journal of Robotics Research, vol. 37, no. 11, pp. 1319–1340, Sep. 2018. [Online]. Available: http: //dx.doi.org/10.1177/0278364918790369

  38. [38]

    Neural randomized planning for whole body robot motion,

    Y . Lu, Y . Ma, D. Hsu, and P. Cai, “Neural randomized planning for whole body robot motion,” 2024. [Online]. Available: https://arxiv.org/abs/2405.11317

  39. [39]

    Real-time whole-body motion planning for mobile ma- nipulators using environment-adaptive search and spatial- temporal optimization,

    C. Wu, R. Wang, M. Song, F. Gao, J. Mei, and B. Zhou, “Real-time whole-body motion planning for mobile ma- nipulators using environment-adaptive search and spatial- temporal optimization,” in2024 IEEE International Con- ference on Robotics and Automation (ICRA), 2024, pp. 1369–1375

  40. [40]

    Rampage: Toward whole-body, real-time, and agile motion planning in unknown cluttered environments for mobile manip- ulators,

    Y . Yang, F. Meng, Z. Meng, and C. Yang, “Rampage: Toward whole-body, real-time, and agile motion planning in unknown cluttered environments for mobile manip- ulators,”IEEE Transactions on Industrial Electronics, vol. 71, no. 11, pp. 14 492–14 502, 2024

  41. [41]

    Nearest-neighbourless asymptot- ically optimal motion planning with Fully Connected Informed Trees (FCIT *),

    T. S. Wilson, W. Thomason, Z. Kingston, L. E. Kavraki, and J. D. Gammell, “Nearest-neighbourless asymptot- ically optimal motion planning with Fully Connected Informed Trees (FCIT *),” inProceedings of the IEEE International Conference on Robotics and Automation (ICRA), Atlanta, GA, USA, 19–23 May 2025, pp. 14 140–14 146

  42. [42]

    Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

    M. Ahn, A. Brohan, N. Brown, Y . Chebotar, R. Cortes, B. David, C. Finn, K. Gopalakrishnan, K. Hausman, A. Herzoget al., “Do as i can, not as i say: Grounding language in robotic affordances,” inConference on Robot Learning. PMLR, 2022, pp. 287–315. [Online]. Available: https://arxiv.org/abs/2204.01691

  43. [43]

    VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models

    W. Huang, C. Wang, R. Zhang, Y . Li, J. Wu, and L. Fei- Fei, “V oxposer: Composable 3d value maps for robotic manipulation with language models,” inConference on Robot Learning. PMLR, 2023, pp. 540–562. [Online]. Available: https://arxiv.org/abs/2307.05973

  44. [44]

    Dynamem: Online dynamic spatio-semantic memory for open world mobile manipulation,

    P. Liu, Z. Guo, M. Warke, S. Chintala, C. Paxton, N. M. M. Shafiullah, and L. Pinto, “Dynamem: Online dynamic spatio-semantic memory for open world mobile manipulation,” 2025. [Online]. Available: https://arxiv.org/abs/2411.04999

  45. [45]

    Adaptive skill coordination for robotic mobile manipulation

    N. Yokoyama, A. Clegg, J. Truong, E. Undersander, T.-Y . Yang, S. Arnaud, S. Ha, D. Batra, and A. Rai, “Asc: Adaptive skill coordination for robotic mobile manipulation,” 2023. [Online]. Available: https://arxiv. org/abs/2304.00410

  46. [46]

    Robot learning of mobile manipulation with reachability behavior priors,

    S. Jauhri, J. Peters, and G. Chalvatzaki, “Robot learning of mobile manipulation with reachability behavior priors,”IEEE Robotics and Automation Letters, vol. 7, no. 3, p. 8399–8406, 2022. [Online]. Available: https://doi.org/10.1109/LRA.2022.3188109

  47. [47]

    Pre-grasp approach- ing on mobile robots: A pre-active layered approach,

    L. Naik, S. Kalkan, and N. Kr ¨uger, “Pre-grasp approach- ing on mobile robots: A pre-active layered approach,” IEEE Robotics and Automation Letters, vol. 9, no. 3, pp. 2606–2613, 2024

  48. [48]

    Quadwbg: Generalizable quadrupedal whole-body grasping,

    J. Wang, J. Rajabov, C. Xu, Y . Zheng, and H. Wang, “Quadwbg: Generalizable quadrupedal whole-body grasping,” 2025. [Online]. Available: https://arxiv.org/abs/2411.06782

  49. [49]

    Gamma: Graspability- aware mobile manipulation policy learning based on online grasping pose fusion,

    J. Zhang, N. Gireesh, J. Wang, X. Fang, C. Xu, W. Chen, L. Dai, and H. Wang, “Gamma: Graspability- aware mobile manipulation policy learning based on online grasping pose fusion,” 2024. [Online]. Available: https://arxiv.org/abs/2309.15459

  50. [50] A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, J. Dabis, C. Finn, K. Gopalakrishnan, K. Hausman, A. Herzog, J. Hsu et al., “Rt-1: Robotics transformer for real-world control at scale,” arXiv preprint arXiv:2212.06817, 2022. [Online]. Available: https://arxiv.org/abs/2212.06817

  51. [51] D. Driess, F. Xia, M. S. Sajjadi, C. Lynch, A. Chowdhery, B. Ichter, A. Wahid, J. Tompson, Q. Vuong, T. Yu et al., “Palm-e: An embodied multimodal language model,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 13665–13675. [Online]. Available: https://arxiv.org/abs/2303.03378

  52. [52] P. Intelligence, K. Black, N. Brown, J. Darpinian, K. Dhabalia, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, M. Y. Galliker, D. Ghosh, L. Groom, K. Hausman, B. Ichter, S. Jakubczak, T. Jones, L. Ke, D. LeBlanc, S. Levine, A. Li-Bell, M. Mothukuri, S. Nair, K. Pertsch, A. Z. Ren, L. X. Shi, L. Smith, J. T. Springenberg, K. Stachowicz, J. Tanner, Q. V..., “$\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization.”

  53. [53] R. Tu, A. Shukla, S. Yoo, X. Li, J. Li, J. Xie, H. Su, and Z. Tu, “Sg-vla: Learning spatially-grounded vision-language-action models for mobile manipulation,” 2026. [Online]. Available: https://arxiv.org/abs/2603.22760

  54. [54] Z. Wu, Y. Zhou, X. Xu, Z. Wang, and H. Yan, “Momanipvla: Transferring vision-language-action models for general mobile manipulation,” 2025. [Online]. Available: https://arxiv.org/abs/2503.13446

  55. [55] B. Zitkovich, A. Apple, D. Bodnar, T. Nguyen, A. Brohan, Y. Chebotar, C. Finn, K. Hausman et al., “Rt-2: Vision-language-action models transfer web knowledge to robotic control,” arXiv preprint arXiv:2307.15818, 2023. [Online]. Available: https://arxiv.org/abs/2307.15818

  57. [57] M. Deitke, C. Clark, S. Lee, R. Tripathi, Y. Yang, J. S. Park, M. Salehi, N. Muennighoff, K. Lo, L. Soldaini, J. Lu, T. Anderson, E. Bransom, K. Ehsani, H. Ngo, Y. Chen, A. Patel, M. Yatskar, C. Callison-Burch, A. Head, R. Hendrix, F. Bastani, E. VanderBilt, N. Lambert, Y. Chou, A. Chheda, J. Sparks, S. Skjonsberg, M. Schmitz, A. Sarnat, B. Bischoff, P..., “Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models.”

  58. [58] M. Sundermeyer, A. Mousavian, R. Triebel, and D. Fox, “Contact-graspnet: Efficient 6-dof grasp generation in cluttered scenes,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE Press, 2021, pp. 13438–13444. [Online]. Available: https://doi.org/10.1109/ICRA48506.2021.9561877

  59. [59] P. Beeson and B. Ames, “Trac-ik: An open-source library for improved solving of generic inverse kinematics,” in 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids). IEEE Press, 2015, pp. 928–935. [Online]. Available: https://doi.org/10.1109/HUMANOIDS.2015.7363472

  60. [60] K. Kurzer, “Path planning in unstructured environments: A real-time hybrid A* implementation for fast and deterministic path generation for the KTH research concept vehicle,” Master’s thesis, KTH, Integrated Transport Research Lab, ITRL, 2016. [Online]. Available: https://www.diva-portal.org/smash/record.jsf?pid=diva2:1057261

  61. [61] J. Kuffner and S. LaValle, “Rrt-connect: An efficient approach to single-query path planning,” in Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No.00CH37065), vol. 2, 2000, pp. 995–1001.

  62. [62] R. Geraerts and M. H. Overmars, “Creating high-quality roadmaps for motion planning in virtual environments,” in 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2006, pp. 4355–4361.

  63. [63] K. Hauser and V. Ng-Thow-Hing, “Fast smoothing of manipulator trajectories using optimal bounded-acceleration shortcuts,” in 2010 IEEE International Conference on Robotics and Automation, 2010, pp. 2493–2498.

  64. [64] J. Pan, L. Zhang, and D. Manocha, “Collision-free and smooth trajectory computation in cluttered environments,” The International Journal of Robotics Research, vol. 31, no. 10, pp. 1155–1175, 2012. [Online]. Available: https://doi.org/10.1177/0278364912453186

  65. [65] S. Macenski, F. Martin, R. White, and J. Ginés Clavero, “The marathon 2: A navigation system,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020. [Online]. Available: https://doi.org/10.1109/IROS45743.2020.9341207

  66. [66] S. Tao, F. Xiang, A. Shukla, Y. Qin, X. Hinrichsen, X. Yuan, C. Bao, X. Lin, Y. Liu, T. kai Chan, Y. Gao, X. Li, T. Mu, N. Xiao, A. Gurha, V. N. Rajesh, Y. W. Choi, Y.-R. Chen, Z. Huang, R. Calandra, R. Chen, S. Luo, and H. Su, “Maniskill3: Gpu parallelized robotics simulation and rendering for generalizable embodied ai,” Robotics: Science and System..., arXiv preprint arXiv:2410.00425, 2024.

  67. [67] A. Szot, A. Clegg, E. Undersander, E. Wijmans, Y. Zhao, J. Turner, N. Maestre, M. Mukadam, D. Chaplot, O. Maksymets, A. Gokaslan, V. Vondrus, S. Dharur, F. Meier, W. Galuba, A. Chang, Z. Kira, V. Koltun, J. Malik, M. Savva, and D. Batra, “Habitat 2.0: Training home assistants to rearrange their habitat,” in Advances in Neural Information Processing Sys...

  68. [68] C. Eppner, A. Mousavian, and D. Fox, “Acronym: A large-scale grasp dataset based on simulation,” in 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 6222–6227.

APPENDIX

This appendix provides supplementary material organized as follows: implementation details for grasp sampling, pre-grasp configuration generation, kinemati...

A candidate placement is accepted when:

• The object’s final height is within the expected range relative to the support surface (±5 cm below to 30 cm above)
• The object’s total displacement from spawn is less than 30 cm (no excessive rolling/bouncing)
• No collision with existing objects at spawn time

Objects failing these checks are re-sampled at alternative positions until a stable placement is found or the maximum attempts (100) are exceeded.

d) Oracle Grasp Checking: Stable objects undergo graspability validation to ensure each benchmark object is actually graspable under ideal conditions. The proce...
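The re-sampling loop described above can be sketched in Python. This is a minimal illustration under stated assumptions: `SettledState` and the `sample_and_settle` callback are our hypothetical stand-ins for the paper's physics rollout, not its actual API; only the three acceptance criteria and the 100-attempt cap come from the text.

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 100  # the appendix caps re-sampling at 100 attempts


@dataclass
class SettledState:
    """Hypothetical summary of an object after the settling rollout."""
    height: float          # final height relative to the support surface (m)
    displacement: float    # total drift from the spawn position (m)
    spawn_collision: bool  # collided with an existing object at spawn time


def is_stable(state: SettledState) -> bool:
    """Acceptance criteria from the appendix: final height within
    [-5 cm, +30 cm] of the support surface, total drift under 30 cm,
    and no collision with existing objects at spawn."""
    return (-0.05 <= state.height <= 0.30
            and state.displacement < 0.30
            and not state.spawn_collision)


def place_object(sample_and_settle, max_attempts=MAX_ATTEMPTS):
    """Re-sample candidate placements until one settles stably.

    `sample_and_settle` is a hypothetical callback that spawns the object
    at a fresh random pose, runs physics, and returns a SettledState.
    Returns None if no stable placement is found within max_attempts.
    """
    for _ in range(max_attempts):
        state = sample_and_settle()
        if is_stable(state):
            return state
    return None
```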

The validation passes only if the robot:

• Approaches the target object without collision
• Executes a grasp that achieves stable contact with the object
• Lifts the object above a height threshold (10 cm above the support surface)
• Maintains stable grasp for 2 seconds after lifting

This physics-based validation accounts for the robot’s full kinematic and dynamic constraints during execution.

b) Failure Categories: We categorize failures into three groups to identify system bottlenecks. Execution Failures occur during physical interaction: • Collision: Any contact between the robot body ...
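The four success conditions above reduce to a conjunction, which a minimal sketch makes explicit. The function name and boolean/scalar inputs are ours for illustration; only the 10 cm lift threshold and the 2 s hold requirement come from the appendix.

```python
def grasp_succeeded(collision_free_approach: bool,
                    stable_contact: bool,
                    lift_height: float,
                    hold_time: float) -> bool:
    """All four appendix criteria must hold: collision-free approach,
    stable contact, a lift of at least 0.10 m above the support surface,
    and a grasp held for at least 2.0 s after lifting."""
    return (collision_free_approach
            and stable_contact
            and lift_height >= 0.10
            and hold_time >= 2.0)
```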

System-Level Comparisons: We evaluate two alternative system designs that make different architectural choices for integrating navigation and manipulation, complementing the ablation study in the main paper.

a) CapMap Placement: This system design follows the sequential navigation-then-manipulation paradigm but incorporates manipulation-aware base placement...

Additional Analysis: a) Path Efficiency Analysis: We evaluate path efficiency using Success weighted by Path Length (SPL), which measures how efficiently the robot reaches the target relative to the optimal path length. SPL is defined as:

$$\mathrm{SPL} = \frac{1}{N} \sum_{i=1}^{N} S_i \cdot \frac{l_i}{\max(p_i,\, l_i)} \qquad (14)$$

Fig. 13: System-level comparison. Success rates for all six methods across unk...
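As a check on Eq. (14), a minimal Python implementation of the SPL metric (parameter names are ours; $S_i$ is the binary success indicator, $l_i$ the shortest-path length, and $p_i$ the path length actually traversed):

```python
def spl(successes, optimal_lengths, path_lengths):
    """Success weighted by Path Length, Eq. (14):
    SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i).

    successes:       binary success indicators S_i (0 or 1)
    optimal_lengths: shortest-path lengths l_i to the target
    path_lengths:    path lengths p_i the robot actually traversed
    """
    n = len(successes)
    return sum(s * l / max(p, l)
               for s, l, p in zip(successes, optimal_lengths, path_lengths)) / n
```

An episode that succeeds along the shortest path contributes 1/N; a success with a detour contributes proportionally less; a failure contributes nothing, regardless of path length.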