pith. machine review for the scientific record.

arxiv: 2605.02487 · v3 · submitted 2026-05-04 · 💻 cs.RO

Recognition: no theorem link

Visibility-Aware Mobile Grasping in Dynamic Environments

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:23 UTC · model grok-4.3

classification 💻 cs.RO
keywords mobile grasping · dynamic environments · active perception · behavior trees · whole-body planning · visibility constraints · unknown environments · mobile manipulator

The pith

A unified system integrates whole-body planning with active perception and behavior trees to let mobile robots grasp objects safely in unknown dynamic environments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that mobile grasping succeeds in spaces with moving obstacles and limited initial views when perception and motion are handled together rather than separately. A low-level iterative planner moves the robot body while sensing obstacle velocities to shrink uncertainty, and a high-level behavior-tree module creates fresh subgoals for exploration or when plans break. Readers would care because prior methods either freeze the robot or risk collisions once unseen objects cross the path, limiting real deployment in homes or factories. If the integration holds, robots finish fetch tasks with fewer stops and lower crash rates even as the scene changes. Experiments in hundreds of simulations plus physical tests on a Fetch manipulator report the resulting gains over earlier decoupled approaches.
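The perceive-plan-act coupling described above can be sketched in miniature. This is an illustrative toy, not the authors' implementation: every class and function name here is invented, the world is one-dimensional, and the behavior-tree module is reduced to a single callback that supplies a fresh subgoal when the current plan breaks.

```python
# Toy receding-horizon loop: sense obstacle velocities, replan one step,
# and fall back to a subgoal policy when the plan is blocked.
# All names are illustrative stand-ins, not the paper's API.
from dataclasses import dataclass, field

@dataclass
class Belief:
    robot: float            # 1-D robot position (stand-in for q_t)
    goal: float             # current subgoal (stand-in for g_t)
    obstacles: dict = field(default_factory=dict)  # id -> (pos, vel)

def sense(belief, world, horizon=2.0):
    """Update obstacle position/velocity estimates within sensor range."""
    for oid, (pos, vel) in world.items():
        if abs(pos - belief.robot) <= horizon:
            belief.obstacles[oid] = (pos, vel)

def plan_step(belief, step=0.5, margin=0.4, dt=1.0):
    """One replanning iteration: advance toward the subgoal unless a
    velocity-predicted obstacle position blocks the next step."""
    direction = 1.0 if belief.goal > belief.robot else -1.0
    candidate = belief.robot + direction * step
    for pos, vel in belief.obstacles.values():
        if abs((pos + vel * dt) - candidate) < margin:
            return None  # plan broken: caller must request a new subgoal
    return candidate

def run(belief, world, subgoal_policy, max_iters=20):
    """Iterate sense -> plan -> act until the subgoal is reached."""
    for _ in range(max_iters):
        sense(belief, world)
        nxt = plan_step(belief)
        if nxt is None:
            belief.goal = subgoal_policy(belief)  # behavior-tree stand-in
            continue
        belief.robot = nxt
        for oid in world:                         # world evolves each step
            p, v = world[oid]
            world[oid] = (p + v, v)
        if abs(belief.robot - belief.goal) < 0.25:
            return True
    return False
```

In this sketch an empty world lets the robot reach the goal, while a stationary obstacle on the path stalls progress if the subgoal policy never proposes a detour, mirroring how the paper's high-level module must resample goals when the low-level planner fails.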

Core claim

The central claim is that an iterative low-level whole-body planner paired with velocity-aware active perception navigates unknown dynamic environments safely, while a hierarchical behavior-tree planner adaptively supplies subgoals, producing success rates of 68.8% in unknown static scenes and 58.0% in unknown dynamic scenes together with improved collision avoidance.

What carries the argument

The iterative low-level whole-body planner coupled with velocity-aware active perception, which reduces uncertainty about moving obstacles while the robot advances toward a grasp configuration.

If this is right

  • The robot maintains progress toward the grasp even when new obstacles appear by updating plans from fresh velocity data.
  • Behavior trees let the system switch between exploration, grasping, and recovery modes without external intervention when failures occur at runtime.
  • Success and safety improve consistently across four hundred randomized simulation trials and carry over to physical deployment on a mobile manipulator.
  • The approach avoids the safety failures that arise when perception and motion are decoupled in changing environments.
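The mode switching in the second bullet is the classic behavior-tree fallback pattern: try the nominal branch, and fall through to exploration or recovery when it fails. A minimal sketch, with invented node names and a plain dictionary standing in for the robot's state (the paper's actual tree is richer):

```python
# Minimal behavior-tree composites and leaves illustrating
# explore / grasp / recover mode switching. Names are illustrative.
def selector(*children):
    """Fallback node: tick children until one succeeds."""
    def tick(state):
        return any(child(state) for child in children)
    return tick

def sequence(*children):
    """Sequence node: tick children until one fails."""
    def tick(state):
        return all(child(state) for child in children)
    return tick

def target_visible(state):
    return state.get("target_seen", False)

def explore(state):
    state["target_seen"] = True   # pretend exploration finds the target
    state["log"].append("explore")
    return True

def grasp(state):
    state["log"].append("grasp")
    return not state.get("grasp_fails", False)

def recover(state):
    state["log"].append("recover")
    state["grasp_fails"] = False  # clear the failure so the next tick retries
    return True

# Root: grasp when the target is visible; explore when it is not;
# fall back to recovery after a runtime grasp failure.
root = selector(
    sequence(target_visible, grasp),
    sequence(lambda st: not st.get("target_seen", False), explore),
    recover,
)
```

Ticking `root` repeatedly switches modes without external intervention: an unseen target routes to `explore`, a failed grasp routes to `recover`, and the cleared failure lets the next tick retry the grasp.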

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same coupling of velocity-aware sensing and adaptive subgoal generation could support related tasks such as object placement or drawer opening in populated spaces.
  • Limits would appear if obstacle speeds exceed what the sensor and replanner can track, suggesting targeted stress tests with faster or occluded movers.
  • Replacing explicit velocity estimation with learned motion prediction might further tighten reaction times in highly cluttered scenes.

Load-bearing premise

The planner and sensors can locate and dodge unobserved dynamic obstacles before they reach the robot, which requires adequate sensor range and replanning speed relative to obstacle motion.
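This premise reduces to a back-of-envelope feasibility check: an obstacle first observed at distance d is avoidable only if d exceeds the ground it covers during the sensing-plus-replanning latency, plus a safety margin. The latency and margin values below are invented for illustration; only the roughly 1.5 m threshold comes from the paper's Figure 15.

```python
# Back-of-envelope check of the load-bearing premise.
# t_sense, t_replan, and margin are illustrative assumptions,
# not parameters reported by the paper.
def min_reaction_distance(v_obstacle, t_sense, t_replan, margin):
    """Distance (m) an obstacle closes during the sense+replan
    latency, plus a safety margin."""
    return v_obstacle * (t_sense + t_replan) + margin

def avoidable(d_appear, v_obstacle, t_sense=0.2, t_replan=0.5, margin=0.5):
    """True if an obstacle appearing at d_appear (m) leaves the
    planner enough room to react."""
    return d_appear >= min_reaction_distance(v_obstacle, t_sense, t_replan, margin)
```

With a walking-speed obstacle (1.4 m/s), 0.7 s of combined latency, and a 0.5 m margin, the bound comes out near 1.48 m, in the same range as the 1.5 m minimum reaction distance the paper's Figure 15 reports.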

What would settle it

Run repeated trials with obstacles that enter the path from outside the current field of view at speeds exceeding the documented replanning rate and measure whether collision frequency rises sharply above the reported baseline.

Figures

Figures reproduced from arXiv: 2605.02487 by Anxing Xiao, David Hsu, Hanbo Zhang, Tianrun Hu.

Figure 1. Visibility-Aware Mobile Grasping. A mobile manipulator approaches a target while a chair appears in its path.
Figure 2. System Architecture Overview. The receding-horizon framework iteratively refines the belief state bt = (qt, Mt, gt) through four coupled components: a hierarchical subgoal policy generates intermediate goals gt; an active perception policy directs gaze while updating the map Mt; a whole-body motion policy computes collision-free trajectories ξt to gt; and localization and control executes the trajectory.
Figure 4. Given a map Mt and a target p as inputs, the base pose, torso height, and end-effector pose are sampled simultaneously through three distributions: (1) pgeo samples collision-free base poses qb from a 2D signed distance field updated in real time from the point cloud; (2) ptorso queries a pre-built torso height map to select the height h that maximizes …
Figure 5. State-Dependent Gaze Optimization. During planning (ξt = ∅), the importance field concentrates on the target region to enable grasp pose generation. During execution (ξt ≠ ∅), the field shifts to the swept volume V(ξt) (color-coded heatmap), weighted by velocity and temporal proximity to prioritize observation of collision-critical regions.
Figure 6. Evaluation Environments. Top: five of 20 simulation scenes (ReplicaCAD) with varied furniture layouts and target placements. Bottom: real-world deployment across five indoor locations (dining table, kitchen counter, workstation, coffee table, and sofa). Complete test cases and results are provided in Appendix I and Appendix K.
Figure 7. Qualitative results showing the execution sequence in simulation, from left to right: starting pose, observation stage, …
Figure 8. Success Rate Comparison. Ablation study results across unknown static and dynamic simulation environments. The full system achieves the highest success rates (68.8% and 58.0%), with subgoal generation providing the largest performance gain. Detailed failure breakdowns are provided in Appendix J.
Figure 10. Failure flow in unknown static (left) and dynamic (right) environments in the simulation benchmark.
Figure 11. Real robot evaluation results. Process flow and failure breakdown for static unknown (left) and dynamic (right) environments.
Figure 12. Simulation Test Scenarios. Top-down views of all 20 ReplicaCAD scenes used for evaluation. Green dots indicate robot starting positions; red dots indicate target object locations. Scenes vary in room layout, furniture density, and navigational complexity.
Figure 13. System-level comparison. Success rates for all six methods across unknown static and dynamic environments. The CapMap Placement design suffers from high collision rates due to its sequential architecture, while the Closed-Loop Replanning design improves over sequential designs but lacks the structured task management of the behavior tree.
Figure 16. Sampling completeness analysis. Empirical completeness of pre-grasp configuration sampling. Left: success rate as a function of the number of samples, showing rapid convergence to approximately 95%. Right: computational time scales approximately linearly with the number of samples.
Figure 15. Obstacle distance analysis. Success rate and collision rate as a function of obstacle appearance distance. The system requires at least 1.5 m of reaction distance to achieve near-optimal avoidance performance. Beyond this threshold, success rate plateaus as collision avoidance saturates and remaining failures are dominated by grasp and planning limitations.
Original abstract

This paper addresses the problem of mobile grasping in dynamic, unknown environments where a robot must operate under a limited field-of-view. The fundamental challenge is the inherent trade-off between "seeing" around to reduce environmental uncertainty and "moving" the body to achieve task progress in a high-dimensional configuration space, subject to visibility constraints. Previous approaches often assume known or static environments and decouple these objectives, failing to guarantee safety when unobserved dynamic obstacles intersect the robot's path during manipulation. In this paper, we propose a unified mobile grasping system comprising two core components: (1) an iterative low-level whole-body planner coupled with velocity-aware active perception to navigate dynamic environments safely; and (2) a hierarchical high-level planner based on behavior trees that adaptively generates subgoals to guide the robot through exploration and runtime failures. We provide experimental results across 400 randomized simulation scenarios and real-world deployment on a Fetch mobile manipulator. Results show that our system achieves a success rate of 68.8% and 58.0% in unknown static and dynamic environments, respectively, significantly boosting success rates by 22.8% and 18.0% over the NAM approach in both unknown static and dynamic environments, with improved collision safety.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a unified mobile grasping system for dynamic unknown environments with limited field-of-view. It integrates (1) an iterative low-level whole-body planner coupled with velocity-aware active perception for safe navigation and (2) a hierarchical high-level behavior-tree planner that generates adaptive subgoals. Experiments across 400 randomized simulation scenarios report success rates of 68.8% (unknown static) and 58.0% (unknown dynamic), with gains of 22.8% and 18.0% over the NAM baseline, plus real-world validation on a Fetch manipulator.

Significance. If the empirical improvements and collision-safety gains hold under broader conditions, the work could advance practical mobile manipulation in unstructured settings by explicitly addressing the visibility-motion trade-off. The scale of 400 randomized trials and the real-robot deployment are notable empirical strengths.

major comments (2)
  1. [§3 (low-level planner)] The central safety claim—that velocity-aware active perception plus iterative replanning can detect and avoid unobserved dynamic obstacles in time—is load-bearing for the 'safely' and 'improved collision safety' assertions, yet the manuscript supplies no quantitative bounds on obstacle speeds, sensor range, prediction horizon, or worst-case replanning latency. The 58.0% dynamic success rate therefore does not confirm that the assumption holds beyond the tested scenarios.
  2. [§4 (experiments)] The 18.0% improvement over the NAM baseline cannot be fully assessed because the baseline implementation, dynamic-obstacle motion models (speeds, trajectories, densities), and any statistical significance tests are not described. This weakens the cross-condition claims.
minor comments (2)
  1. [Abstract] The acronym 'NAM' is undefined; expand it on first use.
  2. [Results] Report standard deviations or confidence intervals alongside the success-rate percentages so readers can judge the reliability of the reported gains.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the safety analysis and experimental details. We address each major comment below and will make the indicated revisions to improve the manuscript.

Point-by-point responses
  1. Referee: [§3 (low-level planner)] The central safety claim—that velocity-aware active perception plus iterative replanning can detect and avoid unobserved dynamic obstacles in time—is load-bearing for the 'safely' and 'improved collision safety' assertions, yet the manuscript supplies no quantitative bounds on obstacle speeds, sensor range, prediction horizon, or worst-case replanning latency. The 58.0% dynamic success rate therefore does not confirm that the assumption holds beyond the tested scenarios.

    Authors: We agree that the manuscript would benefit from explicit quantitative bounds to support the safety claims. While the 400 randomized trials provide empirical evidence of improved collision safety in the tested dynamic scenarios, we acknowledge that bounds on obstacle speeds, sensor range, prediction horizon, and replanning latency were not stated. In the revision, we will add a new paragraph to §3.2 that supplies these bounds based on the system parameters (e.g., camera field-of-view and update rate, planner frequency) together with a worst-case timing analysis showing timely detection and avoidance for the obstacle velocities used in simulation. This will clarify the scope of the claims without overstating generality beyond the evaluated conditions. revision: yes

  2. Referee: [§4 (experiments)] The 18.0% improvement over the NAM baseline cannot be fully assessed because the baseline implementation, dynamic-obstacle motion models (speeds, trajectories, densities), and any statistical significance tests are not described. This weakens the cross-condition claims.

    Authors: We agree that fuller documentation of the baseline and experimental conditions is required for readers to assess the reported gains. In the revised manuscript we will expand §4 with (i) a precise description of how the NAM baseline was implemented, (ii) the exact motion models for dynamic obstacles (including speed ranges, trajectory generation, and density), and (iii) the statistical tests performed on the 400 trials per condition. These additions will allow direct evaluation of the 18.0% improvement and the static/dynamic comparisons. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical system proposal with measured outcomes

Full rationale

The paper describes a robotic grasping system with two planners and reports success rates from 400 simulation trials plus real-world tests on a Fetch manipulator. No equations, parameter fittings, or first-principles derivations appear in the provided text; the central claims are direct experimental measurements (68.8% and 58.0% success) compared against a baseline. No self-citations are invoked as load-bearing uniqueness theorems, and no predictions reduce by construction to fitted inputs or self-definitions. The work is self-contained as an engineering contribution whose validity rests on external benchmarks rather than internal reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The system rests on standard robotics modeling assumptions rather than new free parameters or invented entities.

axioms (2)
  • domain assumption Accurate kinematic and dynamic models of the mobile manipulator are available
    Required for whole-body planning to be feasible
  • domain assumption Sensor observations can be fused in real time to update an occupancy or visibility map
    Underpins the active-perception component

pith-pipeline@v0.9.0 · 5517 in / 1249 out tokens · 26548 ms · 2026-05-12T02:23:18.707179+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

77 extracted references · 77 canonical work pages · 7 internal anchors

  1. [1]

    arXiv preprint arXiv:2401.12202 (2024)

    P. Liu, Y . Orru, C. Paxton, N. M. M. Shafiullah, and L. Pinto, “Ok-robot: What really matters in integrating open-knowledge models for robotics,”arXiv preprint arXiv:2401.12202, 2024. [Online]. Available: https://arxiv.org/abs/2401.12202

  2. [2]

    Homerobot: Open-vocabulary mobile manipulation.arXiv preprint arXiv:2306.11565, 2024

    S. Yenamandra, A. Ramachandran, K. Yadav, A. Wang, M. Khanna, T. Gervet, T.-Y . Yang, V . Jain, A. W. Clegg, J. Turner, Z. Kira, M. Savva, A. Chang, D. S. Chaplot, D. Batra, R. Mottaghi, Y . Bisk, and C. Paxton, “Homerobot: Open-vocabulary mobile manipulation,” 2024. [Online]. Available: https://arxiv. org/abs/2306.11565

  3. [3]

    Combining nav- igation and manipulation costs for time-efficient robot placement in mobile manipulation tasks,

    F. Reister, M. Grotz, and T. Asfour, “Combining nav- igation and manipulation costs for time-efficient robot placement in mobile manipulation tasks,”IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 9913–9920, 2022

  4. [4]

    Robot placement based on reachability inversion,

    N. Vahrenkamp, T. Asfour, and R. Dillmann, “Robot placement based on reachability inversion,” in2013 IEEE International Conference on Robotics and Automation, 2013, pp. 1970–1975

  5. [5]

    Where should i look optimised gaze control for whole-body collision avoidance in dynamic environments,

    M. Finean, W. Merkt, and I. Havoutis, “Where should i look optimised gaze control for whole-body collision avoidance in dynamic environments,”IEEE Robotics and Automation Letters, vol. PP, pp. 1–1, 12 2021

  6. [6]

    Mo- tions in microseconds via vectorized sampling-based planning,

    W. Thomason, Z. Kingston, and L. E. Kavraki, “Mo- tions in microseconds via vectorized sampling-based planning,” in2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 8749–8756

  7. [7]

    Demonstrating mobile manipulation in the wild: A metrics-driven approach,

    M. Bajracharya, J. Borders, R. Cheng, D. Helmick, L. Kaul, D. Kruse, J. Leichty, J. Ma, C. Matl, F. Michel, C. Papazov, J. Petersen, K. Shankar, and M. Tjersland, “Demonstrating mobile manipulation in the wild: A metrics-driven approach,” inRobotics: Science and Systems XIX, ser. RSS2023. Robotics: Science and Systems Foundation, Jul. 2023. [Online]. Avai...

  8. [8]

    Go fetch: Mobile manipulation in unstructured environments,

    K. Blomqvist, M. Breyer, A. Cramariuc, J. F ¨orster, M. Grinvald, F. Tschopp, J. J. Chung, L. Ott, J. Nieto, and R. Siegwart, “Go fetch: Mobile manipulation in unstructured environments,” 2020. [Online]. Available: https://arxiv.org/abs/2004.00899

  9. [9]

    Uncertainty-aware arm-base coordinated object grasping with a mobile manipulation platform,

    D. Chen and G. v. Wichert, “Uncertainty-aware arm-base coordinated object grasping with a mobile manipulation platform,” inISR/Robotik 2014; 41st International Sym- posium on Robotics, 2014, pp. 1–6

  10. [10]

    Uncertainty aware mobile manipulator platform pose planning based on capability map,

    Y . Meng, Y . Chen, and Y . Lou, “Uncertainty aware mobile manipulator platform pose planning based on capability map,” in2021 IEEE International Conference on Real-time Computing and Robotics (RCAR), 2021, pp. 123–128

  11. [11]

    Demonstrating adaptive mobile manipulation in retail environments,

    M. Spahn, C. Pezzato, C. Salmi, R. Dekker, C. Wang, C. Pek, J. Kober, J. Alonso-Mora, C. Hern´andez Corbato, and M. Wisse, “Demonstrating adaptive mobile manipulation in retail environments,” inRobotics: Science and Systems (R:SS), 2024. [Online]. Available: https://www.roboticsproceedings.org/rss20/p047.html

  12. [12]

    Robi butler: Multimodal remote interaction with a household robot assistant,

    A. Xiao, N. Janaka, T. Hu, A. Gupta, K. Li, C. Yu, and D. Hsu, “Robi butler: Multimodal remote interaction with a household robot assistant,” 2025. [Online]. Available: https://arxiv.org/abs/2409.20548

  13. [13]

    Real-time sampling-based safe motion planning for robotic manipulators in dynamic environments,

    N. Covic, B. Lacevic, D. Osmankovic, and T. Uzunovic, “Real-time sampling-based safe motion planning for robotic manipulators in dynamic environments,”IEEE Transactions on Robotics, vol. 41, p. 5287–5306, 2025. [Online]. Available: http://dx.doi.org/10.1109/TRO.2025. 3598119

  14. [14]

    Simultaneous scene reconstruction and whole-body motion planning for safe operation in dynamic environments,

    M. N. Finean, W. Merkt, and I. Havoutis, “Simultaneous scene reconstruction and whole-body motion planning for safe operation in dynamic environments,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2021, pp. 3710–

  15. [15]

    Available: http://dx.doi.org/10.1109/ IROS51168.2021.9636860

    [Online]. Available: http://dx.doi.org/10.1109/ IROS51168.2021.9636860

  16. [16]

    The design of stretch: A compact, lightweight mobile manipulator for indoor human environments,

    M. Tzes, V . Vasilopoulos, Y . Kantaros, and G. J. Pappas, “Reactive informative planning for mobile manipulation tasks under sensing and environmental uncertainty,” in2022 International Conference on Robotics and Automation (ICRA). IEEE Press, 2022, p. 7320–7326. [Online]. Available: https://doi.org/10.1109/ICRA46639. 2022.9811642

  17. [17]

    Reactive planning for mobile manipulation tasks in unexplored semantic environments,

    V . Vasilopoulos, Y . Kantaros, G. J. Pappas, and D. E. Koditschek, “Reactive planning for mobile manipulation tasks in unexplored semantic environments,” in2021 IEEE International Conference on Robotics and Automa- tion (ICRA), 2021, pp. 6385–6392

  18. [18]

    Base placement optimization for coverage mobile manipulation tasks,

    H. Zhang, K. Mi, and Z. Zhang, “Base placement optimization for coverage mobile manipulation tasks,”

  19. [19]

    Available: https://arxiv.org/abs/2304

    [Online]. Available: https://arxiv.org/abs/2304. 08246

  20. [20]

    Symbolic state space optimization for long horizon mobile manipulation planning,

    X. Zhang, Y . Zhu, Y . Ding, Y . Jiang, Y . Zhu, P. Stone, and S. Zhang, “Symbolic state space optimization for long horizon mobile manipulation planning,” in2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023, pp. 866–872

  21. [21]

    B ∗: Efficient and optimal base placement for fixed-base manipulators,

    Z. Zhao, L. Cui, S. Xie, S. Zhang, Z. Han, L. Ruan, and Y . Zhu, “B ∗: Efficient and optimal base placement for fixed-base manipulators,”IEEE Robotics and Automation Letters, vol. 10, no. 10, pp. 10 634–10 641, Oct. 2025. [Online]. Available: http://dx.doi.org/10.1109/LRA.2025.3604741

  22. [22]

    Look before you sweep: Visibility-aware motion planning,

    G. Goretkin, L. P. Kaelbling, and T. Lozano-P´erez, “Look before you sweep: Visibility-aware motion planning,” inProceedings of the Workshop on the Algorithmic Foundations of Robotics (WAFR). Springer, 2018, pp. 1–

  23. [23]

    Available: https://arxiv.org/abs/1901.06109

    [Online]. Available: https://arxiv.org/abs/1901.06109

  24. [24]

    Perception-aware motion planning via multiobjective search on gpus,

    B. Ichter, B. Landry, E. Schmerling, and M. Pavone, “Perception-aware motion planning via multiobjective search on gpus,” 2017. [Online]. Available: https: //arxiv.org/abs/1705.02408

  25. [25]

    Look as you leap: Planning simultaneous motion and perception for high-dof robots,

    Q. Meng, E. Flores, C. Q.-P. na, P. Qian, Z. Kingston, S. K. Hamlin, V . Unhelkar, and L. E. Kavraki, “Look as you leap: Planning simultaneous motion and perception for high-dof robots,” 2025. [Online]. Available: https://arxiv.org/abs/2509.19610

  26. [26]

    Greedy but safe re- planning under kinodynamic constraints,

    K. E. Bekris and L. E. Kavraki, “Greedy but safe re- planning under kinodynamic constraints,” inProceedings 2007 IEEE International Conference on Robotics and Automation, 2007, pp. 704–710

  27. [27]

    Receding horizon

    A. Bircher, M. Kamel, K. Alexis, H. Oleynikova, and R. Siegwart, “Receding horizon ”next-best-view” planner for 3d exploration,” in2016 IEEE International Con- ference on Robotics and Automation (ICRA), 2016, pp. 1462–1468

  28. [28]

    Closed-loop next-best-view planning for target-driven grasping,

    M. Breyer, L. Ott, R. Siegwart, and J. J. Chung, “Closed-loop next-best-view planning for target-driven grasping,” 2022. [Online]. Available: https://arxiv.org/ abs/2207.10543

  29. [29]

    Multi-view picking: Next-best-view reaching for improved grasping in clutter,

    D. Morrison, P. Corke, and J. Leitner, “Multi-view picking: Next-best-view reaching for improved grasping in clutter,” in2019 International Conference on Robotics and Automation (ICRA). IEEE Press, 2019, p. 8762–8768. [Online]. Available: https://doi.org/10.1109/ ICRA.2019.8793805

  30. [30]

    Affordance-driven next-best-view planning for robotic grasping,

    X. Zhang, D. Wang, S. Han, W. Li, B. Zhao, Z. Wang, X. Duan, C. Fang, X. Li, and J. He, “Affordance-driven next-best-view planning for robotic grasping,” 2023. [Online]. Available: https://arxiv.org/abs/2309.09556

  31. [31]

    Hypothesis-based belief planning for dexterous grasping,

    C. Zito, V . Ortenzi, M. Adjigble, M. Kopicki, R. Stolkin, and J. L. Wyatt, “Hypothesis-based belief planning for dexterous grasping,” 2019. [Online]. Available: https://arxiv.org/abs/1903.05517

  32. [32]

    Active- perceptive motion generation for mobile manipulation,

    S. Jauhri, S. Lueth, and G. Chalvatzaki, “Active- perceptive motion generation for mobile manipulation,”

  33. [33]

    Available: https://arxiv.org/abs/2310

    [Online]. Available: https://arxiv.org/abs/2310. 00433

  34. [34]

    Map space belief prediction for manipulation-enhanced mapping,

    J. M. C. Marques, N. Dengler, T. Zaenker, J. Mucke, S. Wang, M. Bennewitz, and K. Hauser, “Map space belief prediction for manipulation-enhanced mapping,”

  35. [35]

    Available: https://arxiv.org/abs/2502

    [Online]. Available: https://arxiv.org/abs/2502. 20606

  36. [36]

    Neo: A novel expeditious optimisation algorithm for reactive motion control of manipulators,

    J. Haviland and P. Corke, “Neo: A novel expeditious optimisation algorithm for reactive motion control of manipulators,”IEEE Robotics and Automation Letters, vol. 6, no. 2, p. 1043–1050, Apr. 2021. [Online]. Available: http://dx.doi.org/10.1109/LRA.2021.3056060

  37. [37]

    Continuous-time gaussian process motion planning via probabilistic inference,

    M. Mukadam, J. Dong, X. Yan, F. Dellaert, and B. Boots, “Continuous-time gaussian process motion planning via probabilistic inference,”The International Journal of Robotics Research, vol. 37, no. 11, pp. 1319–1340, Sep. 2018. [Online]. Available: http: //dx.doi.org/10.1177/0278364918790369

  38. [38]

    Neural randomized planning for whole body robot motion,

    Y . Lu, Y . Ma, D. Hsu, and P. Cai, “Neural randomized planning for whole body robot motion,” 2024. [Online]. Available: https://arxiv.org/abs/2405.11317

  39. [39]

    Real-time whole-body motion planning for mobile ma- nipulators using environment-adaptive search and spatial- temporal optimization,

    C. Wu, R. Wang, M. Song, F. Gao, J. Mei, and B. Zhou, “Real-time whole-body motion planning for mobile ma- nipulators using environment-adaptive search and spatial- temporal optimization,” in2024 IEEE International Con- ference on Robotics and Automation (ICRA), 2024, pp. 1369–1375

  40. [40]

    Rampage: Toward whole-body, real-time, and agile motion planning in unknown cluttered environments for mobile manip- ulators,

    Y . Yang, F. Meng, Z. Meng, and C. Yang, “Rampage: Toward whole-body, real-time, and agile motion planning in unknown cluttered environments for mobile manip- ulators,”IEEE Transactions on Industrial Electronics, vol. 71, no. 11, pp. 14 492–14 502, 2024

  41. [41]

    Nearest-neighbourless asymptot- ically optimal motion planning with Fully Connected Informed Trees (FCIT *),

    T. S. Wilson, W. Thomason, Z. Kingston, L. E. Kavraki, and J. D. Gammell, “Nearest-neighbourless asymptot- ically optimal motion planning with Fully Connected Informed Trees (FCIT *),” inProceedings of the IEEE International Conference on Robotics and Automation (ICRA), Atlanta, GA, USA, 19–23 May 2025, pp. 14 140–14 146

  42. [42]

    Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

    M. Ahn, A. Brohan, N. Brown, Y . Chebotar, R. Cortes, B. David, C. Finn, K. Gopalakrishnan, K. Hausman, A. Herzoget al., “Do as i can, not as i say: Grounding language in robotic affordances,” inConference on Robot Learning. PMLR, 2022, pp. 287–315. [Online]. Available: https://arxiv.org/abs/2204.01691

  43. [43]

    VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models

    W. Huang, C. Wang, R. Zhang, Y . Li, J. Wu, and L. Fei- Fei, “V oxposer: Composable 3d value maps for robotic manipulation with language models,” inConference on Robot Learning. PMLR, 2023, pp. 540–562. [Online]. Available: https://arxiv.org/abs/2307.05973

  44. [44]

    Dynamem: Online dynamic spatio-semantic memory for open world mobile manipulation,

    P. Liu, Z. Guo, M. Warke, S. Chintala, C. Paxton, N. M. M. Shafiullah, and L. Pinto, “Dynamem: Online dynamic spatio-semantic memory for open world mobile manipulation,” 2025. [Online]. Available: https://arxiv.org/abs/2411.04999

  45. [45]

    Adaptive skill coordination for robotic mobile manipulation

    N. Yokoyama, A. Clegg, J. Truong, E. Undersander, T.-Y . Yang, S. Arnaud, S. Ha, D. Batra, and A. Rai, “Asc: Adaptive skill coordination for robotic mobile manipulation,” 2023. [Online]. Available: https://arxiv. org/abs/2304.00410

  46. [46]

    Robot learning of mobile manipulation with reachability behavior priors,

    S. Jauhri, J. Peters, and G. Chalvatzaki, “Robot learning of mobile manipulation with reachability behavior priors,”IEEE Robotics and Automation Letters, vol. 7, no. 3, p. 8399–8406, 2022. [Online]. Available: https://doi.org/10.1109/LRA.2022.3188109

  47. [47]

    Pre-grasp approach- ing on mobile robots: A pre-active layered approach,

    L. Naik, S. Kalkan, and N. Kr ¨uger, “Pre-grasp approach- ing on mobile robots: A pre-active layered approach,” IEEE Robotics and Automation Letters, vol. 9, no. 3, pp. 2606–2613, 2024

  48. [48]

    Quadwbg: Generalizable quadrupedal whole-body grasping,

    J. Wang, J. Rajabov, C. Xu, Y . Zheng, and H. Wang, “Quadwbg: Generalizable quadrupedal whole-body grasping,” 2025. [Online]. Available: https://arxiv.org/abs/2411.06782

  49. [49]

    Gamma: Graspability- aware mobile manipulation policy learning based on online grasping pose fusion,

    J. Zhang, N. Gireesh, J. Wang, X. Fang, C. Xu, W. Chen, L. Dai, and H. Wang, “Gamma: Graspability- aware mobile manipulation policy learning based on online grasping pose fusion,” 2024. [Online]. Available: https://arxiv.org/abs/2309.15459

  50. [50] A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, J. Dabis, C. Finn, K. Gopalakrishnan, K. Hausman, A. Herzog, J. Hsu et al., “Rt-1: Robotics transformer for real-world control at scale,” arXiv preprint arXiv:2212.06817, 2022. [Online]. Available: https://arxiv.org/abs/2212.06817

  51. [51] D. Driess, F. Xia, M. S. Sajjadi, C. Lynch, A. Chowdhery, B. Ichter, A. Wahid, J. Tompson, Q. Vuong, T. Yu et al., “Palm-e: An embodied multimodal language model,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 13665–13675. [Online]. Available: https://arxiv.org/abs/2303.03378

  52. [52] P. Intelligence, K. Black, N. Brown, J. Darpinian, K. Dhabalia, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, M. Y. Galliker, D. Ghosh, L. Groom, K. Hausman, B. Ichter, S. Jakubczak, T. Jones, L. Ke, D. LeBlanc, S. Levine, A. Li-Bell, M. Mothukuri, S. Nair, K. Pertsch, A. Z. Ren, L. X. Shi, L. Smith, J. T. Springenberg, K. Stachowicz, J. Tanner, Q. V..., “$\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization.”

  53. [53] R. Tu, A. Shukla, S. Yoo, X. Li, J. Li, J. Xie, H. Su, and Z. Tu, “Sg-vla: Learning spatially-grounded vision-language-action models for mobile manipulation,” 2026. [Online]. Available: https://arxiv.org/abs/2603.22760

  54. [54] Z. Wu, Y. Zhou, X. Xu, Z. Wang, and H. Yan, “Momanipvla: Transferring vision-language-action models for general mobile manipulation,” 2025. [Online]. Available: https://arxiv.org/abs/2503.13446

  55. [55] B. Zitkovich, A. Apple, D. Bodnar, T. Nguyen, A. Brohan, Y. Chebotar, C. Finn, K. Hausman et al., “Rt-2: Vision-language-action models transfer web knowledge to robotic control,” arXiv preprint arXiv:2307.15818, 2023. [Online]. Available: https://arxiv.org/abs/2307.15818

  57. [57] M. Deitke, C. Clark, S. Lee, R. Tripathi, Y. Yang, J. S. Park, M. Salehi, N. Muennighoff, K. Lo, L. Soldaini, J. Lu, T. Anderson, E. Bransom, K. Ehsani, H. Ngo, Y. Chen, A. Patel, M. Yatskar, C. Callison-Burch, A. Head, R. Hendrix, F. Bastani, E. VanderBilt, N. Lambert, Y. Chou, A. Chheda, J. Sparks, S. Skjonsberg, M. Schmitz, A. Sarnat, B. Bischoff, P..., “Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models.”

  58. [58] M. Sundermeyer, A. Mousavian, R. Triebel, and D. Fox, “Contact-graspnet: Efficient 6-dof grasp generation in cluttered scenes,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE Press, 2021, pp. 13438–13444. [Online]. Available: https://doi.org/10.1109/ICRA48506.2021.9561877

  59. [59] P. Beeson and B. Ames, “Trac-ik: An open-source library for improved solving of generic inverse kinematics,” in 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids). IEEE Press, 2015, pp. 928–935. [Online]. Available: https://doi.org/10.1109/HUMANOIDS.2015.7363472

  60. [60] K. Kurzer, “Path planning in unstructured environments: A real-time hybrid A* implementation for fast and deterministic path generation for the KTH research concept vehicle,” Master’s thesis, KTH, Integrated Transport Research Lab, ITRL, 2016. [Online]. Available: https://www.diva-portal.org/smash/record.jsf?pid=diva2:1057261

  61. [61] J. Kuffner and S. LaValle, “Rrt-connect: An efficient approach to single-query path planning,” in Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No.00CH37065), vol. 2, 2000, pp. 995–1001.

  62. [62] R. Geraerts and M. H. Overmars, “Creating high-quality roadmaps for motion planning in virtual environments,” in 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2006, pp. 4355–4361.

  63. [63] K. Hauser and V. Ng-Thow-Hing, “Fast smoothing of manipulator trajectories using optimal bounded-acceleration shortcuts,” in 2010 IEEE International Conference on Robotics and Automation, 2010, pp. 2493–2498.

  64. [64] J. Pan, L. Zhang, and D. Manocha, “Collision-free and smooth trajectory computation in cluttered environments,” The International Journal of Robotics Research, vol. 31, no. 10, pp. 1155–1175, 2012. [Online]. Available: https://doi.org/10.1177/0278364912453186

  65. [65] S. Macenski, F. Martin, R. White, and J. Ginés Clavero, “The marathon 2: A navigation system,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020. [Online]. Available: https://doi.org/10.1109/IROS45743.2020.9341207

  66. [66] S. Tao, F. Xiang, A. Shukla, Y. Qin, X. Hinrichsen, X. Yuan, C. Bao, X. Lin, Y. Liu, T. kai Chan, Y. Gao, X. Li, T. Mu, N. Xiao, A. Gurha, V. N. Rajesh, Y. W. Choi, Y.-R. Chen, Z. Huang, R. Calandra, R. Chen, S. Luo, and H. Su, “Maniskill3: Gpu parallelized robotics simulation and rendering for generalizable embodied ai,” Robotics: Science and System..., arXiv preprint arXiv:2410.00425, 2024.

  67. [67] A. Szot, A. Clegg, E. Undersander, E. Wijmans, Y. Zhao, J. Turner, N. Maestre, M. Mukadam, D. Chaplot, O. Maksymets, A. Gokaslan, V. Vondrus, S. Dharur, F. Meier, W. Galuba, A. Chang, Z. Kira, V. Koltun, J. Malik, M. Savva, and D. Batra, “Habitat 2.0: Training home assistants to rearrange their habitat,” in Advances in Neural Information Processing Sys...

  68. [68] C. Eppner, A. Mousavian, and D. Fox, “Acronym: A large-scale grasp dataset based on simulation,” in 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 6222–6227.

APPENDIX

This appendix provides supplementary material organized as follows: implementation details for grasp sampling, pre-grasp configuration generation, kinemati...

A candidate placement is accepted when:

• The object’s final height is within the expected range relative to the support surface (±5 cm below to 30 cm above)
• The object’s total displacement from spawn is less than 30 cm (no excessive rolling/bouncing)
• No collision with existing objects at spawn time

Objects failing these checks are re-sampled at alternative positions until a stable placement is found or the maximum attempts (100) are exceeded.

d) Oracle Grasp Checking: Stable objects undergo graspability validation to ensure each benchmark object is actually graspable under ideal conditions. The proce...
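The re-sampling loop described above can be sketched in Python. This is a minimal illustration under stated assumptions: `SettledState` and the `sample_and_settle` callback are our hypothetical stand-ins for the paper's physics rollout, not its actual API; only the three acceptance criteria and the 100-attempt cap come from the text.

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 100  # the appendix caps re-sampling at 100 attempts


@dataclass
class SettledState:
    """Hypothetical summary of an object after the settling rollout."""
    height: float          # final height relative to the support surface (m)
    displacement: float    # total drift from the spawn position (m)
    spawn_collision: bool  # collided with an existing object at spawn time


def is_stable(state: SettledState) -> bool:
    """Acceptance criteria from the appendix: final height within
    [-5 cm, +30 cm] of the support surface, total drift under 30 cm,
    and no collision with existing objects at spawn."""
    return (-0.05 <= state.height <= 0.30
            and state.displacement < 0.30
            and not state.spawn_collision)


def place_object(sample_and_settle, max_attempts=MAX_ATTEMPTS):
    """Re-sample candidate placements until one settles stably.

    `sample_and_settle` is a hypothetical callback that spawns the object
    at a fresh random pose, runs physics, and returns a SettledState.
    Returns None if no stable placement is found within max_attempts.
    """
    for _ in range(max_attempts):
        state = sample_and_settle()
        if is_stable(state):
            return state
    return None
```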

The validation passes only if the robot:

• Approaches the target object without collision
• Executes a grasp that achieves stable contact with the object
• Lifts the object above a height threshold (10 cm above the support surface)
• Maintains stable grasp for 2 seconds after lifting

This physics-based validation accounts for the robot’s full kinematic and dynamic constraints during execution.

b) Failure Categories: We categorize failures into three groups to identify system bottlenecks. Execution Failures occur during physical interaction: • Collision: Any contact between the robot body ...
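The four success conditions above reduce to a conjunction, which a minimal sketch makes explicit. The function name and boolean/scalar inputs are ours for illustration; only the 10 cm lift threshold and the 2 s hold requirement come from the appendix.

```python
def grasp_succeeded(collision_free_approach: bool,
                    stable_contact: bool,
                    lift_height: float,
                    hold_time: float) -> bool:
    """All four appendix criteria must hold: collision-free approach,
    stable contact, a lift of at least 0.10 m above the support surface,
    and a grasp held for at least 2.0 s after lifting."""
    return (collision_free_approach
            and stable_contact
            and lift_height >= 0.10
            and hold_time >= 2.0)
```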

System-Level Comparisons: We evaluate two alternative system designs that make different architectural choices for integrating navigation and manipulation, complementing the ablation study in the main paper.

a) CapMap Placement: This system design follows the sequential navigation-then-manipulation paradigm but incorporates manipulation-aware base placement...

Additional Analysis: a) Path Efficiency Analysis: We evaluate path efficiency using Success weighted by Path Length (SPL), which measures how efficiently the robot reaches the target relative to the optimal path length. SPL is defined as:

$$\mathrm{SPL} = \frac{1}{N} \sum_{i=1}^{N} S_i \cdot \frac{l_i}{\max(p_i,\, l_i)} \qquad (14)$$

Fig. 13: System-level comparison. Success rates for all six methods across unk...
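As a check on Eq. (14), a minimal Python implementation of the SPL metric (parameter names are ours; $S_i$ is the binary success indicator, $l_i$ the shortest-path length, and $p_i$ the path length actually traversed):

```python
def spl(successes, optimal_lengths, path_lengths):
    """Success weighted by Path Length, Eq. (14):
    SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i).

    successes:       binary success indicators S_i (0 or 1)
    optimal_lengths: shortest-path lengths l_i to the target
    path_lengths:    path lengths p_i the robot actually traversed
    """
    n = len(successes)
    return sum(s * l / max(p, l)
               for s, l, p in zip(successes, optimal_lengths, path_lengths)) / n
```

An episode that succeeds along the shortest path contributes 1/N; a success with a detour contributes proportionally less; a failure contributes nothing, regardless of path length.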