One Body, Two Minds: Variable Autonomy Approach for a Co-embodied Robotic Hand
Pith reviewed 2026-06-25 21:10 UTC · model grok-4.3
The pith
A wearable robotic hand shares one physical body with its user but switches between autonomous grasping and human head-gesture control across task phases.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The co-embodied variable autonomy approach, in which human and robot share a single physical body and operate at different autonomy levels across task phases (from mutual autonomy during object search and grasping to human-dominant control during actuation), enables effective human-robot collaboration through physical coupling while preserving user agency.
What carries the argument
Phase-switching variable autonomy in one shared body: a visuomotor diffusion policy handles autonomous grasping, then yields to human actuation via head gestures, while a release gesture gives the user a veto at all times.
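The phase logic described above can be sketched as a small state machine. This is a minimal illustration, not the authors' implementation: the names `Phase`, `Event`, `step`, and the action strings are hypothetical, and the only behaviour taken from the paper is that grasp completion hands control to the human and that a release gesture always returns the system to the initial phase.

```python
from __future__ import annotations

from enum import Enum, auto


class Phase(Enum):
    SEARCH_AND_GRASP = auto()  # policy grasps autonomously near known objects
    ACTUATION = auto()         # human actuates the grasped tool via head gestures


class Event(Enum):
    GRASP_COMPLETE = auto()    # system signals that the grasp has finished
    ACTUATE_GESTURE = auto()   # head gesture that triggers the tool
    RELEASE_GESTURE = auto()   # veto gesture, honoured in every phase


def step(phase: Phase, event: Event) -> tuple[Phase, str]:
    """Return (next_phase, action) for a single event.

    The action strings are placeholders (assumptions) for whatever the
    real controller would do at that point.
    """
    # The veto is checked first so that it wins in every phase.
    if event is Event.RELEASE_GESTURE:
        return Phase.SEARCH_AND_GRASP, "open_hand"
    if phase is Phase.SEARCH_AND_GRASP and event is Event.GRASP_COMPLETE:
        return Phase.ACTUATION, "signal_completion"
    if phase is Phase.ACTUATION and event is Event.ACTUATE_GESTURE:
        return Phase.ACTUATION, "actuate_tool"
    # Any other combination is ignored, e.g. an actuation gesture before a grasp.
    return phase, "no_op"
```

Checking the release gesture before anything else is the design choice that makes the veto unconditional: no phase-specific branch can shadow it.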
If this is right
- Users adapted rapidly: completion times improved by 23.3 percent across trials, a large effect (p < 0.001, Cohen's d = 0.94).
- The best-performing policy variant reached a 93.6 percent task success rate in bimanual tool-use tasks.
- Acceptance ratings were 5.70 out of 7 for overall impression and 5.52 out of 7 for willingness to use the system daily.
- The system maintains physical coupling while allowing full independent robot actions in the grasping phase and full human control afterward.
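For readers who want to see how figures such as the 23.3 percent improvement and the effect size are typically obtained, here is a minimal sketch. It assumes a paired (within-subject) comparison of first and last trials and uses the paired variant of Cohen's d (mean difference divided by the standard deviation of the differences); the text does not say which variant the authors used, so that choice is an assumption. The numbers are synthetic toy values, not study data.

```python
from statistics import mean, stdev


def relative_improvement(first, last):
    """Fractional reduction in mean completion time from first to last trial."""
    return (mean(first) - mean(last)) / mean(first)


def cohens_d_paired(first, last):
    """Paired effect size: mean of per-participant differences over their SD."""
    diffs = [a - b for a, b in zip(first, last)]
    return mean(diffs) / stdev(diffs)


# Synthetic completion times in seconds for four imaginary participants.
first_trial = [40.0, 50.0, 60.0, 50.0]
last_trial = [30.0, 45.0, 40.0, 35.0]

improvement = relative_improvement(first_trial, last_trial)  # 0.25, i.e. 25 percent faster
effect_size = cohens_d_paired(first_trial, last_trial)
```

With only four toy participants the effect size is not meaningful in itself; the point is the arithmetic, which is the same at the study's sample size of 44.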
Where Pith is reading between the lines
- The same phase logic could apply to other wearable devices where the timing of autonomy handoff matters more than blended signals.
- Success depends on the policy generalizing to new object positions or slight variations not tested in the five-tool set.
- If head gestures prove fatiguing in longer sessions, an alternative input channel would be needed to preserve the human-dominant phase.
- Extending the approach to continuous manipulation rather than discrete tool actuation would require additional phase definitions.
Load-bearing premise
The learning-from-demonstration visuomotor diffusion policy will reliably perform autonomous grasping whenever the user positions the hand near known objects.
What would settle it
A replication in which the grasping policy fails in more than 20 percent of attempts, or in which users show no reduction in completion time across sessions, would undermine the case for the phase switch.
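The two conditions above can be written as an explicit check. This is only a sketch: the function name, the input layout (a list of per-attempt grasp outcomes and a list of per-session mean completion times), and the rule of comparing the last session with the first are assumptions made for illustration. The 20 percent threshold is the one stated above.

```python
def phase_switch_is_challenged(grasp_outcomes, session_mean_times,
                               max_failure_rate=0.20):
    """Return a dict describing whether either falsifying condition holds.

    grasp_outcomes: list of bools, True if an autonomous grasp succeeded.
    session_mean_times: mean completion time per session, in session order.
    """
    failure_rate = grasp_outcomes.count(False) / len(grasp_outcomes)
    # "No reduction" is read here as: the last session is not faster than the first.
    no_speedup = session_mean_times[-1] >= session_mean_times[0]
    return {
        "failure_rate": failure_rate,
        "too_many_failures": failure_rate > max_failure_rate,
        "no_speedup": no_speedup,
        "challenged": failure_rate > max_failure_rate or no_speedup,
    }
```

For example, `phase_switch_is_challenged([True] * 9 + [False], [50.0, 42.0, 38.0])` reports a 10 percent failure rate and a speedup, so neither condition fires.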
Original abstract
Assistive robotic systems face a fundamental trade-off: fully autonomous systems lack user agency, while fully user-controlled systems demand continuous cognitive effort. Existing shared autonomy approaches blend human and robot commands but are mostly deployed in separate physical bodies. We introduce co-embodiment with variable autonomy, where human and robot share a single physical body and operate at different autonomy levels across task phases, from mutual autonomy during object search and grasping to human-dominant control during actuation. We present a co-embodied, wearable robotic hand that has its own "mind" and operates with variable autonomy levels. A learning-from-demonstration visuomotor diffusion policy enables autonomous grasping when the user positions the hand near known objects. Once grasped, the system signals completion and the human can actuate the grasped tool (drill, spray bottle, infrared thermometer, lighter, and ice-cream scoop) via hands-free head gestures. The human retains veto authority at all times through a release gesture that returns the system to the initial phase. Unlike blended autonomy, where control is continuously negotiated, our co-embodied approach consists of variable autonomy from full human control to full independent actions while maintaining physical coupling, realizing a one body, two minds paradigm. In a user study with 44 participants performing five bimanual tasks, users rapidly adapted to this "two minds" paradigm: completion times improved by 23.3% across trials (p < 0.001, Cohen's d = 0.94), the best-performing policy variant reached a 93.6% task success rate, and acceptance ratings were high (5.70/7 overall impression, 5.52/7 daily use willingness). This work establishes co-embodiment with variable autonomy as a viable approach for assistive robotics, enabling human-robot collaboration through co-embodiment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a co-embodied wearable robotic hand that shares a single physical body between human and robot under a variable-autonomy scheme. A learning-from-demonstration visuomotor diffusion policy performs autonomous grasping once the user positions the hand near known objects; upon grasp completion the system switches to human-dominant control for tool actuation (drill, spray bottle, etc.) via head gestures, with a release gesture returning control to the initial phase. A 44-participant user study on five bimanual tasks reports 23.3% faster completion times (p<0.001, d=0.94), up to 93.6% task success for the best policy variant, and acceptance ratings of 5.70/7 overall impression.
Significance. If the reported user-study outcomes prove robust, the work supplies concrete evidence that co-embodiment with discrete phase-wise autonomy can deliver measurable collaboration gains while preserving user agency, a result that would be of direct interest to assistive-robotics and shared-control communities. The 44-participant sample, within-subject design, and reporting of effect sizes constitute clear empirical strengths.
major comments (2)
- [User Study] User Study section (and Abstract system-description paragraph): the central claim that the variable-autonomy scheme enables effective collaboration rests on the visuomotor diffusion policy reliably completing autonomous grasps when the hand is positioned near objects. The reported aggregate metrics (23.3% time reduction, 93.6% success, 5.70/7 acceptance) are end-to-end; no per-phase grasping success rate, failure-mode analysis, or ablation isolating policy performance is supplied. Without this breakdown it is impossible to determine whether the observed gains are attributable to the claimed phase switch or to other factors such as user adaptation or veto usage.
- [Methods] Methods / Policy Training subsection: the manuscript states that the LfD visuomotor diffusion policy enables autonomous grasping, yet provides no quantitative evaluation (success rate, failure cases, or comparison against baseline policies) of this component on the five target objects. Because the phase-switch logic depends on reliable grasp detection, the absence of isolated policy metrics leaves the load-bearing assumption unverified.
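The per-phase breakdown requested in both major comments could be computed from trial logs along the following lines. This is a sketch under stated assumptions: the `TrialLog` fields, the failure-mode labels, and the idea that each trial records a grasp outcome, an actuation outcome, and a veto count are all hypothetical, since the text does not describe the study's logging format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrialLog:
    grasp_ok: bool                      # autonomous grasp phase succeeded
    actuation_ok: bool                  # human-driven actuation phase succeeded
    vetoes: int                         # number of release gestures in the trial
    failure_mode: Optional[str] = None  # free-text label, None if no failure


def per_phase_summary(trials):
    """Split end-to-end success into grasp and actuation components."""
    n = len(trials)
    grasped = [t for t in trials if t.grasp_ok]
    return {
        "grasp_success": len(grasped) / n,
        # Conditional rate: actuation is only attempted after a successful grasp.
        "actuation_success_given_grasp": (
            sum(t.actuation_ok for t in grasped) / len(grasped) if grasped else None
        ),
        "end_to_end_success": sum(t.grasp_ok and t.actuation_ok for t in trials) / n,
        "mean_vetoes": sum(t.vetoes for t in trials) / n,
        "failure_modes": Counter(t.failure_mode for t in trials if t.failure_mode),
    }
```

Reporting the grasp rate and the conditional actuation rate separately is what would let a reader attribute an end-to-end figure to the policy rather than to user adaptation or veto usage.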
minor comments (2)
- [Figures] Figure captions and axis labels should explicitly state the number of trials per condition and whether error bars represent standard error or 95% CI.
- [User Study] The acceptance questionnaire items are referenced only by overall scores; listing the individual Likert items and their means would improve interpretability.
Simulated Author's Rebuttal
We thank the referee for their thorough review and for highlighting the strengths of our empirical evaluation. We address the major comments on the user study and methods below, providing the strongest honest defense of our approach while acknowledging areas for improvement.
- Referee: [User Study] User Study section (and Abstract system-description paragraph): the central claim that the variable-autonomy scheme enables effective collaboration rests on the visuomotor diffusion policy reliably completing autonomous grasps when the hand is positioned near objects. The reported aggregate metrics (23.3% time reduction, 93.6% success, 5.70/7 acceptance) are end-to-end; no per-phase grasping success rate, failure-mode analysis, or ablation isolating policy performance is supplied. Without this breakdown it is impossible to determine whether the observed gains are attributable to the claimed phase switch or to other factors such as user adaptation or veto usage.
Authors: We agree that breaking down the results by phase would provide stronger evidence for the contribution of the variable autonomy scheme. The study was designed to assess the integrated system in realistic tasks, where the overall time savings and high success rate indicate effective collaboration. The phase switch is triggered by grasp completion detection, and the 93.6% success implies reliable grasping in context. However, to better isolate the policy's role, we will perform a post-hoc analysis of the user study data to report approximate per-phase metrics and failure modes in the revised version if the data allows for it.
Revision: partial
- Referee: [Methods] Methods / Policy Training subsection: the manuscript states that the LfD visuomotor diffusion policy enables autonomous grasping, yet provides no quantitative evaluation (success rate, failure cases, or comparison against baseline policies) of this component on the five target objects. Because the phase-switch logic depends on reliable grasp detection, the absence of isolated policy metrics leaves the load-bearing assumption unverified.
Authors: The policy was developed using learning-from-demonstration and integrated into the system for the user study. While we did not include a separate quantitative evaluation of the policy alone (e.g., success rates on the five objects in isolation or baselines), the user study serves as an in-the-wild validation. We acknowledge this as a gap. In revision, we will add any available training metrics or a note on the policy's role, and if possible, include a small offline evaluation. Otherwise, we will explicitly state this limitation.
Revision: partial
Circularity Check
No circularity; empirical user-study claims are self-contained
The paper describes a co-embodied robotic hand system using a learning-from-demonstration visuomotor diffusion policy for phase switching and reports direct empirical outcomes from a 44-participant study (23.3% time reduction, 93.6% success, 5.70/7 acceptance). No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The central claims rest on measured task performance rather than any reduction of outputs to inputs by construction, satisfying the default expectation of no significant circularity.