
arxiv: 2511.17774 · v3 · submitted 2025-11-21 · 💻 cs.RO

Contact-Rich Robotic Assembly in Construction via Diffusion Policy Learning

Pith reviewed 2026-05-17 20:01 UTC · model grok-4.3

classification 💻 cs.RO
keywords robotic assembly · diffusion policies · construction robotics · contact-rich manipulation · timber joinery · policy learning · industrial robots · force sensing

The pith

Diffusion policies trained on force-sensing demonstrations let industrial robots assemble tight-fitting timber joints despite positional errors up to 10 mm.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that diffusion policy learning can produce contact-aware robot behaviors for construction-scale assembly tasks where friction, geometric constraints, and fabrication uncertainties make precise pre-alignment impractical. Training occurs on teleoperated demonstrations gathered inside an instrumented workcell, after which the resulting policies are tested both under ideal conditions and under deliberately introduced misalignments that exceed the physical joint clearance. A sympathetic reader cares because successful generalization would mean robots could perform reliable, high-precision work in the messy tolerance environment typical of building sites without requiring perfect fixturing or sub-millimeter positioning hardware.

Core claim

Sensory-motor diffusion policies trained from teleoperated demonstrations in a force/torque-equipped industrial workcell achieve 100 percent success on nominal mortise-and-tenon timber assemblies and maintain an average 75 percent success rate when randomized positional perturbations of up to 10 mm are introduced, supplying initial evidence that the learned policies compensate for large misalignments through contact-rich control rather than open-loop positioning.

What carries the argument

Sensory-motor diffusion policies that generate action sequences conditioned on current force and position observations to produce contact-aware assembly motions.

Load-bearing premise

That teleoperated demonstrations collected inside a controlled workcell with force sensing will transfer to the wider range of material imperfections, tolerance stack-ups, and dynamic disturbances present on real construction sites.

What would settle it

A field trial on an actual construction site in which success rates fall below 50 percent when the same policy encounters accumulated tolerances from multiple prefabricated members and typical site vibrations would falsify the claim of practical robustness.

Figures

Figures reproduced from arXiv: 2511.17774 by Arash Adel (1), Daniel Ruan (1), Nima Fazeli (2), Salma Mozaffari (1), Sigrid Adriaenssens (1), William van den Bogert (2) ((1) Princeton University, (2) University of Michigan).

Figure 1: Overview of our method. A CNN-based diffusion policy is trained conditional on end-effector pose and F/T data observations to predict sequences of robot actions.
…geometrically interlocking timber components to create durable structural connections with minimal reliance on fasteners or adhesives [22]. These joints are valued for their enhanced structural performance, rotational stiffness, and resilience und…
Figure 2: Multi-robot setup consisting of two 6-axis industrial robotic arms
Figure 3: Gripper for holding the tenon piece equipped with anti-collision and F/T sensors.
…regardless of their position. In addition, this transformation includes an adjustable scaling parameter that maps the demonstrator's control motions to proportionally smaller movements on the robot, enhancing the demonstrator's fine motor control capabilities during high-precision tasks such as timber joint assembly. During …
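The motion scaling described in the excerpt above can be illustrated with a minimal sketch. The function name `scale_teleop_delta`, the default scale of 0.25, and the use of metres are assumptions made for illustration; the source states only that an adjustable scaling parameter maps controller motions to proportionally smaller robot motions.

```python
def scale_teleop_delta(controller_delta, scale=0.25):
    """Map a controller displacement (x, y, z) to a smaller robot displacement.

    `scale` is a hypothetical value; the source does not report the one used.
    """
    if not 0.0 < scale <= 1.0:
        raise ValueError("scale must be in (0, 1]")
    return tuple(scale * d for d in controller_delta)


# A 40 mm hand motion along x becomes a 10 mm robot motion (units assumed: metres).
robot_delta = scale_teleop_delta((0.040, -0.020, 0.010), scale=0.25)
```

A scale below 1 trades workspace reach for resolution, which is the fine-motor benefit the excerpt describes.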
Figure 4: VR-based teleoperation system for collecting human expert demonstrations.
…4D quaternion rotation of the tenon gripper end effector) every 12 ms (approximately 83 Hz), while F/T data is collected as a 6D vector (3D force in newtons + 3D torque in newton-meters) at 64 Hz after passing through an infinite impulse response (IIR) low-pass filter to attenuate frequency components above 64 Hz. This filtering ste…
Figure 5: Data collection and policy rollout workflow.
…iteration step k. As implemented in [30], we add the sampled random noise ϵ^k to the noise-free actions A^0 in K forward steps (original DDPM method). Policy training was conducted on high-performance computing (HPC) clusters, which provide powerful computational resources for efficiently processing large datasets and enabling faster, scalable learning by lev…
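The forward noising step mentioned in the excerpt above can be sketched as follows. This uses the standard closed-form DDPM marginal, in which the noisy action at step k is sqrt(ᾱ_k)·A^0 + sqrt(1 − ᾱ_k)·ϵ^k. The linear beta schedule, its endpoints, K = 100, and all function names are assumptions for illustration, not values taken from the paper.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear variance schedule with num_steps entries."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha_bar(betas):
    """Running product of (1 - beta), i.e. alpha-bar for steps 1..K."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(clean_actions, k, alpha_bars, rng):
    """Noise a flat list of clean action values to diffusion step k (1-based).

    Returns the noisy actions and the sampled noise, which is the
    regression target when training a noise-prediction network.
    """
    a_bar = alpha_bars[k - 1]
    eps = [rng.gauss(0.0, 1.0) for _ in clean_actions]
    noisy = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e
             for x, e in zip(clean_actions, eps)]
    return noisy, eps


alpha_bars = cumulative_alpha_bar(linear_beta_schedule(100))
noisy_actions, noise = add_noise([0.1, -0.2, 0.3], 50, alpha_bars, random.Random(0))
```

Sampling the noise level directly through ᾱ_k avoids iterating over all earlier steps, which is why this closed form is commonly used during training.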
Figure 6: Data filtering using a low-pass Butterworth filter. The visualized trajectory is for an error recovery scenario, demonstrating insertion after a collision with the mortise surface. Only the first 3 dimensions for each pose and F/T data are shown.
A rollout is recorded as a failure if any collision errors are triggered by the industrial robot controller or if the policy does not appear to be making any mea…
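To make the low-pass Butterworth filtering named in the caption above concrete, here is a minimal pure-Python sketch of a second-order Butterworth low-pass designed with the bilinear transform. The filter order, the 5 Hz cutoff, and the function names are assumptions; the source gives neither the order nor the cutoff. In practice a library routine such as `scipy.signal.butter` would more likely be used.

```python
import math


def butterworth2_lowpass_coeffs(cutoff_hz, sample_hz):
    """Biquad coefficients for a 2nd-order Butterworth low-pass filter."""
    if not 0.0 < cutoff_hz < sample_hz / 2.0:
        raise ValueError("cutoff must lie between 0 and the Nyquist frequency")
    k = math.tan(math.pi * cutoff_hz / sample_hz)  # pre-warped cutoff
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b = (b0, 2.0 * b0, b0)
    a = (2.0 * (k * k - 1.0) * norm,
         (1.0 - math.sqrt(2.0) * k + k * k) * norm)
    return b, a


def lowpass(samples, cutoff_hz, sample_hz):
    """Causal single-pass filtering of one signal channel (adds phase lag)."""
    (b0, b1, b2), (a1, a2) = butterworth2_lowpass_coeffs(cutoff_hz, sample_hz)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


# Hypothetical use: smooth one force channel sampled at 64 Hz with a 5 Hz cutoff.
smoothed = lowpass([0.0, 1.0, 0.5, 1.5, 1.0, 1.2], cutoff_hz=5.0, sample_hz=64.0)
```

Each pose and F/T dimension would be filtered independently in the same way. A zero-phase variant (filtering forward and then backward) is a common alternative for offline data, at the cost of not being usable in real time.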
Figure 7: Mortise and tenon terminology, dimensions, and reference frame.
…tion (5 mm), and uncertainty at the edge of the demonstration distribution (10 mm). For each offset condition, the goal position was randomized uniformly along the circumference of a circle with a radius of the specified distance. The success rate of a policy was computed separately for each offset distance, and its overall performance was re…
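The offset sampling described in the excerpt above can be sketched as drawing a uniform angle and placing the goal on a circle of the given radius. The function name, the choice of a 2D plane for the offset, and the seeded generator are assumptions for illustration; the 0, 5, and 10 mm radii are the offset conditions named in the source.

```python
import math
import random


def sample_goal_offset(radius_mm, rng):
    """Sample a planar offset uniformly on the circumference of a circle."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return radius_mm * math.cos(theta), radius_mm * math.sin(theta)


rng = random.Random(42)
# One sampled offset per condition; 0 mm is the nominal case.
offsets = {radius: sample_goal_offset(radius, rng) for radius in (0.0, 5.0, 10.0)}
```

Sampling on the circumference rather than inside the disc keeps the misalignment magnitude fixed per condition, so only its direction varies between trials.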
Figure 8: Smoothed mean squared error (MSE) training and validation loss curves for Policy 4, averaged then smoothed across all training iterations. The shaded area represents ±1 standard deviation
Figure 9: Example of an insertion sequence for a successful rollout (top row) and an unsuccessful rollout (bottom row).
…a timber mortise and tenon joint under no uncertainty. These parameter values were subsequently used as the initialization point for parameter tuning in the Phase 2 experiments
Figure 10 illustrates the contribution of F/T feedback to task performance. As expected, both ablated policies underperformed relative to the full model. However, a notable pattern emerged when comparing the pose-only policy to the masked full model. When F/T inputs were masked in the full model, the tenon consistently collided with the mortise, reflecting the model's inability to detect contact without force fe…
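The masked-input ablation described above can be pictured with a small sketch. It assumes a flat observation of 7 pose values (3D position plus 4D quaternion) followed by the 6D F/T vector, and it assumes masking means replacing the F/T entries with zeros. The source confirms the 4D quaternion and 6D F/T vector but does not state the observation layout or the masking value, so both are hypothetical here.

```python
POSE_DIM = 7  # assumed: 3D position + 4D quaternion
FT_DIM = 6    # 3D force + 3D torque, as described in the source


def mask_force_torque(observation):
    """Return a copy of the observation with the F/T entries zeroed out."""
    if len(observation) != POSE_DIM + FT_DIM:
        raise ValueError("expected a 13-dimensional observation")
    return list(observation[:POSE_DIM]) + [0.0] * FT_DIM


obs = [0.5, 0.1, 0.3, 1.0, 0.0, 0.0, 0.0, 2.0, -1.0, 15.0, 0.1, 0.0, -0.2]
masked_obs = mask_force_torque(obs)
```

A model trained with F/T inputs and then fed zeros sees readings that look like permanent free-space motion, which is one plausible reading of why the masked full model kept colliding.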
Figure 11: Success rates of diffusion policies trained with different numbers of demonstrations (50, 100, 200, and 400) evaluated at three mortise offsets (0, 5, and 10 mm). Error bars indicate the standard error of the mean (SEM) computed across the 4 independently trained models for each set of parameters.
5. Conclusion. The experimental results presented in this work demonstrate that diffusion policies can achiev…
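The error bars in the caption above are standard errors of the mean across 4 models. A minimal sketch of that computation follows, assuming the sample standard deviation (n − 1 denominator) is used. The four success rates are made-up illustrative numbers, not results from the paper.

```python
import math
import statistics


def mean_and_sem(values):
    """Mean and standard error of the mean, using the sample standard deviation."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate the SEM")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


# Hypothetical success rates of 4 independently trained models at one offset.
rates = [0.70, 0.80, 0.75, 0.75]
mean_rate, sem_rate = mean_and_sem(rates)
```

With only 4 models per setting the SEM is itself a rough estimate, so small differences between adjacent bars should be read with caution.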
Original abstract

Fabrication uncertainty arising from tolerance accumulation, material imperfection, and positioning errors remains a critical barrier to automated robotic assembly in construction, particularly for contact-rich manipulation tasks governed by friction and geometric constraints. This paper investigates the deployment of diffusion policy learning on construction-scale industrial robots to enable robust, high-precision assembly under such uncertainty, using tight-fitting mortise and tenon timber joinery as a representative case study. Sensory-motor diffusion policies are trained using teleoperated demonstrations collected from an industrial robotic workcell equipped with force/torque sensing. A two-phase experimental study evaluates baseline performance and robustness under randomized positional perturbations up to 10 mm, far exceeding the sub-millimeter joint clearance. The best-performing policy achieved 100% success under nominal conditions and 75% average success under uncertainty. These results provide initial evidence that diffusion policies compensate for misalignments through contact-aware control, representing a step toward robust robotic assembly in construction under tight tolerances.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript investigates the application of sensory-motor diffusion policies, trained via teleoperated demonstrations in a force/torque-equipped industrial robotic workcell, to contact-rich assembly of tight-fitting mortise and tenon timber joints. A two-phase experimental study evaluates nominal performance and robustness to randomized positional perturbations of up to 10 mm. The best policy achieves 100% success under nominal conditions and 75% average success under uncertainty, offering initial evidence that diffusion policies enable contact-aware compensation for misalignments in construction-scale assembly tasks.

Significance. If the empirical results hold under broader conditions, the work provides concrete physical-robot evidence that diffusion-based imitation learning can address tolerance accumulation and positioning errors in contact-rich construction tasks, a domain where traditional control often fails. The use of force/torque sensing during demonstration collection and hardware validation on industrial robots strengthens the practical relevance, though the limited perturbation types leave open questions about transfer to full site variability.

major comments (2)
  1. [Abstract] Abstract: success rates of 100% nominal and 75% under perturbation are reported without any information on the number of teleoperated demonstrations, diffusion model hyperparameters, number of evaluation trials, statistical variance, or precise definition of assembly success (e.g., insertion depth threshold or force limits). These omissions directly affect assessment of the central claim that the policy compensates for misalignments.
  2. [Experimental Study] Two-phase experimental study: only randomized positional offsets (up to 10 mm) are applied to demonstrations collected in a controlled workcell. This does not test material imperfections, variable friction, or dynamic disturbances that would change contact dynamics outside the training distribution, weakening the generalization implied by the claim of contact-aware control for construction uncertainty.
minor comments (2)
  1. Clarify whether any baseline policies (e.g., standard behavior cloning or force-based controllers) were evaluated alongside the diffusion policy to contextualize the reported success rates.
  2. Ensure that any tables or figures presenting success rates include error bars or trial counts for transparency.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback. Below we provide point-by-point responses to the major comments and indicate the revisions we plan to make.

Point-by-point responses
  1. Referee: [Abstract] Abstract: success rates of 100% nominal and 75% under perturbation are reported without any information on the number of teleoperated demonstrations, diffusion model hyperparameters, number of evaluation trials, statistical variance, or precise definition of assembly success (e.g., insertion depth threshold or force limits). These omissions directly affect assessment of the central claim that the policy compensates for misalignments.

    Authors: The abstract is intended as a concise summary, while the full manuscript details the experimental protocol, including the number of teleoperated demonstrations, model hyperparameters, evaluation trials, and the definition of assembly success (full insertion within force limits). To directly address the concern and strengthen the presentation of the central claim, we will revise the abstract to incorporate a brief statement on the number of trials and success criteria.
    Revision: yes

  2. Referee: [Experimental Study] Two-phase experimental study: only randomized positional offsets (up to 10 mm) are applied to demonstrations collected in a controlled workcell. This does not test material imperfections, variable friction, or dynamic disturbances that would change contact dynamics outside the training distribution, weakening the generalization implied by the claim of contact-aware control for construction uncertainty.

    Authors: The study focuses on positional perturbations up to 10 mm as a representative and quantifiable source of uncertainty in construction assembly, with force/torque sensing enabling contact-aware adaptation as shown by the reported success rates. We agree that material imperfections, variable friction, and dynamic disturbances are not tested here and represent additional challenges. In revision we will add an explicit limitations discussion clarifying the current scope and suggesting future extensions to broader site variability.
    Revision: partial

Circularity Check

0 steps flagged

No circularity: empirical hardware results independent of derivations

full rationale

The paper reports experimental outcomes from training sensory-motor diffusion policies on teleoperated demonstrations collected in a force/torque-equipped workcell, followed by direct physical testing of success rates under nominal conditions and randomized positional perturbations up to 10 mm. No mathematical derivation chain, first-principles predictions, or equations are presented that reduce by construction to fitted parameters, self-citations, or ansatzes; the 100% nominal and 75% perturbed success rates are measured performance metrics on hardware rather than outputs forced by the training data or prior author work. The central claim of contact-aware compensation is supported by these empirical benchmarks, which remain falsifiable outside any internal fitting process.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that teleoperated demonstrations capture the necessary contact dynamics and that randomized positional perturbations adequately proxy real construction uncertainty. No free parameters are explicitly fitted in the abstract; standard diffusion model training is used.

axioms (1)
  • Domain assumption: Teleoperated demonstrations provide sufficient coverage of contact-rich behaviors for policy generalization.
    Invoked when training sensory-motor policies from human-guided data.

pith-pipeline@v0.9.0 · 5504 in / 1197 out tokens · 45849 ms · 2026-05-17T20:01:22.877428+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  matches: The paper's claim is directly supported by a theorem in the formal canon.
  supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  uses: The paper appears to rely on the theorem as machinery.
  contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. From Reach to Insert: Tactile-Augmented Precision Assembly under Sub-Millimeter Tolerances

    cs.RO · 2026-05 · unverdicted · novelty 6.0

    A two-stage IL-RL method with tactile group sampling and a tactile critic achieves 67% success at 0.05 mm clearance while cutting max force by 60% and torque by 44%.

Reference graph

Works this paper leans on

78 extracted references · 78 canonical work pages · cited by 1 Pith paper · 8 internal anchors

  [1] Delgado JMD, Oyedele L, Ajayi A, Akanbi L, Akinade O, Bilal M, Owolabi H. Robotics and automated systems in construction: Understanding industry-specific challenges for adoption. Journal of Building Engineering, 26, pp. 100868, 2019, https://doi.org/10.1016/j.jobe.2019.100868

  [2] Wei HH, Zhang Y, Sun X, Chen J, Li S. Intelligent robots and human-robot collaboration in the construction industry: a review. Journal of Intelligent Construction, 1(1), pp. 1–12, 2023, https://doi.org/10.26599/JIC.2023.9180002

  [3] Laukkanen T. Construction work and education: occupational health and safety reviewed. Construction Management and Economics, 17, pp. 53–62, 1999, https://doi.org/10.1080/014461999371826

  [4] Arndt V, Rothenbacher D, Daniel U, Zschenderlein B, Schuberth S, Brenner H. Construction work and risk of occupational disability: a ten year follow up of 14 474 male workers. Occupational and Environmental Medicine, 62, pp. 559–566, 2005, https://doi.org/10.1136/oem.2004.018135

  [5] Bock T. The future of construction automation: Technological disruption and the upcoming ubiquity of robotics. Automation in Construction, 59, pp. 113–121, 2015, https://doi.org/10.1016/j.autcon.2015.07.022

  [6] Musarat MA, Alaloul WS, Rostam NAQA, Khan AM. Substitution of workforce with robotics in the construction industry: A wise or witless approach. Journal of Open Innovation: Technology, Market, and Complexity, 10(4), pp. 100420, 2024, https://doi.org/10.1016/j.joitmc.2024.100420

  [7] Chen Z, Adel A. Advancing robotic assembly in construction: Innovations, challenges, and opportunities. Automation in Construction, 178, pp. 106370, 2025, https://doi.org/10.1016/j.autcon.2025.106370

  [8] Graser K, Adel A, Baur M, Pont DS, Thoma A. Parallel paths of inquiry: Detailing for DFAB HOUSE. Technology|Architecture+Design, 5, pp. 38–43, 2021, https://doi.org/10.1080/24751448.2021.1863668

  [9] Adel A, Ruan D, McGee W, Mozaffari S. Feedback-driven adaptive multi-robot timber construction. Automation in Construction, 164, pp. 105444, 2024, https://doi.org/10.1016/j.autcon.2024.105444

  [10] Willmann J, Gramazio F, Kohler M. New paradigms of the automatic robotic timber construction in architecture. In Advancing Wood Architecture: a Computational Approach, pp. 13–27. Routledge, 2017, https://doi.org/10.4324/9781315678825-2

  [11] Apolinarska AA. Complex timber structures from simple elements: computational design of novel bar structures for robotic fabrication and assembly. PhD thesis, ETH Zurich, 2018, https://doi.org/10.3929/ethz-b-000266723

  [13] Adel A. Computational Design for Cooperative Robotic Assembly of Nonstandard Timber Frame Buildings. PhD thesis, ETH Zurich, 2020, https://doi.org/10.3929/ethz-b-000439443

  [14] Adel A. Co-robotic assembly of nonstandard timber structures. In Hybrids & Haecceities, Proceedings of the 42nd Annual Conference of the Association for Computer Aided Design in Architecture (ACADIA), pp. 604–613. CumInCAD, 2022, https://doi.org/10.7302/8675

  [15] Chai H, Wagner HJ, Guo Z, Qi Y, Menges A, Yuan PF. Computational design and on-site mobile robotic construction of an adaptive reinforcement beam network for cross-laminated timber slab panels. Automation in Construction, 142, pp. 104536, 2022, https://doi.org/10.1016/j.autcon.2022.104536

  [16] Lauer APR, Benner E, Stark T, Klassen S, Abolhasani S, Schroth L, Gienger A, Wagner HJ, Schwieger V, Menges A et al. Automated on-site assembly of timber buildings on the example of a biomimetic shell. Automation in Construction, 156, pp. 105118, 2023, https://doi.org/10.1016/j.autcon.2023.105118

  [17] Leung PY, Apolinarska AA, Tanadini D, Gramazio F, Kohler M. Automatic assembly of jointed timber structure using distributed robotic clamps. In PROJECTIONS, Proceedings of the 26th International Conference of the Association for Computer-Aided Architectural Design (CAADRIA), volume 1, pp. 583–592. CumInCAD, 2021, https://doi.org/10.52842/conf.caadria....

  [18] Suomalainen M, Karayiannidis Y, Kyrki V. A survey of robot manipulation in contact. Robotics and Autonomous Systems, 156, pp. 104224, 2022, https://doi.org/10.1016/j.robot.2022.104224

  [19] Apolinarska AA, Pacher M, Li H, Cote N, Pastrana R, Gramazio F, Kohler M. Robotic assembly of timber joints using reinforcement learning. Automation in Construction, 125, pp. 103569, 2021, https://doi.org/10.1016/j.autcon.2021.103569

  [20] Kramberger A, Kunic A, Iturrate I, Sloth C, Naboni R, Schlette C. Robotic assembly of timber structures in a human-robot collaboration setup. Frontiers in Robotics and AI, 8, pp. 768038, 2022, https://doi.org/10.3389/frobt.2021.768038

  [21] Yang X, Amtsberg F, Sedlmair M, Menges A. Challenges and potential for human–robot collaboration in timber prefabrication. Automation in Construction, 160, pp. 105333, 2024, https://doi.org/10.1016/j.autcon.2024.105333

  [22] Benson T. Building the timber frame house: The revival of a forgotten craft. Simon and Schuster, 1981

  [23] Fang D, Mueller C. Mortise-and-tenon joinery for modern timber construction: Quantifying the embodied carbon of an alternative structural connection. Architecture, Structures and Construction, 3(1), pp. 11–24, 2023, https://doi.org/10.1007/s44150-021-00018-5

  [24] Fang D. Mortise-and-tenon joinery for modern timber construction: Quantifying the embodied carbon of an alternative structural connection. Master's thesis, Massachusetts Institute of Technology, 2021, https://hdl.handle.net/1721.1/145614

  [25] Levine S, Finn C, Darrell T, Abbeel P. End-to-end training of deep visuomotor policies. Journal of Machine Learning Research, 17(39), pp. 1–40, 2016, http://jmlr.org/papers/v17/15-522.html

  [26] Finn C, Yu T, Zhang T, Abbeel P, Levine S. One-shot visual imitation learning via meta-learning. In Proceedings of the Conference on Robot Learning (CoRL), pp. 357–368. PMLR, 2017, https://proceedings.mlr.press/v78/finn17a.html

  [27] Levine S, Pastor P, Krizhevsky A, Ibarz J, Quillen D. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. The International Journal of Robotics Research, 37(4-5), pp. 421–436, 2018, https://doi.org/10.1177/0278364917710318

  [28] Kalashnikov D, Irpan A, Pastor P, Ibarz J, Herzog A, Jang E, Quillen D, Holly E, Kalakrishnan M, Vanhoucke V et al. Scalable deep reinforcement learning for vision-based robotic manipulation. In Proceedings of Conference on Robot Learning (CoRL), volume 87, pp. 651–673. PMLR, 2018, https://proceedings.mlr.press/v87/kalashnikov18a.html

  [29] Brohan A, Brown N, Carbajal J, Chebotar Y, Dabis J, Finn C, Gopalakrishnan K, Hausman K, Herzog A, Hsu J, Ibarz J, Ichter B, Irpan A, Jackson T, Jesmonth S, Joshi NJ, Julian R, Kalashnikov D, Kuang Y, Leal I, Lee KH, Levine S, Lu Y, Malla U, Manjunath D, Mordatch I, Nachum O, Parada C, Peralta J, Perez E, Pertsch K, Quiambao J, Rao K, Ryoo M, Salaz... RT-1: Robotics Transformer for Real-World Control at Scale

  [30] Chi C, Xu Z, Feng S, Cousineau E, Du Y, Burchfiel B, Tedrake R, Song S. Diffusion policy: Visuomotor policy learning via action diffusion. The International Journal of Robotics Research, pp. 02783649241273668, 2024, https://doi.org/10.1177/02783649241273668

  [31] Chi C, Xu Z, Pan C, Cousineau E, Burchfiel B, Feng S, Tedrake R, Song S. Universal manipulation interface: In-the-wild robot teaching without in-the-wild robots, 2024, https://arxiv.org/abs/2402.10329

  [32] Hou Y, Liu Z, Chi C, Cousineau E, Kuppuswamy N, Feng S, Burchfiel B, Song S. Adaptive compliance policy: Learning approximate compliance for diffusion guided control, 2025, https://arxiv.org/abs/2410.09309

  [33] Yang L, Suh HJT, Zhao T, Graesdal BP, Kelestemur T, Wang J, Pang T, Tedrake R. Physics-driven data generation for contact-rich manipulation via trajectory optimization, 2025, https://arxiv.org/abs/2502.20382

  [34] Zhu H, Zhao T, Ni X, Wang J, Fang K, Righetti L, Pang T. Should we learn contact-rich manipulation policies from sampling-based planners? IEEE Robotics and Automation Letters, 10(6), pp. 6248–6255, 2025, https://doi.org/10.1109/LRA.2025.3564701

  [35] Stadelmann L, Sandy T, Thoma A, Buchli J. End-effector pose correction for versatile large-scale multi-robotic systems. IEEE Robotics and Automation Letters, 4, pp. 546–553, 2019, https://doi.org/10.1109/LRA.2019.2891499

  [36] Gandia A, Gramazio F, Kohler M. Tolerance-aware design of robotically assembled spatial structures. In Hybrids & Haecceities, Proceedings of the 42nd Annual Conference of the Association for Computer Aided Design in Architecture (ACADIA), pp. 4–23. CumInCAD, 2022, https://papers.cumincad.org/cgi-bin/works/Show?acadia22_4

  [37] Helm V, Knauss M, Kohlhammer T, Gramazio F, Kohler M. Additive robotic fabrication of complex timber structures. In Advancing Wood Architecture: A Computational Approach, pp. 29–44. Routledge, 2016, https://doi.org/10.4324/9781315678825-3

  [38] Ruan D, McGee W, Adel A. Reducing uncertainty in multi-robot construction through perception modelling and adaptive fabrication. In Proceedings of 40th International Symposium on Automation and Robotics in Construction (ISARC), pp. 25–31. IAARC Publications, 2023, https://doi.org/10.22260/ISARC2023/0006

  [39] Cote N, Tish D, Koehle M, Koga Y, Chitta S. Adaptive robotic construction of wood frames. Construction Robotics, 8(1), pp. 8, 2024, https://doi.org/10.1007/s41693-024-00122-0

  [40] Helmreich M, Mayer H, Pacher M, Nakajima T, Kuroki M, Tsubata S, Gramazio F, Kohler M. Robotic assembly of modular multi-storey timber-only frame structures using traditional wood joinery. In Proceedings of the 27th International Conference for the Association for Computer-Aided Architectural Design Research in Asia (CAADRIA), pp. 111–120. CumInCAD, 2022...

  [41] Albu-Schäffer A, Ott C, Hirzinger G. A unified passivity-based control framework for position, torque and impedance control of flexible joint robots. The International Journal of Robotics Research, 26(1), pp. 23–39, 2007, https://doi.org/10.1177/0278364907073776

  41. [42]

    Can robots assemble an ikea chair? Science Robotics, 3(17), pp

    Suárez-Ruiz F, Zhou X, Pham QC. Can robots assemble an ikea chair? Science Robotics, 3(17), pp. eaat6385, 2018,https://doi.org/10.1 126/scirobotics.aat6385

  42. [43]

    A prac- tical approach to insertion with variable socket position using deep re- inforcement learning

    Vecerik M, Sushkov O, Barker D, Rothörl T, Hester T, Scholz J. A prac- tical approach to insertion with variable socket position using deep re- inforcement learning. In2019 international conference on robotics and automation (ICRA), pp. 754–760. IEEE, 2019,https://doi.org/10 .1109/ICRA.2019.8794074

  43. [44]

    Deep reinforcement learning for industrial insertion tasks with visual inputs and natural rewards

    Schoettler G, Nair A, Luo J, Bahl S, Ojea JA, Solowjow E, Levine S. Deep reinforcement learning for industrial insertion tasks with visual inputs and natural rewards. In Proceedings of 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5548–5555. IEEE, 2020, https://doi.org/10.1109/IROS45743.2020.9341714

  44. [45]

    A framework for robot manipulation: Skill formalism, meta learning and adaptive control

    Johannsmeier L, Gerchow M, Haddadin S. A framework for robot manipulation: Skill formalism, meta learning and adaptive control. In International Conference on Robotics and Automation (ICRA), pp. 5844–5850. IEEE, 2019, https://doi.org/10.1109/ICRA.2019.8793542

  45. [46]

    Robotic integral attachment

    Robeller C, Weinand Y, Helm V, Thoma A, Gramazio F, Kohler M. Robotic integral attachment. In Proceedings of Fabricate 2017: Rethinking Design and Construction, volume 3, pp. 92–97. UCL Press, 2017, https://doi.org/10.2307/j.ctt1n7qkg7.16

  46. [47]

    Robotic Assembly of Integrally-Attached Timber Plate Structures: From Computational Design to Automated Construction

    Rogeau NHPL. Robotic Assembly of Integrally-Attached Timber Plate Structures: From Computational Design to Automated Construction. PhD thesis, EPFL, 2023, https://infoscience.epfl.ch/entities/publication/6fd77403-f912-4f03-a68c-18a3bac91960

  47. [48]

    Deep imitation learning for humanoid loco-manipulation through human teleoperation

    Seo M, Han S, Sim K, Bang SH, Gonzalez C, Sentis L, Zhu Y. Deep imitation learning for humanoid loco-manipulation through human teleoperation. In 2023 IEEE-RAS 22nd International Conference on Humanoid Robots (Humanoids), pp. 1–8, 2023, https://doi.org/10.1109/Humanoids57100.2023.10375203

  48. [49]

    Mimicplay: Long-horizon imitation learning by watching human play

    Wang C, Fan L, Sun J, Zhang R, Fei-Fei L, Xu D, Zhu Y, Anandkumar A. Mimicplay: Long-horizon imitation learning by watching human play. In Proceedings of The 7th Conference on Robot Learning (CoRL), volume 229, pp. 201–221, 2023, https://doi.org/10.48550/arXiv.2302.12422

  49. [50]

    Videodex: Learning dexterity from internet videos

    Shaw K, Bahl S, Pathak D. Videodex: Learning dexterity from internet videos. In Liu K, Kulic D, Ichnowski J, editors, Proceedings of The 6th Conference on Robot Learning, volume 205 of Proceedings of Machine Learning Research, pp. 654–665. PMLR, 14–18 Dec 2023, https://proceedings.mlr.press/v205/shaw23a.html

  50. [51]

    Learning fine-grained bimanual manipulation with low-cost hardware

    Zhao TZ, Kumar V, Levine S, Finn C. Learning fine-grained bimanual manipulation with low-cost hardware. In Proceedings of Robotics: Science and Systems XIX. Robotics: Science and Systems Foundation, 2023, https://www.roboticsproceedings.org/rss19/p016.pdf

  51. [52]

    ALOHA unleashed: A simple recipe for robot dexterity

    Zhao TZ, Tompson J, Driess D, Florence P, Ghasemipour SKS, Finn C, Wahid A. ALOHA unleashed: A simple recipe for robot dexterity. In Agrawal P, Kroemer O, Burgard W, editors, Proceedings of The 8th Conference on Robot Learning, volume 270 of Proceedings of Machine Learning Research, pp. 1910–1924. PMLR, 06–09 Nov 2025, https://proceedings.mlr.press/v270/z...

  52. [53]

    3d diffusion policy: Generalizable visuomotor policy learning via simple 3d representations

    Ze Y, Zhang G, Zhang K, Hu C, Wang M, Xu H. 3d diffusion policy: Generalizable visuomotor policy learning via simple 3d representations. In ICRA 2024 Workshop on 3D Visual Representations for Robot Manipulation, 2024, https://www.roboticsproceedings.org/rss20/p067.pdf

  53. [54]

    Denoising Diffusion Probabilistic Models

    Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NeurIPS), volume 33, pp. 6840–6851, 2020, https://doi.org/10.48550/arXiv.2006.11239

  54. [55]

    Deep unsupervised learning using nonequilibrium thermodynamics

    Sohl-Dickstein J, Weiss E, Maheswaranathan N, Ganguli S. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pp. 2256–2265. PMLR, 07–09 Jul 2015, https://proceedings.mlr.press/v37/sohl-dickstein15.html

  55. [56]

    Denoising Diffusion Implicit Models

    Song J, Meng C, Ermon S. Denoising diffusion implicit models, 2022, https://doi.org/10.48550/arXiv.2010.02502

  56. [57]

    Self-supervised correspondence in visuomotor policy learning. IEEE Robotics and Automation Letters, 5(2), pp

    Florence P, Manuelli L, Tedrake R. Self-supervised correspondence in visuomotor policy learning. IEEE Robotics and Automation Letters, 5(2), pp. 492–499, 2020, https://doi.org/10.1109/LRA.2019.2956365

  57. [58]

    Behavior transformers: Cloning k modes with one stone

    Shafiullah NM, Cui Z, Altanzaya AA, Pinto L. Behavior transformers: Cloning k modes with one stone. In Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, editors, Proceedings of Advances in Neural Information Processing Systems (NeurIPS), volume 35, pp. 22955–22968. Curran Associates, Inc., 2022, https://proceedings.neurips.cc/paper_files/paper/202...

  58. [59]

    Implicit behavioral cloning

    Florence P, Lynch C, Zeng A, Ramirez OA, Wahid A, Downs L, Wong A, Lee J, Mordatch I, Tompson J. Implicit behavioral cloning. In Proceedings of the Conference on Robot Learning (CoRL), pp. 158–168. PMLR, 2022, https://proceedings.mlr.press/v164/florence22a.html

  59. [60]

    Strictly batch imitation learning by energy-based distribution matching

    Jarrett D, Bica I, van der Schaar M. Strictly batch imitation learning by energy-based distribution matching. In Proceedings of Advances in Neural Information Processing Systems (NeurIPS), volume 33, pp. 7354–7365, 2020, https://proceedings.neurips.cc/paper_files/paper/2020/hash/524f141e189d2a00968c3d48cadd4159-Abstract.html

  60. [61]

    Robotic compliant object prying using diffusion policy guided by vision and force observations. IEEE Robotics and Automation Letters, 10(6), pp

    Kang JH, Joshi S, Huang R, Gupta SK. Robotic compliant object prying using diffusion policy guided by vision and force observations. IEEE Robotics and Automation Letters, 10(6), pp. 5505–5512, 2025, https://doi.org/10.1109/LRA.2025.3553689

  61. [62]

    Tacdiffusion: Force-domain diffusion policy for precise tactile manipulation

    Wu Y, Chen Z, Wu F, Chen L, Zhang L, Bing Z, Swikir A, Haddadin S, Knoll A. Tacdiffusion: Force-domain diffusion policy for precise tactile manipulation, 2025, https://arxiv.org/abs/2409.11047

  62. [63]

    IRB 4600 40kg/2,55m

    ABB Group. IRB 4600 40kg/2,55m, https://new.abb.com/products/robotics/robots/articulated-robots/irb-4600. Accessed October 21, 2024

  63. [64]

    F/T Sensor: Delta IP60

    ATI Industrial Automation. F/T Sensor: Delta IP60, https://www.ati-ia.com/products/ft/ft_models.aspx?id=delta+ip60. Accessed October 21, 2024

  64. [65]

    OPR 081-P00 Anti-collision and overload protection sensor

    Schunk. OPR 081-P00 Anti-collision and overload protection sensor, https://schunk.com/us/en/automation-technology/anti-collision-unit/opr/c/PGR_1105. Accessed October 21, 2024

  65. [66]

    Technical Reference Manual - RAPID Instructions, Functions and Data Types

    ABB Robotics. Technical Reference Manual - RAPID Instructions, Functions and Data Types. ABB AB, Robotics and Motion, Västerås, Sweden,

  66. [67]

    CX2062 | Embedded PC with Intel® Xeon® D-1548

    Beckhoff. CX2062 | Embedded PC with Intel® Xeon® D-1548. https://www.beckhoff.com/en-us/products/ipc/embedded-pcs/cx20x2-intel-r-xeon-r-d/cx2062.html. Accessed: 2025-05-08

  67. [68]

    ROS 2 Jazzy Jalisco

    Open Robotics. ROS 2 Jazzy Jalisco, 2023, https://docs.ros.org/en/jazzy/. Accessed: 2025-05-08

  68. [69]

    HTC VIVE Pro 2

    HTC Corporation. HTC VIVE Pro 2, https://www.vive.com/us/product/vive-pro2/overview/. Accessed October 21, 2024

  69. [70]

    OpenVR SDK

    Valve Corporation. OpenVR SDK, 2024, https://github.com/ValveSoftware/openvr. Accessed: 2025-05-12

  70. [71]

    On the theory of filter amplifiers. Wireless Engineer, 7(6), pp

    Butterworth S. On the theory of filter amplifiers. Wireless Engineer, 7(6), pp. 536–541, 1930

  71. [72]

    Built different: Tactile perception to overcome cross-embodiment capability differences in collaborative manipulation

    van den Bogert W, Iyengar M, Fazeli N. Built different: Tactile perception to overcome cross-embodiment capability differences in collaborative manipulation, 2024, https://arxiv.org/abs/2409.14896

  72. [73]

    On the continuity of rotation representations in neural networks

    Zhou Y, Barnes C, Lu J, Yang J, Li H. On the continuity of rotation representations in neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 5745–5753, 2019, https://openaccess.thecvf.com/content_CVPR_2019/papers/Zhou_On_the_Continuity_of_Rotation_Representations_in_Neural_Networks_CVPR_2019_paper.pdf

  73. [74]

    Improved denoising diffusion probabilistic models

    Nichol AQ, Dhariwal P. Improved denoising diffusion probabilistic models. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp. 8162–8171. PMLR, 18–24 Jul 2021, https://proceedings.mlr.press/v139/nichol21a.html

  74. [75]

    Making sense of vision and touch: Learning multimodal representations for contact-rich tasks. IEEE Transactions on Robotics, 36(3), pp

    Lee MA, Zhu Y, Zachares P, Tan M, Srinivasan K, Savarese S, Fei-Fei L, Garg A, Bohg J. Making sense of vision and touch: Learning multimodal representations for contact-rich tasks. IEEE Transactions on Robotics, 36(3), pp. 582–596, 2020, https://doi.org/10.1109/TRO.2019.2959445

  75. [76]

    Octo: An Open-Source Generalist Robot Policy

    Octo Model Team, Ghosh D, Walke H, Pertsch K, Black K, Mees O, Dasari S, Hejna J, Kreiman T, Xu C, Luo J, Tan YL, Chen LY, Sanketi P, Vuong Q, Xiao T, Sadigh D, Finn C, Levine S. Octo: An open-source generalist robot policy, 2024, https://arxiv.org/abs/2405.12213

  76. [77]

    OpenVLA: An Open-Source Vision-Language-Action Model

    Kim MJ, Pertsch K, Karamcheti S, Xiao T, Balakrishna A, Nair S, Rafailov R, Foster E, Lam G, Sanketi P, Vuong Q, Kollar T, Burchfiel B, Tedrake R, Sadigh D, Levine S, Liang P, Finn C. OpenVLA: An open-source vision-language-action model, 2024, https://arxiv.org/abs/2406.09246

  77. [78]

    Engrained performance: Performance-driven computational design of a robotically assembled shingle facade system

    Craney R, Adel A. Engrained performance: Performance-driven computational design of a robotically assembled shingle facade system. In Distributed Proximities, Proceedings of the 40th Annual Conference of the Association of Computer Aided Design in Architecture (ACADIA), pp. 604–613. CumInCAD, 2020, https://doi.org/10.52842/conf.acadia.2020.1.604

  78. [79]

    A Careful Examination of Large Behavior Models for Multitask Dexterous Manipulation

    TRI LBM Team, Barreiros J, Beaulieu A, Bhat A, Cory R, Cousineau E, Dai H, Fang CH, Hashimoto K, Irshad MZ, Itkina M, Kuppuswamy N, Lee KH, Liu K, McConachie D, McMahon I, Nishimura H, Phillips-Grafflin C, Richter C, Shah P, Srinivasan K, Wulfe B, Xu C, Zhang M, Alspach A, Angeles M, Arora K, Guizilini VC, Castro A, Chen D, Chu TS, Creasey S, Curtis S, D...