Simulator Adaptation for Sim-to-Real Learning of Legged Locomotion via Proprioceptive Distribution Matching
Pith reviewed 2026-05-10 15:34 UTC · model grok-4.3
The pith
Proprioceptive distribution matching adapts simulators for legged robot policies using only joint data, matching privileged methods without motion capture.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Simulator adaptation via proprioceptive distribution matching achieves parameter recovery and policy-performance gains comparable to privileged state-matching baselines in extensive sim-to-sim ablations on the Go2 quadruped; on hardware, it substantially reduces drift using less than five minutes of data, even for two-legged walking.
What carries the argument
Proprioceptive distribution matching, which quantifies dynamics discrepancies by comparing simulation and hardware rollouts as distributions of joint observations and actions without requiring time alignment or external sensors.
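The source does not name the distance used (the referee raises this below). The following is a minimal sketch, assuming a per-channel 1-D Wasserstein-1 distance between equal-size sample sets; the function name `marginal_w1` is hypothetical. Each channel is sorted independently, so reordering the rows of either rollout leaves the value unchanged. This order invariance is the sense in which no time alignment is needed.

```python
import numpy as np


def marginal_w1(sim: np.ndarray, real: np.ndarray) -> float:
    """Mean per-channel 1-D Wasserstein-1 distance between two sample sets.

    Rows are time steps; columns are proprioceptive channels (e.g. joint
    positions, velocities, actions). Sorting each column independently
    discards the row order, i.e. the timing of either rollout.
    """
    sim = np.asarray(sim, dtype=float)
    real = np.asarray(real, dtype=float)
    if sim.shape != real.shape:
        raise ValueError("this sketch assumes equal sample counts and channels")
    # For equal-size empirical distributions, W1 equals the mean absolute gap
    # between matched order statistics.
    gaps = np.abs(np.sort(sim, axis=0) - np.sort(real, axis=0))
    return float(gaps.mean())


rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 3))
print(marginal_w1(x, rng.permutation(x)))  # shuffled rows: distance is zero
print(marginal_w1(x, x + 0.5))             # constant offset: distance ~0.5
```

Because channels are compared one at a time, this choice ignores correlations between joints as well as temporal structure.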
Load-bearing premise
Comparing distributions of proprioceptive joint observations and actions is sufficient to identify and correct the relevant dynamics discrepancies without time alignment or privileged state information.
What would settle it
The claim would be refuted if adapting the simulator with proprioceptive distribution matching produced no improvement in parameter recovery accuracy or real-world policy drift on the Go2 quadruped relative to an unadapted simulator.
Original abstract
Simulation-trained legged locomotion policies often exhibit performance loss on hardware due to dynamics discrepancies between the simulator and the real world, highlighting the need for approaches that adapt the simulator itself to better match hardware behavior. Prior work typically quantifies these discrepancies through precise, time-aligned matching of joint and base trajectories, a process that requires motion capture, privileged sensing, and carefully controlled initial conditions. We introduce a practical alternative based on proprioceptive distribution matching, which compares hardware and simulation rollouts as distributions of joint observations and actions, eliminating the need for time alignment or external sensing. Using this metric as a black-box objective, we explore adapting simulator dynamics through parameter identification, action-delta models, and residual actuator models. Our approach matches the parameter recovery and policy-performance gains of privileged state-matching baselines across extensive sim-to-sim ablations on the Go2 quadruped. Real-world experiments demonstrate substantial drift reduction using less than five minutes of hardware data, even for a challenging two-legged walking behavior. These results demonstrate that proprioceptive distribution matching provides a practical and effective route to simulator adaptation for sim-to-real transfer of learned legged locomotion.
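Using the distribution distance as a black-box objective for parameter identification can be illustrated on a toy problem. The reference list cites a CMA-ES tutorial [34] and the `cmaes` Python library [35], which suggests an evolution-strategy optimizer. To stay dependency-light, this sketch swaps in an exhaustive grid search over one parameter. The 1-DoF "joint", its damping parameter, and all constants are hypothetical stand-ins for a Go2 simulator. The "hardware" log is started at a different phase and shuffled, so no time alignment with the simulated rollouts is possible.

```python
import numpy as np

DT = 0.01            # integration step [s] (toy value, not the paper's)
PERIOD_STEPS = 200   # excitation period in steps (2 s)
N_STEPS = 10_000     # rollout length (100 s)
BURN_IN = 1_000      # samples discarded to skip the start-up transient


def marginal_w1(sim, real):
    """Mean per-channel 1-D Wasserstein-1 distance; ignores sample order."""
    gaps = np.abs(np.sort(sim, axis=0) - np.sort(real, axis=0))
    return float(gaps.mean())


def rollout(damping, phase_steps=0):
    """Toy 1-DoF joint, q' = -damping * q + u, driven by an open-loop sine.

    Returns (N_STEPS - BURN_IN, 2) samples of [q, u]. `phase_steps` shifts
    when the excitation starts, mimicking a log that is not time-aligned.
    """
    q = 0.0
    samples = np.empty((N_STEPS - BURN_IN, 2))
    for k in range(N_STEPS):
        u = np.sin(2 * np.pi * (k + phase_steps) / PERIOD_STEPS)
        if k >= BURN_IN:
            samples[k - BURN_IN] = (q, u)
        q += DT * (-damping * q + u)  # explicit Euler step
    return samples


def identify(real, grid):
    """Black-box search: pick the damping whose rollout best matches `real`."""
    losses = [marginal_w1(rollout(b), real) for b in grid]
    return float(grid[int(np.argmin(losses))])


rng = np.random.default_rng(0)
# Hypothetical "hardware" log: true damping 1.7, shifted start, rows shuffled.
hardware = rng.permutation(rollout(1.7, phase_steps=37))
b_hat = identify(hardware, np.linspace(0.5, 3.0, 26))
print(f"recovered damping: {b_hat:.2f}")
```

In this toy, the joint-position amplitude is monotone in damping, so the marginal alone pins the parameter down. The identifiability concern raised in the referee report below arises when several parameters can trade off against each other.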
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that proprioceptive distribution matching—comparing marginal distributions of joint observations and actions from hardware and simulation rollouts, without time alignment or external sensing—provides a practical method for adapting simulators to reduce dynamics discrepancies in sim-to-real transfer of legged locomotion policies. It explores three adaptation strategies (simulator parameter identification, action-delta models, and residual actuator models) and reports that these match the parameter recovery and policy performance of privileged state-matching baselines across extensive sim-to-sim ablations on the Go2 quadruped. Real-world experiments further show substantial drift reduction using less than five minutes of hardware data, including for a challenging two-legged walking behavior.
Significance. If the central results hold, the work provides a low-overhead alternative to trajectory-matching sim-to-real methods that typically require motion capture or privileged state information. The extensive sim-to-sim ablations on a standard quadruped platform and the real-world validation with minimal data collection are notable strengths, as they directly address practical deployment constraints for learned locomotion policies. Credit is due for demonstrating effectiveness on a non-standard behavior (two-legged walking) and for framing the adaptation as a black-box optimization problem.
major comments (2)
- [§3] Proprioceptive Distribution Matching: The central objective minimizes distance between marginal distributions of proprioceptive observations and actions. This formulation does not enforce matching of temporal structure, transition probabilities P(o_{t+1}|o_t, a_t), or phase relationships. Consequently, distinct dynamics parameter sets (e.g., compensating friction and inertia changes that preserve joint-angle histograms) can produce statistically indistinguishable marginals under the same policy. The sim-to-sim ablations claim parameter recovery equivalent to privileged baselines, yet no identifiability analysis, sensitivity to initialization, or uniqueness checks are reported. This directly affects the claim that the method 'recovers the relevant dynamics discrepancies.'
- [Real-world results section and abstract] The experiments report 'substantial drift reduction' with <5 min of hardware data for both quadrupedal and two-legged behaviors. However, the provided abstract contains no quantitative metrics, error bars, or explicit baseline comparisons for the real-world drift (e.g., position error over time or success rate). If the full manuscript similarly omits detailed statistics or exclusion criteria for the hardware rollouts, the evidence for practical effectiveness remains difficult to assess independently of post-hoc choices.
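The first major comment's identifiability concern can be made concrete with a toy example. The referee's example pairs friction with inertia; this hypothetical sketch pairs damping with actuator gain instead. Tripling the damping of a 1-DoF joint visibly changes the joint-position marginal. Raising the gain so that the steady-state amplitude is unchanged nearly restores the marginal, even though the phase lag (a temporal feature) differs. All names and constants are illustrative assumptions, not the paper's setup.

```python
import numpy as np

DT = 0.01
PERIOD_STEPS = 200
OMEGA = 2 * np.pi / PERIOD_STEPS   # excitation frequency [rad/step]
N_STEPS = 12_000
BURN_IN = 2_000                    # leaves 50 whole excitation periods


def rollout(damping, gain):
    """Toy joint: q[k+1] = q[k] + DT * (-damping * q[k] + gain * sin(OMEGA k))."""
    q = 0.0
    out = np.empty((N_STEPS - BURN_IN, 1))  # one channel: joint position
    for k in range(N_STEPS):
        if k >= BURN_IN:
            out[k - BURN_IN, 0] = q
        q += DT * (-damping * q + gain * np.sin(OMEGA * k))
    return out


def amplitude_per_gain(damping):
    """Steady-state amplitude per unit gain: DT / |exp(j*OMEGA) - a|."""
    a = 1.0 - damping * DT
    return DT / abs(np.exp(1j * OMEGA) - a)


def marginal_w1(x, y):
    """Mean per-channel 1-D Wasserstein-1 distance (order-insensitive)."""
    return float(np.abs(np.sort(x, axis=0) - np.sort(y, axis=0)).mean())


reference = rollout(damping=1.0, gain=1.0)
# Triple the damping with no compensation: the marginal clearly changes.
d_naive = marginal_w1(reference, rollout(damping=3.0, gain=1.0))
# Triple the damping but raise the gain so the amplitude is unchanged.
g_comp = amplitude_per_gain(1.0) / amplitude_per_gain(3.0)
d_comp = marginal_w1(reference, rollout(damping=3.0, gain=g_comp))
print(f"uncompensated: {d_naive:.4f}, compensated: {d_comp:.4f}")
```

An objective built only from marginals cannot separate the two compensated parameter sets. Checks such as multi-start optimization or adding temporal features (e.g. lagged pairs) would be needed to expose the difference.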
minor comments (2)
- [Abstract] The abstract would be strengthened by including one or two key quantitative results (e.g., drift reduction percentages or parameter recovery error) to support the qualitative claims of 'substantial' improvement and 'matching' baselines.
- [§3] Notation for the distribution distance metric (e.g., whether Wasserstein, KL, or MMD is used) and the precise definition of the proprioceptive observation vector should be clarified early in the method section for reproducibility.
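The choice the minor comment asks about matters in practice. A per-channel Wasserstein distance compares each joint signal separately. A kernel two-sample statistic such as maximum mean discrepancy (MMD) on the full proprioceptive vector also reflects cross-channel dependence, though it is still blind to temporal order. The following is a minimal sketch of a biased squared-MMD estimate with a Gaussian kernel. The function name and the default bandwidth are assumptions, not the paper's settings.

```python
import numpy as np


def mmd2_rbf(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between sample sets x and y (rows = samples).

    Uses the Gaussian kernel k(a, b) = exp(-|a - b|^2 / (2 * bandwidth^2)).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def kernel(a, b):
        # Pairwise squared Euclidean distances, shape (len(a), len(b)).
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * bandwidth ** 2))

    return float(kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean())


rng = np.random.default_rng(0)
x = rng.normal(size=(300, 2))
print(mmd2_rbf(x, rng.permutation(x)))  # same samples, reordered: ~0
print(mmd2_rbf(x, x + 1.0))             # shifted distribution: clearly positive
```

The bandwidth sets the scale at which differences register, so reporting it (or the median-distance heuristic used to pick it) would be part of the reproducibility detail the referee requests.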
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and for recognizing the practical strengths of our approach. We address each major comment below, indicating planned revisions to the manuscript.
-
Referee: The central objective minimizes distance between marginal distributions of proprioceptive observations and actions. This formulation does not enforce matching of temporal structure, transition probabilities P(o_{t+1}|o_t, a_t), or phase relationships. Consequently, distinct dynamics parameter sets (e.g., compensating friction and inertia changes that preserve joint-angle histograms) can produce statistically indistinguishable marginals under the same policy. The sim-to-sim ablations claim equivalent parameter recovery to privileged baselines, yet no identifiability analysis, sensitivity to initialization, or uniqueness checks are reported. This directly affects the claim that the method 'recovers the relevant dynamics discrepancies.'
Authors: We agree that marginal distribution matching does not enforce temporal structure or guarantee unique recovery of dynamics parameters. Nevertheless, our sim-to-sim ablations demonstrate that the adapted simulators achieve parameter recovery and downstream policy performance equivalent to privileged state-matching baselines across varied conditions on the Go2. This provides empirical support that the method identifies discrepancies relevant to policy transfer. We will add a discussion of these theoretical limitations, including identifiability considerations, and report sensitivity to initialization from our existing experimental results. revision: partial
-
Referee: Real-world results section (and abstract): The experiments report 'substantial drift reduction' with <5 min of hardware data for both quadrupedal and two-legged behaviors. However, the provided abstract contains no quantitative metrics, error bars, or explicit baseline comparisons for the real-world drift (e.g., position error over time or success rate). If the full manuscript similarly omits detailed statistics or exclusion criteria for the hardware rollouts, the evidence for practical effectiveness remains difficult to assess independently of post-hoc choices.
Authors: The full manuscript reports quantitative real-world metrics including position drift over time with error bars, success rates, and comparisons against unadapted and baseline simulators, along with details on data collection and rollout criteria. To improve accessibility, we will revise the abstract to include key quantitative results and explicit baseline comparisons for the observed drift reduction. revision: yes
Circularity Check
No significant circularity; adaptation objective and validation are externally grounded
The paper defines its core objective as minimizing a distribution distance between proprioceptive observations and actions collected from independent hardware rollouts and simulator rollouts. This target is external to the optimization and is not derived from the fitted parameters themselves. Claims of matching privileged baselines are supported by separate sim-to-sim ablations, while real-world drift reduction is measured on held-out hardware trials using <5 min of data. No equations reduce any reported gain to a fitted quantity by construction, no uniqueness theorems are imported from self-citations, and no ansatz is smuggled via prior work. The derivation chain therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- simulator dynamics parameters
- parameters of action-delta and residual actuator models
axioms (1)
- domain assumption: proprioceptive distributions of joint observations and actions are sufficient to quantify relevant sim-to-real dynamics discrepancies
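The ledger lists the parameters of action-delta and residual actuator models as free parameters, but the source does not specify their form. The following is a hypothetical sketch of where each model could plug into a simulated joint-level control loop. An action-delta model perturbs the policy's action before the actuator, while a residual actuator model adds a correction to the nominal PD torque. Linear maps stand in for whatever learned models the paper uses. The PD gains and feature choices are illustrative assumptions, not Go2 values.

```python
import numpy as np


def pd_torque(q_target, q, qd, kp=20.0, kd=0.5):
    """Nominal PD actuator model (gains are illustrative only)."""
    return kp * (q_target - q) - kd * qd


def action_delta(action, obs, W, b):
    """Hypothetical action-delta model: a' = a + W @ [obs, a] + b."""
    z = np.concatenate([obs, action])
    return action + W @ z + b


def residual_torque(q_target, q, qd, theta):
    """Hypothetical residual actuator model: PD torque plus a per-joint
    linear correction in the features [q_target - q, qd, 1]; theta is (J, 3)."""
    features = np.stack([q_target - q, qd, np.ones_like(q)], axis=-1)  # (J, 3)
    return pd_torque(q_target, q, qd) + (features * theta).sum(axis=-1)


J = 12  # joints on a quadruped such as the Go2
rng = np.random.default_rng(0)
q, qd, a = rng.normal(size=(3, J))
obs = np.concatenate([q, qd])
# With all adaptation parameters at zero, both models reduce to the nominal sim.
print(np.allclose(action_delta(a, obs, np.zeros((J, 3 * J)), np.zeros(J)), a))
print(np.allclose(residual_torque(a, q, qd, np.zeros((J, 3))), pd_torque(a, q, qd)))
```

Under the black-box framing, the entries of `W`, `b`, or `theta` would be the decision variables passed to the optimizer, with the distribution distance between simulated and hardware rollouts as the loss.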
Reference graph
Works this paper leans on
-
[1]
Sim-to-real transfer in deep reinforcement learning for robotics: A survey,
W. Zhao, J. P. Queralta, and T. Westerlund, “Sim-to-real transfer in deep reinforcement learning for robotics: A survey,” 2020 IEEE Symposium Series on Computational Intelligence, SSCI 2020, pp. 737–744, 2020
2020
-
[2]
Sim-to-real transfer of robotic control with dynamics randomization,
X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel, “Sim-to-real transfer of robotic control with dynamics randomization,” in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 5 2018, pp. 3803–3810. [Online]. Available: https://ieeexplore.ieee.org/document/8460528/
-
[3]
Data-efficient domain randomization with bayesian optimization,
F. Muratore, C. Eilers, M. Gienger, and J. Peters, “Data-efficient domain randomization with bayesian optimization,” IEEE Robotics and Automation Letters, vol. 6, pp. 911–918, 4 2021
2021
-
[4]
Bayessim: Adaptive domain randomization via probabilistic inference for robotics simulators,
F. Ramos, R. Possas, and D. Fox, “Bayessim: Adaptive domain randomization via probabilistic inference for robotics simulators,” 2019
2019
-
[5]
Auto-tuned sim-to-real transfer,
Y. Du, O. Watkins, T. Darrell, P. Abbeel, and D. Pathak, “Auto-tuned sim-to-real transfer,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 5 2021, pp. 1290–1296. [Online]. Available: https://ieeexplore.ieee.org/document/9562091/
-
[6]
Learning human-to-humanoid real-time whole-body teleoperation,
T. He, Z. Luo, W. Xiao, C. Zhang, K. Kitani, C. Liu, and G. Shi, “Learning human-to-humanoid real-time whole-body teleoperation,” in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 10 2024, pp. 8944–8951
2024
-
[7]
Asap: Aligning simulation and real-world physics for learning agile humanoid whole-body skills,
T. He, J. Gao, W. Xiao, Y. Zhang, Z. Wang, J. Wang, Z. Luo, G. He, N. Sobanbab, C. Pan, Z. Yi, G. Qu, K. Kitani, J. Hodgins, L. J. Fan, Y. Zhu, C. Liu, and G. Shi, “Asap: Aligning simulation and real-world physics for learning agile humanoid whole-body skills,” 2 2025
2025
-
[8]
Bridging the sim-to-real gap for athletic loco-manipulation,
N. Fey, G. B. Margolis, M. Peticco, and P. Agrawal, “Bridging the sim-to-real gap for athletic loco-manipulation,” 2 2025
2025
-
[9]
Sim-to-real transfer for biped locomotion,
W. Yu, V. C. Kumar, G. Turk, and C. K. Liu, “Sim-to-real transfer for biped locomotion,” in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 11 2019, pp. 3503–3510
2019
-
[10]
Preparing for the unknown: Learning a universal policy with online system identification,
W. Yu, J. Tan, C. K. Liu, and G. Turk, “Preparing for the unknown: Learning a universal policy with online system identification,” in Robotics: Science and Systems XIII. Robotics: Science and Systems Foundation, 7 2017
2017
-
[11]
Concurrent training of a control policy and a state estimator for dynamic and robust legged locomotion,
G. Ji, J. Mun, H. Kim, and J. Hwangbo, “Concurrent training of a control policy and a state estimator for dynamic and robust legged locomotion,” IEEE Robotics and Automation Letters, vol. 7, pp. 4630–4637, 4 2022
2022
-
[12]
Adapting rapid motor adaptation for bipedal robots,
A. Kumar, Z. Li, J. Zeng, D. Pathak, K. Sreenath, and J. Malik, “Adapting rapid motor adaptation for bipedal robots,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 10 2022, pp. 1161–1168
2022
-
[13]
Advancing humanoid locomotion: Mastering challenging terrains with denoising world model learning,
X. Gu, Y.-J. Wang, X. Zhu, C. Shi, Y. Guo, Y. Liu, and J. Chen, “Advancing humanoid locomotion: Mastering challenging terrains with denoising world model learning,” in Robotics: Science and Systems XX. Robotics: Science and Systems Foundation, 7 2024
2024
-
[14]
Cts: Concurrent teacher-student reinforcement learning for legged locomotion,
H. Wang, H. Luo, W. Zhang, and H. Chen, “Cts: Concurrent teacher-student reinforcement learning for legged locomotion,” IEEE Robotics and Automation Letters, vol. 9, pp. 9191–9198, 11 2024
2024
-
[15]
Rapid locomotion via reinforcement learning,
G. B. Margolis, G. Yang, K. Paigwar, T. Chen, and P. Agrawal, “Rapid locomotion via reinforcement learning,” in Robotics: Science and Systems XVIII. Robotics: Science and Systems Foundation, 6 2022
2022
-
[16]
Learning agile robotic locomotion skills by imitating animals,
X. B. Peng, E. Coumans, T. Zhang, T.-W. Lee, J. Tan, and S. Levine, “Learning agile robotic locomotion skills by imitating animals,” in Robotics: Science and Systems XVI. Robotics: Science and Systems Foundation, 2020. [Online]. Available: http://www.roboticsproceedings.org/rss16/p064.pdf
2020
-
[17]
Self-supervised policy adaptation during deployment,
N. Hansen, R. Jangir, Y. Sun, G. Alenyà, P. Abbeel, A. A. Efros, L. Pinto, and X. Wang, “Self-supervised policy adaptation during deployment,” in 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. ICLR, 2021. [Online]. Available: https://openreview.net/forum?id= o V-MjyyGV
2021
-
[18]
Simgan: Hybrid simulator identification for domain adaptation via adversarial reinforcement learning,
Y. Jiang, T. Zhang, D. Ho, Y. Bai, C. K. Liu, S. Levine, and J. Tan, “Simgan: Hybrid simulator identification for domain adaptation via adversarial reinforcement learning,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 5 2021, pp. 2884–2890. [Online]. Available: https://ieeexplore.ieee.org/document/9561731/
2021
-
[19]
Improving domain transfer of robot dynamics models with geometric system identification and learned friction compensation,
L. Schwendeman, A. SaLoutos, E. Stanger-Jones, and S. Kim, “Improving domain transfer of robot dynamics models with geometric system identification and learned friction compensation,” in 2023 IEEE-RAS 22nd International Conference on Humanoid Robots (Humanoids). IEEE, 12 2023, pp. 1–8
2023
-
[20]
Estimation of inertial parameters of manipulator loads and links,
C. G. Atkeson, C. H. An, and J. M. Hollerbach, “Estimation of inertial parameters of manipulator loads and links,” The International Journal of Robotics Research, vol. 5, pp. 101–119, 9 1986
1986
-
[21]
Learning to walk in minutes using massively parallel deep reinforcement learning,
N. Rudin, D. Hoeller, P. Reist, and M. Hutter, “Learning to walk in minutes using massively parallel deep reinforcement learning,” in Proceedings of the 5th Conference on Robot Learning, A. Faust, D. Hsu, and G. Neumann, Eds., vol. 164. PMLR, 3 2022, pp. 91–100. [Online]. Available: https://proceedings.mlr.press/v164/rudin22a.html
2022
-
[22]
Imitate and repurpose: Learning reusable robot movement skills from human and animal behaviors,
S. Bohez, S. Tunyasuvunakool, P. Brakel, F. Sadeghi, L. Hasenclever, Y. Tassa, E. Parisotto, J. Humplik, T. Haarnoja, R. Hafner, M. Wulfmeier, M. Neunert, B. Moran, N. Siegel, A. Huber, F. Romano, N. Batchelor, F. Casarini, J. Merel, R. Hadsell, and N. Heess, “Imitate and repurpose: Learning reusable robot movement skills from human and animal behavior...
2022
-
[23]
Sim-to-real: Learning agile locomotion for quadruped robots,
J. Tan, T. Zhang, E. Coumans, A. Iscen, Y. Bai, D. Hafner, S. Bohez, and V. Vanhoucke, “Sim-to-real: Learning agile locomotion for quadruped robots,” in Robotics: Science and Systems XIV. Robotics: Science and Systems Foundation, 6 2018. [Online]. Available: http://www.roboticsproceedings.org/rss14/p10.html, http://www.roboticsproceedings.org/rss14/p10.pdf
2018
-
[24]
Dynamic parameter identification of serial robots using a hybrid approach,
Y. Huang, J. Ke, X. Zhang, and J. Ota, “Dynamic parameter identification of serial robots using a hybrid approach,” IEEE Transactions on Robotics, vol. 39, pp. 1607–1621, 4 2023
2023
-
[25]
Learning agile and dynamic motor skills for legged robots,
J. Hwangbo, J. Lee, A. Dosovitskiy, D. Bellicoso, V. Tsounis, V. Koltun, and M. Hutter, “Learning agile and dynamic motor skills for legged robots,” Science Robotics, vol. 4, 1 2019
2019
-
[26]
Reinforced grounded action transformation for sim-to-real transfer,
H. Karnan, S. Desai, J. P. Hanna, G. Warnell, and P. Stone, “Reinforced grounded action transformation for sim-to-real transfer,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 10 2020, pp. 4397–4402
2020
-
[27]
Rl2ac: Reinforcement learning-based rapid online adaptive control for legged robot robust locomotion,
S. Lyu, X. Lang, H. Zhao, H. Zhang, P. Ding, and D. Wang, “Rl2ac: Reinforcement learning-based rapid online adaptive control for legged robot robust locomotion,” in Robotics: Science and Systems XX. Robotics: Science and Systems Foundation, 7 2024
2024
-
[28]
Tossingbot: Learning to throw arbitrary objects with residual physics,
A. Zeng, S. Song, J. Lee, A. Rodriguez, and T. Funkhouser, “Tossingbot: Learning to throw arbitrary objects with residual physics,” IEEE Transactions on Robotics, vol. 36, pp. 1307–1319, 8 2020
2020
-
[29]
Data-efficient control policy search using residual dynamics learning,
M. Saveriano, Y. Yin, P. Falco, and D. Lee, “Data-efficient control policy search using residual dynamics learning,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 9 2017, pp. 4709–4715. [Online]. Available: http://ieeexplore.ieee.org/document/8206343/
-
[30]
Neuralsim: Augmenting differentiable simulators with neural networks,
E. Heiden, D. Millard, E. Coumans, Y. Sheng, and G. S. Sukhatme, “Neuralsim: Augmenting differentiable simulators with neural networks,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 5 2021, pp. 9474–9481
2021
-
[31]
Sim-to-real of soft robots with learned residual physics,
J. Gao, M. Y. Michelis, A. Spielberg, and R. K. Katzschmann, “Sim-to-real of soft robots with learned residual physics,” IEEE Robotics and Automation Letters, vol. 9, pp. 8523–8530, 10 2024
2024
-
[32]
Residual physics learning and system identification for sim-to-real transfer of policies on buoyancy assisted legged robots,
N. Sontakke, H. Chae, S. Lee, T. Huang, D. W. Hong, and S. Hal, “Residual physics learning and system identification for sim-to-real transfer of policies on buoyancy assisted legged robots,” in 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 10 2023, pp. 392–399
2023
-
[33]
High-performance reinforcement learning on spot: Optimizing simulation parameters with distributional measures,
A. Miller, F. Yu, M. Brauckmann, and F. Farshidian, “High-performance reinforcement learning on spot: Optimizing simulation parameters with distributional measures,” in 2025 IEEE International Conference on Robotics and Automation (ICRA), 2025, pp. 9981–9988
2025
-
[34]
The CMA evolution strategy: A tutorial,
N. Hansen, “The CMA evolution strategy: A tutorial,” CoRR, vol. abs/1604.00772, 2016, arXiv:1604.00772, doi: 10.48550/arXiv.1604.00772. [Online]. Available: http://arxiv.org/abs/1604.00772
2016
-
[35]
cmaes : A simple yet practical python library for cma-es,
M. Nomura and M. Shibata, “cmaes : A simple yet practical python library for cma-es,” 2024. [Online]. Available: https://arxiv.org/abs/2402.01373
-
[36]
Multiple task optimization with a mixture of controllers for motion generation,
N. Dehio, R. F. Reinhart, and J. J. Steil, “Multiple task optimization with a mixture of controllers for motion generation,” in 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 9 2015, pp. 6416–6421
2015
-
[37]
Sampling-based system identification with active exploration for legged sim2real learning,
N. Sobanbabu, G. He, T. He, Y. Yang, and G. Shi, “Sampling-based system identification with active exploration for legged sim2real learning,” in 9th Annual Conference on Robot Learning, 2025. [Online]. Available: https://openreview.net/forum?id=UTPBM4dEUS
2025