How Should a Simulation-to-Reality Transfer Budget Be Spent?
Pith reviewed 2026-06-26 12:03 UTC · model grok-4.3
The pith
In pendulum sim-to-real tests, a few identification rollouts closed most of the transfer gap while broad randomization did not substitute for measurement.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across the reality gaps and noise levels tested, the measurement budget did most of the work. A small number of identification rollouts closed most of the transfer gap, and once any real data was available, policies performed best when trained at the estimated parameters rather than over a widened randomization band. Broad randomization that contained the true system still did not substitute for measurement. These results hold in a benign regime where the dynamics are identifiable and only two parameters are unknown.
What carries the argument
A controlled tradeoff experiment that sweeps the number of identification rollouts against the width of the randomization distribution in a hidden-parameter pendulum model.
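To make the identification half of that sweep concrete, here is a minimal sketch under stated assumptions. The paper does not say here which two parameters are hidden; the sketch assumes a damped pendulum, theta'' = -a*sin(theta) - b*omega, with a = g/L and a damping coefficient b as the unknowns. The function names (`rollout`, `collect`, `identify`) and the least-squares estimator are illustrative choices, not the authors' method.

```python
import math
import random


def rollout(a, b, theta0, steps=200, dt=0.01, noise=0.0, rng=None):
    """Simulate theta'' = -a*sin(theta) - b*omega with Euler steps.

    Returns (sin(theta), omega, measured acceleration) samples.
    The (a, b) parameterization is an assumption made for this sketch.
    """
    rng = rng or random.Random(0)
    theta, omega = theta0, 0.0
    samples = []
    for _ in range(steps):
        accel = -a * math.sin(theta) - b * omega
        # Measurement noise is added to the observed acceleration only.
        samples.append((math.sin(theta), omega, accel + rng.gauss(0.0, noise)))
        omega += dt * accel
        theta += dt * omega
    return samples


def collect(n_rollouts, a_true, b_true, noise, seed=0):
    """Gather samples from several rollouts of the hidden 'real' system."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_rollouts):
        theta0 = rng.uniform(0.5, 1.5)  # hypothetical range of start angles
        samples.extend(rollout(a_true, b_true, theta0, noise=noise, rng=rng))
    return samples


def identify(samples):
    """Least-squares fit of accel = p*sin(theta) + q*omega; returns (a, b) = (-p, -q)."""
    sss = sum(s * s for s, _, _ in samples)
    sww = sum(w * w for _, w, _ in samples)
    ssw = sum(s * w for s, w, _ in samples)
    ssy = sum(s * y for s, _, y in samples)
    swy = sum(w * y for _, w, y in samples)
    det = sss * sww - ssw * ssw
    p = (ssy * sww - ssw * swy) / det  # Cramer's rule on the 2x2 normal equations
    q = (sss * swy - ssw * ssy) / det
    return -p, -q


if __name__ == "__main__":
    # Placeholder "true" values, chosen only for illustration.
    for n in (1, 2, 4, 8):
        a_hat, b_hat = identify(collect(n, a_true=9.81, b_true=0.3, noise=0.5))
        print(n, round(a_hat, 3), round(b_hat, 3))
```

In a sweep like the one described, the number of rollouts passed to `collect` would be one axis, and the width of the band around the resulting estimate would be the other.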
If this is right
- A small number of identification rollouts closes most of the transfer gap.
- Once real data is available, training at estimated parameters outperforms training over a widened band.
- Broad randomization that contains the true system does not substitute for measurement.
- Sim-to-real pipelines should first measure the parameters they can and reserve randomization for remaining uncertainty.
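The last recommendation above can be sketched as a rule for sizing the randomization band from what measurement leaves unresolved. This is one possible reading, not the authors' procedure: it assumes one parameter estimate per identification rollout and sets the band to the mean plus or minus `k` standard errors. The names `residual_band` and `sample_parameter` and the default `k=2.0` are hypothetical.

```python
import math
import random
import statistics


def residual_band(estimates, k=2.0):
    """Return (low, high): mean of per-rollout estimates +/- k standard errors."""
    if len(estimates) < 2:
        raise ValueError("need at least two estimates to gauge remaining uncertainty")
    center = statistics.fmean(estimates)
    sem = statistics.stdev(estimates) / math.sqrt(len(estimates))
    return center - k * sem, center + k * sem


def sample_parameter(band, rng):
    """Draw one simulator parameter uniformly from the band for a training episode."""
    low, high = band
    return rng.uniform(low, high)


if __name__ == "__main__":
    # Placeholder per-rollout estimates of a single parameter, for illustration only.
    band = residual_band([9.7, 9.9, 9.8, 9.8])
    rng = random.Random(0)
    print(band, sample_parameter(band, rng))
```

Under this rule the band shrinks as more rollouts are measured, so randomization covers only the uncertainty that identification has not removed.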
Where Pith is reading between the lines
- In regimes with structural model mismatch, the relative value of randomization breadth may increase.
- The same budget allocation logic could be tested on higher-dimensional robots or tasks with more parameters.
- Adaptive strategies that decide rollout allocation based on early identification results might further improve efficiency.
Load-bearing premise
The dynamics are identifiable with only two unknown parameters and no structural model mismatch.
What would settle it
The claim would be falsified by an experiment in the same pendulum setup, but with added structural mismatch or more unknown parameters, in which broad randomization closes a larger fraction of the gap than identification rollouts do.
Original abstract
Simulation-to-reality transfer, often called sim-to-real transfer, is a central challenge in robot learning. Yet, the tradeoff between measuring a system more accurately and training over a broader range of simulated dynamics is still poorly understood. In this work, we focused on the allocation of real-robot measurement time between system identification and domain randomization. We studied this tradeoff in a controlled sim-to-sim pendulum setting, where a hidden-parameter model stands in for the physical robot, and the experiment sweeps identification rollouts against the width of the randomization distribution. Across the reality gaps and noise levels we tested, the measurement budget did most of the work. A small number of identification rollouts closed most of the transfer gap, and once any real data was available, policies performed best when trained at the estimated parameters rather than over a widened randomization band. Broad randomization that contained the true system still did not substitute for measurement. These results hold in a benign regime where the dynamics are identifiable and only two parameters are unknown, so structural model mismatch remains the setting where randomization breadth may become more valuable. Overall, our results suggest that sim-to-real pipelines should first measure the parameters they can and reserve randomization for the uncertainty that remains.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript investigates the allocation of a fixed real-robot measurement budget between system identification and domain randomization for sim-to-real transfer. Using a controlled sim-to-sim pendulum with a hidden-parameter model and exactly two unknown parameters, the authors sweep the number of identification rollouts against the width of the randomization distribution across varying reality gaps and noise levels. They report that identification rollouts close most of the transfer gap, that policies trained at the estimated parameters outperform those trained over widened randomization bands once any real data is available, and that broad randomization containing the true system does not substitute for measurement. All claims are explicitly scoped to the benign regime of identifiable dynamics without structural mismatch.
Significance. If the empirical findings hold, the work supplies actionable guidance for sim-to-real practice by showing that measurement should be prioritized for identifiable parameters and randomization reserved for residual uncertainty. The controlled sim-to-sim design with direct sweeps isolates the tradeoff cleanly and yields falsifiable predictions for similar low-dimensional identifiable systems; this experimental clarity is a strength. The manuscript appropriately qualifies its scope rather than overclaiming generality.
major comments (1)
- [§4 (Experiments)] §4 (Experiments) and abstract: the description of the controlled sweep provides no information on the number of independent trials, statistical tests, variance estimation, or exact definition of the 'transfer gap' metric. Without these details it is not possible to assess whether the data robustly support the central claim that 'a small number of identification rollouts closed most of the transfer gap.'
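As an illustration of the kind of reporting this comment asks for, the sketch below shows one hypothetical way to define a "fraction of gap closed" metric and to summarize it across independent trials. The manuscript's actual definition is not given here, so the formula, the baseline names (`j_nominal`, `j_oracle`), and the function names are all assumptions.

```python
import math
import statistics


def gap_closed(j_policy, j_nominal, j_oracle):
    """Hypothetical metric: share of the nominal-to-oracle return gap recovered.

    j_nominal: real-system return of a policy trained at the nominal (wrong) parameters.
    j_oracle:  real-system return of a policy trained at the true parameters.
    """
    denom = j_oracle - j_nominal
    if denom == 0:
        raise ValueError("no transfer gap to close")
    return (j_policy - j_nominal) / denom


def summarize(values):
    """Mean and standard error of the mean across independent trials."""
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n) if n > 1 else float("nan")
    return mean, sem


if __name__ == "__main__":
    # Placeholder returns for three seeds; these are not results from the paper.
    trials = [gap_closed(j, j_nominal=-200.0, j_oracle=-100.0)
              for j in (-120.0, -110.0, -130.0)]
    print(summarize(trials))
```

Reporting the mean together with a standard error per configuration would let readers judge whether "most of the gap" is distinguishable from run-to-run variation.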
minor comments (2)
- The abstract states results hold 'across the reality gaps and noise levels we tested' but does not name the specific gap magnitudes or noise variances; adding these values would improve reproducibility.
- [Methods] Notation for the randomization distribution width and the estimated-parameter policy is introduced without an explicit equation or table reference in the methods; a short definitional equation would aid clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive comment on experimental reporting. We agree that additional details are needed and will revise the manuscript accordingly.
Point-by-point responses
Referee: [§4 (Experiments)] §4 (Experiments) and abstract: the description of the controlled sweep provides no information on the number of independent trials, statistical tests, variance estimation, or exact definition of the 'transfer gap' metric. Without these details it is not possible to assess whether the data robustly support the central claim that 'a small number of identification rollouts closed most of the transfer gap.'
Authors: We agree that the manuscript currently omits explicit information on the number of independent trials, statistical tests, variance estimation, and the precise definition of the transfer gap metric. These details are required for readers to evaluate the robustness of the reported trends. In the revised version we will expand §4 to state the number of independent trials per configuration, describe how variance was estimated across runs, note whether any statistical tests were applied, and provide the exact definition of the transfer gap metric used to generate the figures. The abstract will be updated to reference the added experimental rigor if space allows. These changes will directly address the concern while preserving the scope and conclusions of the work.
Revision: yes
Circularity Check
No significant circularity
This is an empirical experimental study that directly compares policy performance across sweeps of identification rollouts versus randomization width in a controlled sim-to-sim pendulum. No mathematical derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text or abstract. The central claims are scoped to the tested regime with explicit qualifications about identifiability and structural mismatch, and results are grounded in direct experimental comparisons rather than any reduction of outputs to inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: pendulum dynamics with two unknown parameters are identifiable from rollouts.