How Should a Simulation-to-Reality Transfer Budget Be Spent?

Syed Hamzah Rizvi; Yash Vardhan Tomar

arXiv: 2606.22062 · v2 · pith:KMKKMMFYnew · submitted 2026-06-20 · cs.RO · cs.LG

Pith reviewed 2026-06-26 12:03 UTC · model grok-4.3

keywords: sim-to-real transfer, system identification, domain randomization, measurement budget, pendulum, robot learning, reality gap

The pith

In pendulum sim-to-real tests, a few identification rollouts closed most of the transfer gap while broad randomization did not substitute for measurement.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the allocation of limited real-robot measurement time between system identification and domain randomization in sim-to-real transfer. Experiments in a controlled sim-to-sim pendulum setting vary the number of identification rollouts against the width of the randomization distribution. Across tested reality gaps and noise levels, measurement accounted for most performance gains. Once any real data became available, policies trained at the estimated parameters outperformed those trained over a widened randomization band. The work concludes that sim-to-real pipelines should prioritize measuring identifiable parameters and reserve randomization for what remains uncertain.

Core claim

Across the reality gaps and noise levels tested, the measurement budget did most of the work. A small number of identification rollouts closed most of the transfer gap, and once any real data was available, policies performed best when trained at the estimated parameters rather than over a widened randomization band. Broad randomization that contained the true system still did not substitute for measurement. These results hold in a benign regime where the dynamics are identifiable and only two parameters are unknown.

What carries the argument

The controlled tradeoff experiment that sweeps identification rollouts against randomization distribution width in a hidden-parameter pendulum model.
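
One way to picture the sweep is as a grid of (identification rollouts, randomization width) cells. The sketch below is illustrative only: the nominal values, the rollout counts, the widths, and the multiplicative form of the band are all assumptions, since the abstract does not specify them.

```python
import itertools
import numpy as np

def randomization_band(center, width):
    """Return (low, high) bounds of a uniform band: center * (1 -/+ width)."""
    center = np.asarray(center, dtype=float)
    return center * (1.0 - width), center * (1.0 + width)

def sweep_configs(rollout_counts, widths):
    """Enumerate every (n_rollouts, width) cell of the sweep grid."""
    return list(itertools.product(rollout_counts, widths))

rng = np.random.default_rng(0)
nominal = np.array([1.0, 0.1])  # hypothetical (length, damping) center
configs = sweep_configs([0, 1, 2, 5], [0.0, 0.1, 0.5])  # hypothetical grid

samples = {}
for n_rollouts, width in configs:
    low, high = randomization_band(nominal, width)
    # One draw of simulator parameters for a training episode in this cell.
    samples[(n_rollouts, width)] = rng.uniform(low, high)
```

A width of 0.0 collapses the band to a single point, which corresponds to training at one parameter estimate rather than over a distribution.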

If this is right

A small number of identification rollouts closes most of the transfer gap.
Once real data is available, training at estimated parameters outperforms training over a widened band.
Broad randomization that contains the true system does not substitute for measurement.
Sim-to-real pipelines should first measure the parameters they can and reserve randomization for remaining uncertainty.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

In regimes with structural model mismatch, the relative value of randomization breadth may increase.
The same budget allocation logic could be tested on higher-dimensional robots or tasks with more parameters.
Adaptive strategies that decide rollout allocation based on early identification results might further improve efficiency.

Load-bearing premise

The dynamics are identifiable with only two unknown parameters and no structural model mismatch.

What would settle it

The claim would be falsified by an experiment in the same pendulum setup, with added structural mismatch or more unknown parameters, in which broad randomization closes a larger fraction of the gap than identification rollouts do.

Figures

Figures reproduced from arXiv: 2606.22062 by Syed Hamzah Rizvi, Yash Vardhan Tomar.

**Figure 1.** Mean zero-shot return on the hidden system.

Original abstract

Simulation-to-reality transfer, often called sim-to-real transfer, is a central challenge in robot learning. Yet, the tradeoff between measuring a system more accurately and training over a broader range of simulated dynamics is still poorly understood. In this work, we focused on the allocation of real-robot measurement time between system identification and domain randomization. We studied this tradeoff in a controlled sim-to-sim pendulum setting, where a hidden-parameter model stands in for the physical robot, and the experiment sweeps identification rollouts against the width of the randomization distribution. Across the reality gaps and noise levels we tested, the measurement budget did most of the work. A small number of identification rollouts closed most of the transfer gap, and once any real data was available, policies performed best when trained at the estimated parameters rather than over a widened randomization band. Broad randomization that contained the true system still did not substitute for measurement. These results hold in a benign regime where the dynamics are identifiable and only two parameters are unknown, so structural model mismatch remains the setting where randomization breadth may become more valuable. Overall, our results suggest that sim-to-real pipelines should first measure the parameters they can and reserve randomization for the uncertainty that remains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

In this controlled sim-to-sim pendulum, a few system-ID rollouts close most of the gap and beat wider randomization once any real measurements exist.

The main result is straightforward: when dynamics are identifiable and only two parameters are unknown, the measurement budget does the heavy lifting. A small number of identification rollouts gets policies close to the true system, and after that point training at the estimated parameters outperforms training over a wider randomization band that still contains the truth.

The paper runs a clean sweep of identification rollouts against randomization width in a hidden-parameter pendulum model. It reports that broad randomization does not substitute for measurement even when the true system lies inside the band. This is useful because it quantifies the tradeoff in a setting where the ground truth is known and controllable.

The authors are careful to scope the claim to this benign regime and note that structural model mismatch is where randomization breadth may matter more. That qualification keeps the work honest.

The obvious limit is that everything stays inside simulation with a known model structure. No real hardware, no unmodeled effects, and the abstract gives no numbers on trial counts or statistical tests. If the full paper supplies those details and shows the result is stable across noise levels, the finding is a solid data point rather than a general rule.

This merits a serious referee and will interest anyone who allocates real-robot time between identification and randomization. The experiment is narrow but the question is practical, and the scoping prevents overclaiming. I would send it to review.

Referee Report

1 major / 2 minor

Summary. The manuscript investigates the allocation of a fixed real-robot measurement budget between system identification and domain randomization for sim-to-real transfer. Using a controlled sim-to-sim pendulum with a hidden-parameter model and exactly two unknown parameters, the authors sweep the number of identification rollouts against the width of the randomization distribution across varying reality gaps and noise levels. They report that identification rollouts close most of the transfer gap, that policies trained at the estimated parameters outperform those trained over widened randomization bands once any real data is available, and that broad randomization containing the true system does not substitute for measurement. All claims are explicitly scoped to the benign regime of identifiable dynamics without structural mismatch.

Significance. If the empirical findings hold, the work supplies actionable guidance for sim-to-real practice by showing that measurement should be prioritized for identifiable parameters and randomization reserved for residual uncertainty. The controlled sim-to-sim design with direct sweeps isolates the tradeoff cleanly and yields falsifiable predictions for similar low-dimensional identifiable systems; this experimental clarity is a strength. The manuscript appropriately qualifies its scope rather than overclaiming generality.

major comments (1)

[§4 (Experiments), abstract] The description of the controlled sweep provides no information on the number of independent trials, statistical tests, variance estimation, or the exact definition of the 'transfer gap' metric. Without these details, it is not possible to assess whether the data robustly support the central claim that 'a small number of identification rollouts closed most of the transfer gap.'

minor comments (2)

The abstract states results hold 'across the reality gaps and noise levels we tested' but does not name the specific gap magnitudes or noise variances; adding these values would improve reproducibility.
[Methods] Notation for the randomization distribution width and the estimated-parameter policy is introduced without an explicit equation or table reference in the methods; a short definitional equation would aid clarity.
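
As an example of the kind of definitional equation this comment asks for, one hypothetical parameterization draws each training parameter vector from a uniform band of relative half-width $w$ centered on the estimate $\hat{\theta}$. The symbols and the multiplicative form are assumptions for illustration, not the paper's notation.

```latex
\theta_{\mathrm{train}} \sim \mathcal{U}\!\left[(1 - w)\,\hat{\theta},\; (1 + w)\,\hat{\theta}\right],
\qquad w \ge 0 .
```

Under this convention, $w = 0$ corresponds to the policy trained at the estimated parameters, and larger $w$ corresponds to a widened randomization band.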

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment on experimental reporting. We agree that additional details are needed and will revise the manuscript accordingly.

Referee: [§4 (Experiments), abstract] The description of the controlled sweep provides no information on the number of independent trials, statistical tests, variance estimation, or the exact definition of the 'transfer gap' metric. Without these details, it is not possible to assess whether the data robustly support the central claim that 'a small number of identification rollouts closed most of the transfer gap.'

Authors: We agree that the manuscript currently omits explicit information on the number of independent trials, statistical tests, variance estimation, and the precise definition of the transfer gap metric. These details are required for readers to evaluate the robustness of the reported trends. In the revised version we will expand §4 to state the number of independent trials per configuration, describe how variance was estimated across runs, note whether any statistical tests were applied, and provide the exact definition of the transfer gap metric used to generate the figures. The abstract will be updated to reference the added experimental rigor if space allows. These changes will directly address the concern while preserving the scope and conclusions of the work.

Revision: yes
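
The reporting requested in this exchange could take a form like the sketch below. The normalized "fraction of gap closed" definition is an assumption (the source does not define the transfer gap metric), the function names are hypothetical, and the numbers are made up for illustration rather than taken from the paper.

```python
import numpy as np

def gap_closed(returns, baseline_return, oracle_return):
    """Fraction of the baseline-to-oracle return gap recovered, per trial.

    Assumed definition: 0 means no better than a policy trained at nominal
    parameters, 1 means as good as a policy trained on the true system.
    """
    returns = np.asarray(returns, dtype=float)
    return (returns - baseline_return) / (oracle_return - baseline_return)

def mean_and_sem(values):
    """Mean and standard error of the mean across independent trials."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std(ddof=1) / np.sqrt(values.size)

# Made-up per-trial returns for one sweep cell; not data from the paper.
trial_returns = [-220.0, -180.0, -200.0, -200.0]
fractions = gap_closed(trial_returns, baseline_return=-600.0, oracle_return=-100.0)
mean, sem = mean_and_sem(fractions)
print(f"gap closed: {mean:.2f} +/- {sem:.3f} over {len(trial_returns)} trials")
```

Reporting a mean with a standard error per sweep cell, together with the trial count, would let readers judge whether differences between cells exceed run-to-run variation.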

Circularity Check

0 steps flagged

No significant circularity

This is an empirical experimental study that directly compares policy performance across sweeps of identification rollouts versus randomization width in a controlled sim-to-sim pendulum. No mathematical derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text or abstract. The central claims are scoped to the tested regime with explicit qualifications about identifiability and structural mismatch, and results are grounded in direct experimental comparisons rather than any reduction of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The study rests on standard assumptions from robotics and system identification; no free parameters or invented entities are introduced in the abstract.

axioms (1)

Domain assumption: Pendulum dynamics with two unknown parameters are identifiable from rollouts.
Explicitly stated as the regime in which results hold.
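
To illustrate what identifiability from rollouts can mean in this setting, the following sketch fits two parameters of a damped pendulum by linear least squares. The choice of length and damping as the unknowns, the unforced dynamics, the integrator, and the noise-free measurements are all assumptions made for illustration; the abstract does not say which two parameters are hidden or how the authors estimate them.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def simulate(length, damping, theta0, dt=0.01, steps=500):
    """Roll out an unforced damped pendulum with semi-implicit Euler."""
    theta, omega = theta0, 0.0
    traj = np.empty((steps, 2))
    for k in range(steps):
        traj[k] = theta, omega
        alpha = -(G / length) * np.sin(theta) - damping * omega
        omega += alpha * dt
        theta += omega * dt
    return traj

def identify(rollouts, dt=0.01):
    """Least-squares fit of (length, damping) from finite-difference accelerations."""
    rows, targets = [], []
    for traj in rollouts:
        theta, omega = traj[:-1, 0], traj[:-1, 1]
        alpha = np.diff(traj[:, 1]) / dt  # acceleration between samples
        # Model: alpha = (G / length) * (-sin(theta)) + damping * (-omega)
        rows.append(np.column_stack([-np.sin(theta), -omega]))
        targets.append(alpha)
    coef, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(targets), rcond=None)
    return G / coef[0], coef[1]

# Hypothetical hidden system standing in for the "real" pendulum.
true_length, true_damping = 1.3, 0.25
rollouts = [simulate(true_length, true_damping, th0) for th0 in (0.5, 1.0)]
est_length, est_damping = identify(rollouts)
```

Because the model is linear in the two unknown coefficients and the regressors are not collinear, even two short rollouts pin the parameters down here. Measurement noise or structural mismatch would make the estimate less exact.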

Reference graph

Works this paper leans on

15 extracted references · 1 linked inside Pith

[1] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, “Domain randomization for transferring deep neural networks from simulation to the real world,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), 2017, pp. 23–30.

[2] X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel, “Sim-to-real transfer of robotic control with dynamics randomization,” in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), 2018, pp. 3803–3810.

[3] Y. Chebotar, A. Handa, V. Makoviychuk, M. Macklin, J. Issac, N. Ratliff, and D. Fox, “Closing the sim-to-real loop: Adapting simulation randomization with real world experience,” in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), 2019, pp. 8973–8979.

[4] J. Truong, S. Chernova, and D. Batra, “Rethinking sim2real: Lower fidelity simulation leads to higher sim2real transfer in navigation,” in Proc. Conf. Robot Learn. (CoRL), 2021.

[5] W. Zhao, J. P. Queralta, and T. Westerlund, “Sim-to-real transfer in deep reinforcement learning for robotics: A survey,” in Proc. IEEE Symp. Series Comput. Intell. (SSCI), 2020, pp. 737–744.

[6] F. Muratore, C. Eilers, M. Gienger, and J. Peters, “Data-efficient domain randomization with Bayesian optimization,” IEEE Robot. Autom. Lett., vol. 6, no. 2, pp. 911–918, 2021.

[7] G. Tiboni, K. Arndt, and V. Kyrki, “DROPO: Sim-to-real transfer with offline domain randomization,” Robot. Auton. Syst., vol. 166, 2023.

[8] Y. Du, O. Watkins, T. Darrell, P. Abbeel, and D. Pathak, “Auto-tuned sim-to-real transfer,” in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), 2021, pp. 1290–1296.

[9] Q. Vuong, S. Vikram, H. Su, S. Gao, and H. I. Christensen, “How to pick the domain randomization parameters for sim-to-real transfer of reinforcement learning policies?” arXiv preprint arXiv:1903.11774, 2019.

[10] A. Shakerimov, T. Alizadeh, and H. A. Varol, “Efficient sim-to-real transfer in reinforcement learning through domain randomization and domain adaptation,” IEEE Access, vol. 11, 2023.

[11] X. Chen, J. Hu, C. Jin, L. Li, and L. Wang, “Understanding domain randomization for sim-to-real transfer,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2022.

[12] A. Z. Ren, H. Dai, B. Burchfiel, and A. Majumdar, “AdaptSim: Task-driven simulation adaptation for sim-to-real transfer,” in Proc. Conf. Robot Learn. (CoRL), 2023.

[13] E. Valassakis, Z. Ding, and E. Johns, “Crossing the gap: A deep dive into zero-shot sim-to-real transfer for dynamics,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), 2020, pp. 5372–5379.

[14] F. Muratore, F. Ramos, G. Turk, W. Yu, M. Gienger, and J. Peters, “Robot learning from randomized simulations: A review,” Front. Robot. AI, vol. 9, 2022.

[15] M. Sobanbabu, G. He, T. He, Y. Yang, and G. Shi, “Sampling-based system identification with active exploration for legged robot sim2real learning,” in Proc. Conf. Robot Learn. (CoRL), 2025. [Online]. Available: arXiv:2505.14266