arxiv: 2606.22062 · v2 · pith:KMKKMMFYnew · submitted 2026-06-20 · 💻 cs.RO · cs.LG

How Should a Simulation-to-Reality Transfer Budget Be Spent?

Pith reviewed 2026-06-26 12:03 UTC · model grok-4.3

classification 💻 cs.RO cs.LG
keywords sim-to-real transfer · system identification · domain randomization · measurement budget · pendulum · robot learning · reality gap

The pith

In pendulum sim-to-real tests, a few identification rollouts closed most of the transfer gap while broad randomization did not substitute for measurement.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the allocation of limited real-robot measurement time between system identification and domain randomization in sim-to-real transfer. Experiments in a controlled sim-to-sim pendulum setting vary the number of identification rollouts against the width of the randomization distribution. Across tested reality gaps and noise levels, measurement accounted for most performance gains. Once any real data became available, policies trained at the estimated parameters outperformed those trained over a widened randomization band. The work concludes that sim-to-real pipelines should prioritize measuring identifiable parameters and reserve randomization for what remains uncertain.

Core claim

Across the reality gaps and noise levels tested, the measurement budget did most of the work. A small number of identification rollouts closed most of the transfer gap, and once any real data was available, policies performed best when trained at the estimated parameters rather than over a widened randomization band. Broad randomization that contained the true system still did not substitute for measurement. These results hold in a benign regime where the dynamics are identifiable and only two parameters are unknown.

What carries the argument

The controlled tradeoff experiment that sweeps identification rollouts against randomization distribution width in a hidden-parameter pendulum model.
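
As a rough illustration of how such a sweep could be summarized, the sketch below takes zero-shot returns indexed by (rollout budget, randomization width) and picks the best-performing width at each budget. The function name, the data layout, and the numbers in the usage lines are hypothetical placeholders chosen for this example; none of them come from the paper.

```python
from statistics import mean


def best_width_per_budget(returns):
    """Pick the randomization width with the highest mean return at each budget.

    `returns` maps (n_rollouts, width) -> list of zero-shot returns, one per seed.
    The result maps n_rollouts -> (best_width, best_mean_return).
    """
    best = {}
    for (budget, width), samples in returns.items():
        score = mean(samples)
        if budget not in best or score > best[budget][1]:
            best[budget] = (width, score)
    return best


# Placeholder numbers for illustration only; they are NOT results from the paper.
toy_returns = {
    (0, 0.0): [-9.0, -11.0],
    (0, 0.5): [-6.0, -8.0],
    (4, 0.0): [-2.0, -4.0],
    (4, 0.5): [-5.0, -7.0],
}
summary = best_width_per_budget(toy_returns)
```

Keeping the per-seed returns in the table, rather than only their means, leaves room to report variance alongside the chosen width.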

If this is right

  • A small number of identification rollouts closes most of the transfer gap.
  • Once real data is available, training at estimated parameters outperforms training over a widened band.
  • Broad randomization that contains the true system does not substitute for measurement.
  • Sim-to-real pipelines should first measure the parameters they can and reserve randomization for remaining uncertainty.
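
A minimal sketch of what "training at the estimated parameters" versus "training over a widened band" could mean in code is given below. It assumes a uniform multiplicative band centered on the estimate, which is an assumption of this example and not necessarily the distribution used in the paper; the parameter names `a` and `b` and their values are hypothetical.

```python
import random


def sample_dynamics(estimate, rel_width, rng):
    """Draw one set of simulator parameters from a uniform band around `estimate`.

    Each parameter is scaled by a factor in [1 - rel_width, 1 + rel_width].
    rel_width = 0.0 collapses the band to the point estimate.
    """
    return {
        name: value * (1.0 + rng.uniform(-rel_width, rel_width))
        for name, value in estimate.items()
    }


rng = random.Random(0)
estimate = {"a": 9.5, "b": 0.3}  # hypothetical identified values

# Zero width: every training episode uses the estimated parameters.
point = sample_dynamics(estimate, 0.0, rng)

# Wide band: each training episode sees different dynamics.
wide = [sample_dynamics(estimate, 0.5, rng) for _ in range(100)]
```

Under this parameterization, reserving randomization for the remaining uncertainty would amount to choosing `rel_width` from the spread of the parameter estimates instead of fixing it in advance.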

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • In regimes with structural model mismatch, the relative value of randomization breadth may increase.
  • The same budget allocation logic could be tested on higher-dimensional robots or tasks with more parameters.
  • Adaptive strategies that decide rollout allocation based on early identification results might further improve efficiency.

Load-bearing premise

The dynamics are identifiable with only two unknown parameters and no structural model mismatch.

What would settle it

The claim would be falsified by an experiment in the same pendulum setup, but with added structural mismatch or more unknown parameters, in which broad randomization closes a larger fraction of the gap than identification rollouts do.

Figures

Figures reproduced from arXiv: 2606.22062 by Syed Hamzah Rizvi, Yash Vardhan Tomar.

Figure 1: Mean zero-shot return on the hidden system
Figure 2: Best mean zero-shot return at each budget, with the chosen width
Original abstract

Simulation-to-reality transfer, often called sim-to-real transfer, is a central challenge in robot learning. Yet, the tradeoff between measuring a system more accurately and training over a broader range of simulated dynamics is still poorly understood. In this work, we focused on the allocation of real-robot measurement time between system identification and domain randomization. We studied this tradeoff in a controlled sim-to-sim pendulum setting, where a hidden-parameter model stands in for the physical robot, and the experiment sweeps identification rollouts against the width of the randomization distribution. Across the reality gaps and noise levels we tested, the measurement budget did most of the work. A small number of identification rollouts closed most of the transfer gap, and once any real data was available, policies performed best when trained at the estimated parameters rather than over a widened randomization band. Broad randomization that contained the true system still did not substitute for measurement. These results hold in a benign regime where the dynamics are identifiable and only two parameters are unknown, so structural model mismatch remains the setting where randomization breadth may become more valuable. Overall, our results suggest that sim-to-real pipelines should first measure the parameters they can and reserve randomization for the uncertainty that remains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript investigates the allocation of a fixed real-robot measurement budget between system identification and domain randomization for sim-to-real transfer. Using a controlled sim-to-sim pendulum with a hidden-parameter model and exactly two unknown parameters, the authors sweep the number of identification rollouts against the width of the randomization distribution across varying reality gaps and noise levels. They report that identification rollouts close most of the transfer gap, that policies trained at the estimated parameters outperform those trained over widened randomization bands once any real data is available, and that broad randomization containing the true system does not substitute for measurement. All claims are explicitly scoped to the benign regime of identifiable dynamics without structural mismatch.

Significance. If the empirical findings hold, the work supplies actionable guidance for sim-to-real practice by showing that measurement should be prioritized for identifiable parameters and randomization reserved for residual uncertainty. The controlled sim-to-sim design with direct sweeps isolates the tradeoff cleanly and yields falsifiable predictions for similar low-dimensional identifiable systems; this experimental clarity is a strength. The manuscript appropriately qualifies its scope rather than overclaiming generality.

major comments (1)
  1. [§4 (Experiments), abstract] The description of the controlled sweep provides no information on the number of independent trials, statistical tests, variance estimation, or exact definition of the 'transfer gap' metric. Without these details it is not possible to assess whether the data robustly support the central claim that 'a small number of identification rollouts closed most of the transfer gap.'
minor comments (2)
  1. The abstract states results hold 'across the reality gaps and noise levels we tested' but does not name the specific gap magnitudes or noise variances; adding these values would improve reproducibility.
  2. [Methods] Notation for the randomization distribution width and the estimated-parameter policy is introduced without an explicit equation or table reference in the methods; a short definitional equation would aid clarity.
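
To make the major comment concrete, the sketch below shows one possible way to define a normalized transfer-gap metric and to report variability across independent seeds. The definition is hypothetical: the text above does not give the paper's own metric, and the names `j_nominal` (return of a policy trained without real data) and `j_oracle` (return of a policy trained at the true parameters) are assumptions of this example.

```python
from math import sqrt
from statistics import mean, stdev


def gap_closed(j_policy, j_nominal, j_oracle):
    """Fraction of the transfer gap recovered by a policy (hypothetical metric).

    0.0 means no better than the nominal policy; 1.0 means oracle-level return.
    """
    gap = j_oracle - j_nominal
    if gap == 0:
        raise ValueError("no transfer gap to close")
    return (j_policy - j_nominal) / gap


def mean_and_sem(values):
    """Sample mean and standard error of the mean across independent seeds."""
    if len(values) < 2:
        raise ValueError("need at least two seeds to estimate variance")
    return mean(values), stdev(values) / sqrt(len(values))
```

Reporting the number of seeds together with the standard error would address the trial-count and variance parts of the comment in a single table column.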

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment on experimental reporting. We agree that additional details are needed and will revise the manuscript accordingly.

  1. Referee: [§4 (Experiments), abstract] The description of the controlled sweep provides no information on the number of independent trials, statistical tests, variance estimation, or exact definition of the 'transfer gap' metric. Without these details it is not possible to assess whether the data robustly support the central claim that 'a small number of identification rollouts closed most of the transfer gap.'

    Authors: We agree that the manuscript currently omits explicit information on the number of independent trials, statistical tests, variance estimation, and the precise definition of the transfer gap metric. These details are required for readers to evaluate the robustness of the reported trends. In the revised version we will expand §4 to state the number of independent trials per configuration, describe how variance was estimated across runs, note whether any statistical tests were applied, and provide the exact definition of the transfer gap metric used to generate the figures. The abstract will be updated to reference the added experimental rigor if space allows. These changes will directly address the concern while preserving the scope and conclusions of the work.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity

This is an empirical experimental study that directly compares policy performance across sweeps of identification rollouts versus randomization width in a controlled sim-to-sim pendulum. No mathematical derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text or abstract. The central claims are scoped to the tested regime with explicit qualifications about identifiability and structural mismatch, and results are grounded in direct experimental comparisons rather than any reduction of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The study rests on standard assumptions from robotics and system identification; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • Domain assumption: pendulum dynamics with two unknown parameters are identifiable from rollouts.
    Explicitly stated as the regime in which results hold.
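
The identifiability assumption can be illustrated with a small sketch. It assumes the two unknowns are a gravity-over-length coefficient `a` and a viscous damping coefficient `b` in the model theta'' = -a*sin(theta) - b*omega; the text above does not say which two parameters the paper hides, so this model, the Euler integrator, and all numeric values are hypothetical choices for the example.

```python
import math
import random


def rollout(a, b, theta0, steps=200, dt=0.01, noise=0.0, rng=None):
    """Simulate a free-swinging damped pendulum with explicit Euler steps.

    Returns a list of (theta, omega) observations with optional Gaussian noise.
    """
    if rng is None:
        rng = random.Random(0)
    theta, omega = theta0, 0.0
    obs = []
    for _ in range(steps + 1):
        obs.append((theta + rng.gauss(0.0, noise), omega + rng.gauss(0.0, noise)))
        # Both right-hand sides use the state from before the update.
        theta, omega = (
            theta + dt * omega,
            omega + dt * (-a * math.sin(theta) - b * omega),
        )
    return obs


def identify(rollouts, dt=0.01):
    """Least-squares fit of (a, b) from finite-difference accelerations."""
    # Normal equations for acc = a*x1 + b*x2 with x1 = -sin(theta), x2 = -omega.
    s11 = s12 = s22 = r1 = r2 = 0.0
    for obs in rollouts:
        for (th, om), (_, om_next) in zip(obs, obs[1:]):
            acc = (om_next - om) / dt
            x1, x2 = -math.sin(th), -om
            s11 += x1 * x1
            s12 += x1 * x2
            s22 += x2 * x2
            r1 += x1 * acc
            r2 += x2 * acc
    det = s11 * s22 - s12 * s12
    a_hat = (r1 * s22 - r2 * s12) / det
    b_hat = (r2 * s11 - r1 * s12) / det
    return a_hat, b_hat


true_a, true_b = 9.81, 0.3  # hypothetical hidden parameters
a_hat, b_hat = identify([rollout(true_a, true_b, theta0=1.0)])
```

With noise-free observations and a model that matches the simulator exactly, a single rollout recovers both coefficients up to floating-point error. Passing `noise > 0` or fitting a model with a missing term is where more rollouts, or randomization over what remains uncertain, would start to matter.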

pith-pipeline@v0.9.1-grok · 5743 in / 1188 out tokens · 30225 ms · 2026-06-26T12:03:54.800568+00:00 · methodology

