Betting for Sim-to-Real Performance Evaluation
Pith reviewed 2026-05-08 03:02 UTC · model grok-4.3
The pith
A betting mechanism yields more accurate real-world robot performance estimates than Monte Carlo sampling by constructing simulator-guided bets under specific theoretical conditions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes theoretical conditions under which a betting mechanism can yield accurate and efficient estimates of real-world robot performance, provably outperforming the Monte Carlo estimator. It characterizes how such bets should be constructed from available simulators, develops theoretically grounded yet practically implementable approximations of the ideal bet, and provides concrete decision rules that diagnose when these approximate betting strategies are working as intended. The approach is demonstrated on synthetic examples, cross-fidelity computational simulators, and an illustrative case using synthetic distributions to infer real-world pick-and-place accuracy of a robotic manipulator.
What carries the argument
The betting mechanism: it constructs simulator-derived bets to produce lower-variance estimates of real-world performance than direct Monte Carlo averaging of physical trials.
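For readers new to betting as a statistical device, the classical Kelly criterion for a repeated binary bet gives the flavor of the idea. The sketch below is not the paper's estimator; the function names, the win probability, and the odds are illustrative assumptions.

```python
import math


def kelly_fraction(p_win: float, net_odds: float) -> float:
    """Classical Kelly fraction f* = p - (1 - p) / b for a binary bet.

    p_win is the probability of winning and net_odds (b) is the net
    payout per unit staked. A negative f* means "do not bet", so the
    result is clipped at zero.
    """
    if not 0.0 <= p_win <= 1.0 or net_odds <= 0.0:
        raise ValueError("need 0 <= p_win <= 1 and net_odds > 0")
    return max(0.0, p_win - (1.0 - p_win) / net_odds)


def expected_log_growth(fraction: float, p_win: float, net_odds: float) -> float:
    """Expected log-wealth growth per round when staking `fraction` of wealth."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must lie in [0, 1)")
    return (p_win * math.log(1.0 + fraction * net_odds)
            + (1.0 - p_win) * math.log(1.0 - fraction))


if __name__ == "__main__":
    f_star = kelly_fraction(0.6, 1.0)
    print(f_star, expected_log_growth(f_star, 0.6, 1.0))
```

With a 60% win probability at even odds, the Kelly fraction is about 0.2, and staking either more or less than that lowers the expected log growth. A bettor with no edge (here, a win probability of 0.5 or below) is told to stake nothing.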
If this is right
- When the stated conditions hold, betting reduces the number of physical trials needed for a target estimation accuracy compared with Monte Carlo.
- Approximate bets remain effective even without exact knowledge of the underlying distributions, provided the diagnostic rules confirm reliability.
- The same betting framework supports inference from groups of synthetic distributions to real manipulator accuracy without direct real-world sampling of that specific scenario.
- Decision rules allow users to detect and avoid cases where the betting strategy fails to deliver its promised advantage.
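To make the trial-count comparison in the first bullet concrete, the Monte Carlo baseline for a success-rate metric can be sized as follows. This is a sketch assuming independent Bernoulli trials; the function names are hypothetical, and `variance_ratio` is a placeholder for whatever per-trial variance reduction an alternative unbiased estimator achieves (the paper's actual ratio is not stated here).

```python
import math


def mc_standard_error(p_success: float, n_trials: int) -> float:
    """Standard error of the sample success rate over n independent trials."""
    return math.sqrt(p_success * (1.0 - p_success) / n_trials)


def mc_trials_needed(p_success: float, target_se: float) -> int:
    """Smallest n with sqrt(p (1 - p) / n) <= target_se."""
    return math.ceil(p_success * (1.0 - p_success) / target_se ** 2)


def trials_after_variance_reduction(n_mc: int, variance_ratio: float) -> int:
    """Trials needed by an unbiased estimator whose per-trial variance is
    `variance_ratio` times the Monte Carlo per-trial variance."""
    if variance_ratio <= 0.0:
        raise ValueError("variance_ratio must be positive")
    return math.ceil(n_mc * variance_ratio)


if __name__ == "__main__":
    n = mc_trials_needed(0.5, 0.25)
    print(n, trials_after_variance_reduction(n, 0.5))
```

Because the standard error shrinks only as the square root of the trial count, halving the per-trial variance halves the number of physical trials required for the same accuracy.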
Where Pith is reading between the lines
- The betting perspective could be combined with existing variance-reduction techniques such as importance sampling to achieve further efficiency gains in sim-to-real settings.
- If the sim-to-real gap violates the construction assumptions, the method would revert to no better than Monte Carlo, pointing to the value of adaptive bet updating during real tests.
- The framework suggests a general template for other expensive evaluation domains where cheap simulators can be turned into informed bets rather than used only for pre-filtering.
Load-bearing premise
Theoretical conditions exist that let properly constructed bets from simulators outperform plain Monte Carlo sampling while the sim-to-real transfer assumptions remain valid.
What would settle it
A side-by-side experiment on a physical robot task with known ground-truth performance where the mean-squared error of the betting estimator exceeds the Monte Carlo estimator despite following the paper's construction and diagnostic rules.
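Such a comparison could be scored with an empirical mean-squared error over repeated evaluation runs. The sketch below assumes the ground-truth performance is known and that each estimator has already produced a list of independent estimates; both function names are hypothetical.

```python
from typing import Sequence


def empirical_mse(estimates: Sequence[float], truth: float) -> float:
    """Average squared deviation of repeated estimates from the known truth."""
    if not estimates:
        raise ValueError("need at least one estimate")
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)


def betting_worse_than_mc(betting_estimates: Sequence[float],
                          mc_estimates: Sequence[float],
                          truth: float) -> bool:
    """True when the betting estimator's empirical MSE exceeds Monte Carlo's."""
    return empirical_mse(betting_estimates, truth) > empirical_mse(mc_estimates, truth)
```

A single `True` from this check is not conclusive by itself, since the empirical MSE is itself noisy; enough repetitions are needed for the gap between the two values to exceed their sampling error.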
original abstract
This paper studies the problem of robot performance evaluation, focusing on how to obtain accurate and efficient estimates of real-world behavior under severe constraints on physical experimentation. Such estimates are essential for benchmarking algorithms, comparing design alternatives, validating controllers, and supporting certification or regulatory decision-making, yet real-world testing with physical robots is often expensive, time-consuming, and safety-limited. To mitigate the scarcity of real-world trials, sim-to-real methodologies are commonly employed, using low-cost simulators to inform, supplement, or prioritize physical experiments. Departing from (and complementary to) existing approaches in variance reduction (e.g., importance-sampling variants) or bias-correction (e.g., through prediction-powered inference or learned control variates), we examine this performance-evaluation problem through the lens of betting. We establish theoretical conditions under which a betting mechanism can yield accurate and efficient estimates (provably outperforming the Monte Carlo estimator) and we characterize how such bets should be constructed. We further develop theoretically grounded yet practically implementable approximations of the ideal bet, and we provide concrete decision rules that diagnose when these approximate betting strategies are working as intended. We demonstrate the effectiveness of the proposed methods using both synthetic examples and cross-fidelity computational simulators. Notably, we also showcase an illustrative case in which a group of synthetic distributions are used to infer the real-world pick-and-place accuracy of a robotic manipulator, a seemingly unconventional sim-to-real transfer that becomes natural and feasible under the proposed betting perspective. Programs for reproducing empirical results are available at https://github.com/ISUSAIL/Bet4Sim2Real.
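For contrast with the betting perspective, one of the existing approaches named in the abstract, a control variate, can be sketched as below. This is a textbook control-variate estimator, not the paper's method. It assumes paired simulator and real outcomes on the same scenarios plus a larger pool of simulator-only outcomes, and every name in it is hypothetical.

```python
from statistics import fmean
from typing import Sequence


def control_variate_estimate(real: Sequence[float],
                             sim_paired: Sequence[float],
                             sim_pool: Sequence[float]) -> float:
    """Estimate the real-world mean, using the simulator as a control variate.

    real[i] and sim_paired[i] are outcomes on the same scenario; sim_pool is
    a (typically much larger) set of simulator outcomes used to pin down the
    simulator mean. The coefficient is fitted on the paired data, which is
    standard practice but introduces a small finite-sample bias.
    """
    if len(real) != len(sim_paired) or len(real) < 2:
        raise ValueError("need at least two paired observations")
    mean_real = fmean(real)
    mean_sim = fmean(sim_paired)
    cov = sum((r - mean_real) * (s - mean_sim) for r, s in zip(real, sim_paired))
    var = sum((s - mean_sim) ** 2 for s in sim_paired)
    coeff = cov / var if var > 0.0 else 0.0  # no variation in sim: fall back to plain mean
    return mean_real - coeff * (mean_sim - fmean(sim_pool))
```

The correction term shifts the real-world average by however much the paired simulator runs over- or under-shoot the simulator's long-run mean, so the gain depends on how strongly simulator and real outcomes are correlated.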
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a betting mechanism for efficient estimation of real-world robot performance from limited physical trials and abundant simulator data. It derives theoretical conditions under which suitably constructed bets yield unbiased estimates that provably outperform standard Monte Carlo sampling in terms of variance, develops practical approximations to the ideal bet together with diagnostic rules for when the approximations succeed, and demonstrates the approach on synthetic distributions and cross-fidelity simulators, including an unconventional synthetic-to-real transfer for pick-and-place accuracy.
Significance. If the stated theoretical conditions and variance-reduction guarantees hold, the betting perspective supplies a principled, complementary tool to importance sampling and prediction-powered inference for sim-to-real benchmarking and certification tasks. The explicit provision of reproducible code is a clear strength that allows direct verification of the empirical claims.
major comments (2)
- [§3, Theorem 1] The claimed strict dominance over Monte Carlo is stated to hold under 'mild conditions on the simulator,' yet the precise measurability and integrability requirements that make the betting estimator unbiased and lower-variance are not spelled out; without them it is unclear whether the result applies to the discontinuous or heavy-tailed performance metrics typical in robotics.
- [§5.2, Eq. (18)–(20)] The practical approximation replaces the ideal bet with a learned surrogate; the paper does not quantify the bias introduced by this surrogate or provide a finite-sample bound showing that the diagnostic rule still controls type-I error when the surrogate error is non-negligible.
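As background for the first comment, the moment conditions behind the Monte Carlo baseline itself can be written out. The notation ($f$ for the performance metric, $P$ for the real-world distribution, $X_i$ for independent trials) is assumed here for illustration and is not taken from the paper.

```latex
\begin{align*}
\hat{\mu}_n &= \frac{1}{n}\sum_{i=1}^{n} f(X_i),
  \qquad X_1,\dots,X_n \overset{\text{i.i.d.}}{\sim} P, \\
\mathbb{E}_P\lvert f(X)\rvert < \infty
  &\;\Longrightarrow\; \mathbb{E}[\hat{\mu}_n] = \mu := \mathbb{E}_P[f(X)], \\
\mathbb{E}_P\bigl[f(X)^2\bigr] < \infty
  &\;\Longrightarrow\; \operatorname{Var}(\hat{\mu}_n) = \frac{\sigma^2}{n},
  \qquad \sigma^2 := \operatorname{Var}_P\bigl(f(X)\bigr).
\end{align*}
```

A variance comparison against this baseline is only meaningful when $\sigma^2$ is finite, which is why heavy-tailed metrics call for separate treatment.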
minor comments (3)
- [§2 and §4] Notation for the payoff function and the betting fraction is introduced in §2 but reused with different subscripts in §4; a single consolidated table of symbols would improve readability.
- [Figure 4] The pick-and-place results lack error bars on the real-world reference, and the figure does not state how many physical trials were used to obtain the ground-truth accuracy; this makes it hard to judge whether the betting estimator’s reported improvement is statistically meaningful.
- [Abstract] The abstract claims 'provably outperforming the Monte Carlo estimator,' yet the main text only shows dominance under the derived conditions; a brief caveat sentence in the abstract would align the two.
Simulated Author's Rebuttal
We thank the referee for the positive recommendation of minor revision and for the constructive comments. We address each major comment below.
point-by-point responses
- Referee: [§3, Theorem 1] The claimed strict dominance over Monte Carlo is stated to hold under 'mild conditions on the simulator,' yet the precise measurability and integrability requirements that make the betting estimator unbiased and lower-variance are not spelled out; without them it is unclear whether the result applies to the discontinuous or heavy-tailed performance metrics typical in robotics.
  Authors: We agree that the assumptions underlying Theorem 1 should be stated more explicitly. The result requires the performance metric to be a measurable function with finite first and second moments under the real-world distribution, together with integrability of the likelihood ratio induced by the simulator. These conditions ensure unbiasedness of the betting estimator and allow the variance comparison. In the revised manuscript we will add a dedicated remark immediately after the theorem statement that lists these requirements and discusses their implications for common robotics metrics: discontinuous indicators (e.g., success/failure) remain admissible provided the expectation exists, while heavy-tailed distributions preserve unbiasedness but may lose the strict variance reduction if the second moment is infinite.
  Revision: yes
- Referee: [§5.2, Eq. (18)–(20)] The practical approximation replaces the ideal bet with a learned surrogate; the paper does not quantify the bias introduced by this surrogate or provide a finite-sample bound showing that the diagnostic rule still controls type-I error when the surrogate error is non-negligible.
  Authors: The surrogate is obtained by minimizing a convex loss that approximates the ideal betting function, and the diagnostic rule monitors whether the empirical average of the surrogate bet remains close to its theoretical expectation. While we do not supply a finite-sample bound on type-I error under surrogate approximation error, the rule is constructed to be conservative and our synthetic and robotic experiments indicate that it reliably detects large deviations. In the revision we will augment §5.2 with a short analysis of the approximation bias, including a simple concentration argument under bounded surrogate error, together with practical guidance on when additional validation trials should be performed.
  Revision: partial
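The kind of concentration argument mentioned in the second response can be illustrated with Hoeffding's inequality for bounded quantities. The sketch below is a stand-in under stated assumptions, not the paper's diagnostic rule: it assumes independent bet values confined to a known interval `[lo, hi]`, and the function names and the tolerance `delta` are hypothetical.

```python
import math
from typing import Sequence


def hoeffding_radius(n: int, delta: float, lo: float, hi: float) -> float:
    """Half-width t such that P(|mean - E[mean]| >= t) <= delta
    for n independent values in [lo, hi] (two-sided Hoeffding bound)."""
    if n <= 0 or not 0.0 < delta < 1.0 or hi <= lo:
        raise ValueError("need n > 0, 0 < delta < 1 and hi > lo")
    return (hi - lo) * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def deviation_flagged(values: Sequence[float], expected: float,
                      delta: float, lo: float, hi: float) -> bool:
    """Flag when the empirical mean strays from `expected` by more than
    chance alone would explain at confidence level 1 - delta."""
    mean = sum(values) / len(values)
    return abs(mean - expected) > hoeffding_radius(len(values), delta, lo, hi)
```

With 200 values in `[0, 1]` and `delta = 0.05`, the radius is roughly 0.096, so only deviations larger than that would be flagged. The bound is distribution-free and therefore conservative, which matches the spirit, though not necessarily the form, of the rule described by the authors.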
Circularity Check
No significant circularity detected in derivation chain
full rationale
The paper derives theoretical conditions under which a betting mechanism yields accurate estimates provably outperforming Monte Carlo, characterizes ideal bets, develops practical approximations, and supplies diagnostic rules. These steps are supported by independent synthetic examples, cross-fidelity simulators, and an unconventional sim-to-real pick-and-place case, with external reproducible code. No load-bearing step reduces by construction to a fitted input, self-definition, or unverified self-citation chain; the central claims rest on explicit theoretical derivations and empirical validation outside the fitted values themselves.
Axiom & Free-Parameter Ledger
axioms (1)
- [standard math] Standard probabilistic assumptions underlying Monte Carlo estimation and betting mechanisms.
Supplementary material excerpts
- The sim-real pairwise testing required by SureSim may be limited by the relatively small number of samples available in our setting (30 samples), and the additional sim-only samples (20 samples) may also be insufficient to fully realize its potential (as mentioned above). Nevertheless, this allocation reflects the most balanced and fair use of the available simulator budget for our comparison.
- SureSim involves a larger number of hyperparameters that may require careful tuning; in our reproduction, we did not perform extensive hyperparameter optimization.
- A primary strength of SureSim (and PPI-based methods more broadly) lies not in producing the most accurate point estimate of the mean, but in providing confidence intervals with guaranteed coverage. From this perspective, direct point-estimate comparison may not fully reflect its intended use, though it remains the most practical basis for comparison in our setting.
- SureSim is primarily designed around a single simulator and relies on correlation-based adjustments, whereas the proposed Kelly-style betting variants naturally accommodate and benefit from a diverse bank of simulators.
- The two approaches are not mutually exclusive. As discussed in the paper, PPI-style bias correction and betting-based variance reduction address complementary aspects of the sim-to-real inference problem and could potentially be combined in future work.
B. Comparisons with IS
The practical implementation of IS (importance sampling) is highly case-specific...
- The self-normalized IS estimator (3) is biased at finite sample sizes, even when q = q∗. The zero-variance guarantee is asymptotic: while variance vanishes as n → ∞, both bias and variance can remain non-negligible for practical budgets (here n ≤ 300).
- Unlike IS, which draws samples from a fixed proposal, Kelly betting is sequential and adaptive. This adaptivity allows it to incorporate early outcomes and progressively allocate weight toward uncertainty reduction. As discussed in the main paper, the proposed Kelly-style betting mechanism is not intended to replace IS or debiasing methods such as PPI. Ra...
- “No edge” simply means no useful predictive signal.
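The self-normalized importance sampling estimator referred to in these excerpts has a standard generic form, sketched below. This is not the supplementary material's implementation: `metric`, `target_pdf`, and `proposal_pdf` are hypothetical callables supplied by the user, and the samples are assumed to be drawn from the proposal.

```python
from typing import Callable, Sequence


def self_normalized_is(samples: Sequence[float],
                       metric: Callable[[float], float],
                       target_pdf: Callable[[float], float],
                       proposal_pdf: Callable[[float], float]) -> float:
    """Self-normalized importance sampling estimate of E_target[metric(X)].

    Each sample is weighted by target_pdf(x) / proposal_pdf(x) and the
    weighted sum is divided by the total weight, so the densities only
    need to be known up to a constant factor.
    """
    weights = [target_pdf(x) / proposal_pdf(x) for x in samples]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("all importance weights are zero")
    return sum(w * metric(x) for w, x in zip(weights, samples)) / total
```

Because the estimate is a ratio of two sample averages, it is consistent but generally biased at any finite sample size, which is the finite-budget caveat raised in the excerpt. When the target and proposal densities coincide, all weights are equal and the estimator reduces to the plain sample mean.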