Abstract Sim2Real through Approximate Information States
Pith reviewed 2026-05-10 10:37 UTC · model grok-4.3
The pith
An abstract simulator can be grounded to the real world if its dynamics account for state history and are corrected with real task data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using the language of state abstraction from reinforcement learning, the paper establishes that an abstract simulator matches the target task when its grounded dynamics incorporate the history of states. A correction procedure is introduced that updates the abstract dynamics from real-world task data, after which reinforcement learning in the corrected simulator produces policies that transfer to the real world.
What carries the argument
Grounded abstract dynamics that depend on the full history of states, derived from state-abstraction formalism in RL, to compensate for details omitted by the coarse simulator.
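The review never shows the paper's machinery concretely. As a minimal sketch (all names hypothetical, not the paper's model), history-dependent grounded dynamics can be mimicked with a coarse one-step simulator plus a correction term computed from a fixed window of past abstract states, standing in for the full history:

```python
from collections import deque

class HistoryGroundedDynamics:
    """Toy grounded dynamics: the next abstract state depends not only on the
    current abstract state and action but also on a window of past states.
    A fixed window is a practical stand-in for conditioning on full history."""

    def __init__(self, base_step, window=4):
        self.base_step = base_step      # coarse simulator step: (s, a) -> s'
        self.window = window
        self.history = deque(maxlen=window)

    def reset(self, s0):
        self.history.clear()
        self.history.append(s0)
        return s0

    def step(self, s, a):
        s_next = self.base_step(s, a)
        # History-dependent correction: a drift term proportional to the mean
        # of recent states (a hand-set placeholder for a learned correction).
        drift = sum(self.history) / len(self.history)
        s_corrected = s_next + 0.1 * drift
        self.history.append(s_corrected)
        return s_corrected

# Usage: a coarse simulator that ignores momentum; the history term
# reintroduces a crude memory effect the one-step model cannot express.
dyn = HistoryGroundedDynamics(base_step=lambda s, a: s + a, window=3)
s = dyn.reset(0.0)
for a in [1.0, 1.0, -1.0]:
    s = dyn.step(s, a)
```

A learned model (e.g., a recurrent network) would replace the hand-set drift term; the point is only that the transition output depends on more than the current abstract state and action.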
If this is right
- Policies trained with reinforcement learning in the corrected abstract simulator transfer to the real world.
- The same correction approach improves transfer in sim2sim settings as well as sim2real settings.
- Accounting for state history in the abstract dynamics is necessary to bridge the gap created by simulator abstraction.
- Real-world data can be used directly to adjust simulator dynamics rather than to train policies from scratch.
Where Pith is reading between the lines
- This framing suggests that many existing coarse simulators could be made usable for policy transfer by adding a lightweight history-dependent correction layer.
- The amount of real data needed may be smaller than for full system identification because only the abstract mismatch must be learned.
- The approach could extend to other sequential decision problems where simulators are necessarily incomplete.
Load-bearing premise
Real-world task data is sufficient to correct the abstract dynamics accurately enough that a policy trained in the corrected simulator will transfer to the target task.
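The review states this premise but not the mechanism behind it. One plausible minimal form, assuming real transitions are logged as (state, action, next_state) tuples and the correction is an additive residual fit by least squares (`fit_residual_correction` and every other name here is hypothetical, not the paper's method):

```python
import numpy as np

def fit_residual_correction(real_transitions, sim_step):
    """Fit an additive residual on top of the coarse simulator so that
    sim_step(s, a) + residual(s, a) better matches observed next states.
    The residual is linear in (s, a); a learned model would replace it."""
    X, y = [], []
    for s, a, s_next in real_transitions:
        X.append([s, a, 1.0])                # features: state, action, bias
        y.append(s_next - sim_step(s, a))    # what the simulator got wrong
    coef, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    return lambda s, a: sim_step(s, a) + np.array([s, a, 1.0]) @ coef

# Toy check: the "real" system has a drag term the simulator omits.
def sim_step(s, a):
    return s + a

def real_step(s, a):
    return s + a - 0.2 * s   # unmodeled drag

data = [(s, a, real_step(s, a)) for s in (0.0, 1.0, 2.0) for a in (-1.0, 1.0)]
corrected = fit_residual_correction(data, sim_step)
```

Because the missing drag is linear in the state, the least-squares residual recovers it exactly here; with a richer mismatch, the linear features would only approximate it, which is exactly the gap the referee's finite-data concern targets.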
What would settle it
A policy trained in the history-corrected abstract simulator fails to transfer to the real world even after the dynamics have been updated with real task data.
Original abstract
In recent years, reinforcement learning (RL) has shown remarkable success in robotics when a fast and accurate simulator is available for a given task. When using RL and simulation, more simulator realism is generally beneficial but becomes harder to obtain as robots are deployed in increasingly complex and widescale domains. In such settings, simulators will likely fail to model all relevant details of a given target task and this observation motivates the study of sim2real with simulators that leave out key task details. In this paper, we formalize and study the abstract sim2real problem: given an abstract simulator that models a target task at a coarse level of abstraction, how can we train a policy with RL in the abstract simulator and successfully transfer it to the real-world? Our first contribution is to formalize this problem using the language of state abstraction from the RL literature. This framing shows that an abstract simulator can be grounded to match the target task if the grounded abstract dynamics take the history of states into account. Based on the formalism, we then introduce a method that uses real-world task data to correct the dynamics of the abstract simulator. We then show that this method enables successful policy transfer both in sim2sim and sim2real evaluation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper formalizes the abstract sim2real problem in robotics RL: given a coarse abstract simulator that omits key task details, how to train an RL policy in it that transfers to the real world. Using state abstraction, it shows that grounding requires history-dependent abstract dynamics. It then proposes a method to correct those dynamics from real-world task trajectories and reports successful policy transfer in both sim2sim and sim2real experiments.
Significance. If the correction procedure reliably produces history-dependent abstract dynamics that generalize beyond the collected trajectories, the work would meaningfully lower the barrier to RL in complex robotics domains by permitting the use of fast but incomplete simulators. The state-abstraction framing supplies a clean conceptual tool for analyzing simulator-reality mismatch.
major comments (2)
- [§3] §3 (Formalism): The claim that history-dependent grounding suffices to match the target task is stated but not accompanied by a bound on the residual approximation error in the information state after correction from finite real trajectories; without such a bound or a concrete counter-example analysis, it is unclear whether the formalism guarantees transfer for policies that visit states outside the support of the collected data.
- [§4 and §5] §4 (Correction method) and §5 (Experiments): The procedure that updates the abstract dynamics from real-world task data is presented as sufficient for transfer, yet the manuscript provides no ablation on data volume, collection bias, or coverage of policy-induced state distributions; such an ablation would directly test the weakest assumption, namely that limited real trajectories yield an approximate information state close enough for RL policies to transfer without post-hoc fitting.
minor comments (2)
- [Abstract] The abstract is dense and would benefit from a single illustrative diagram of the history-dependent grounding step.
- [§2] Notation for the approximate information state is introduced without an explicit comparison table to prior state-abstraction definitions (e.g., those in Li et al. or Abel et al.); adding such a table would improve readability.
Simulated Author's Rebuttal
We thank the referee for their constructive and insightful comments. The feedback identifies important gaps in the theoretical analysis and empirical validation of our approach to abstract sim2real transfer. We respond to each major comment below and indicate the planned revisions.
Point-by-point responses
-
Referee: [§3] §3 (Formalism): The claim that history-dependent grounding suffices to match the target task is stated but not accompanied by a bound on the residual approximation error in the information state after correction from finite real trajectories; without such a bound or a concrete counter-example analysis, it is unclear whether the formalism guarantees transfer for policies that visit states outside the support of the collected data.
Authors: We agree that the current formalism shows sufficiency of history-dependent abstract dynamics for recovering the target information state in the infinite-data limit but does not supply a finite-sample bound on residual error or an explicit counter-example analysis for out-of-support states. This is a genuine limitation of the theoretical development. In the revision we will expand the discussion in §3 to explicitly state the infinite-data assumption, clarify that finite-trajectory correction produces only an approximation, and include a short paragraph on potential failure modes when policies visit states outside the collected data support. Deriving a general PAC-style bound lies beyond the scope of the present work. revision: partial
-
Referee: [§4 and §5] §4 (Correction method) and §5 (Experiments): The procedure that updates the abstract dynamics from real-world task data is presented as sufficient for transfer, yet the manuscript provides no ablation on data volume, collection bias, or coverage of policy-induced state distributions; such an ablation would directly test the weakest assumption, namely that limited real trajectories yield an approximate information state close enough for RL policies to transfer without post-hoc fitting.
Authors: The referee correctly notes that our experiments report successful transfer but omit systematic ablations on real-world data volume, collection bias, and coverage of the state distributions induced by the learned policies. These omissions leave the core practical assumption under-tested. We will revise §5 to add new ablation experiments that vary the number of real trajectories used for dynamics correction, report transfer performance as a function of data volume, and include quantitative analysis of state-distribution coverage between the collected trajectories and the final policy rollouts. revision: yes
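The promised ablation can be sketched as a harness that refits the correction from progressively larger subsets of real trajectories and records transfer performance at each volume. Everything below is a hypothetical stand-in for the paper's protocol, with toy fit and score functions in place of the real dynamics correction and sim2real evaluation:

```python
import random

def ablate_data_volume(trajectories, fit_correction, evaluate_transfer,
                       volumes=(1, 2, 5, 10), seed=0):
    """Refit the dynamics correction from random subsets of the real
    trajectories and report transfer score as a function of data volume."""
    rng = random.Random(seed)
    results = {}
    for n in volumes:
        if n > len(trajectories):
            break
        subset = rng.sample(trajectories, n)
        correction = fit_correction(subset)
        results[n] = evaluate_transfer(correction)
    return results

# Toy stand-ins that mimic the curve such an ablation would produce.
def toy_fit(subset):
    # "Fitting" here just averages the subset; stands in for correction.
    return sum(subset) / len(subset)

def toy_score(corr):
    # Transfer score that saturates as the correction improves.
    return 1.0 - 1.0 / (1.0 + corr)

trajs = list(range(10))
curve = ablate_data_volume(trajs, toy_fit, toy_score, volumes=(1, 3, 5))
```

The interesting output is the shape of `curve`: if transfer performance plateaus at small volumes, the paper's core practical assumption holds; if it keeps climbing, limited real data is doing less than claimed.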
Circularity Check
No circularity in the derivation chain of abstract sim2real formalization
full rationale
The paper's abstract describes formalizing the abstract sim2real problem using state abstraction from the RL literature. This framing leads to the observation that grounded abstract dynamics should account for state history. A method is introduced that uses real-world task data to correct the abstract simulator's dynamics, with evaluations showing successful policy transfer in sim2sim and sim2real settings. No equations are provided in the abstract, and no derivation chain reduces predictions or results to inputs by construction. There are no visible self-definitional elements, fitted parameters presented as predictions, or load-bearing self-citations. The approach is self-contained: it builds on external RL concepts and is demonstrated through method and evaluation.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: State abstraction concepts from the RL literature can model the mismatch between the abstract simulator and the target task.
Reference graph
Works this paper leans on
-
[1]
Outracing champion Gran Turismo drivers with deep reinforcement learning,
P. R. Wurman, S. Barrett, K. Kawamoto, J. MacGlashan, K. Subramanian, T. J. Walsh, R. Capobianco, A. Devlic, F. Eckert, F. Fuchs et al., "Outracing champion Gran Turismo drivers with deep reinforcement learning," Nature, vol. 602, no. 7896, pp. 223–228, 2022
2022
-
[2]
Learning dexterous in-hand manipulation,
O. M. Andrychowicz, B. Baker, M. Chociej, R. Jozefowicz, B. McGrew, J. Pachocki, A. Petron, M. Plappert, G. Powell, A. Ray et al., "Learning dexterous in-hand manipulation," The International Journal of Robotics Research, vol. 39, no. 1, pp. 3–20, 2020
2020
-
[3]
DD-PPO: Learning near-perfect PointGoal navigators from 2.5 billion frames,
E. Wijmans, A. Kadian, A. Morcos, S. Lee, I. Essa, D. Parikh, M. Savva, and D. Batra, "DD-PPO: Learning near-perfect PointGoal navigators from 2.5 billion frames," arXiv preprint, 2019
2019
-
[4]
Learning agile and dynamic motor skills for legged robots,
J. Hwangbo, J. Lee, A. Dosovitskiy, D. Bellicoso, V. Tsounis, V. Koltun, and M. Hutter, "Learning agile and dynamic motor skills for legged robots," Science Robotics, vol. 4, no. 26, p. eaau5872, 2019
2019
-
[5]
Sim-to-real transfer of robotic control with dynamics randomization,
X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel, "Sim-to-real transfer of robotic control with dynamics randomization," in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 3803–3810
2018
-
[6]
Comparative study of physics engines for robot simulation with mechanical interaction,
J. Yoon, B. Son, and D. Lee, "Comparative study of physics engines for robot simulation with mechanical interaction," Applied Sciences, vol. 13, no. 2, p. 680, 2023
2023
-
[7]
From abstraction to reality: DARPA's vision for robust sim-to-real autonomy,
E. Noorani, Z. Serlin, B. Price, and A. Velasquez, "From abstraction to reality: DARPA's vision for robust sim-to-real autonomy," AI Magazine, vol. 46, no. 2, p. e70015, 2025
2025
-
[8]
Rethinking sim2real: Lower fidelity simulation leads to higher sim2real transfer in navigation,
J. Truong, M. Rudolph, N. H. Yokoyama, S. Chernova, D. Batra, and A. Rai, "Rethinking sim2real: Lower fidelity simulation leads to higher sim2real transfer in navigation," in Conference on Robot Learning. PMLR, 2023, pp. 859–870
2023
-
[9]
Perspectives on Sim2Real Transfer for Robotics: A Summary of the R:SS 2020 Workshop,
S. Höfer, K. Bekris, A. Handa, J. C. Gamboa, F. Golemo, M. Mozifian, C. Atkeson, D. Fox, K. Goldberg, J. Leonard, C. K. Liu, J. Peters, S. Song, P. Welinder, and M. White, "Perspectives on Sim2Real Transfer for Robotics: A Summary of the R:SS 2020 Workshop," Dec. 2020, arXiv:2012.03806 [cs]. [Online]. Available: http://arxiv.org/abs/2012.03806
2020
-
[10]
Multi-Robot Collaboration through Reinforcement Learning and Abstract Simulation,
A. Labiosa and J. P. Hanna, "Multi-Robot Collaboration through Reinforcement Learning and Abstract Simulation," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), May 2025
2025
-
[11]
Reinforcement Learning Within the Classical Robotics Stack: A Case Study in Robot Soccer,
A. Labiosa, Z. Wang, S. Agarwal, W. Cong, G. Hemkumar, A. N. Harish, B. Hong, J. Kelle, C. Li, Y. Li, Z. Shao, P. Stone, and J. P. Hanna, "Reinforcement Learning Within the Classical Robotics Stack: A Case Study in Robot Soccer," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2025
2025
-
[12]
What went wrong? Closing the sim-to-real gap via differentiable causal discovery,
P. Huang, X. Zhang, Z. Cao, S. Liu, M. Xu, W. Ding, J. Francis, B. Chen, and D. Zhao, "What went wrong? Closing the sim-to-real gap via differentiable causal discovery," in Conference on Robot Learning. PMLR, 2023, pp. 734–760
2023
-
[13]
Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: a Survey,
W. Zhao, J. P. Queralta, and T. Westerlund, "Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: a Survey," in 2020 IEEE Symposium Series on Computational Intelligence (SSCI), Dec. 2020, pp. 737–744
2020
-
[14]
People construct simplified mental representations to plan,
M. K. Ho, D. Abel, C. G. Correa, M. L. Littman, J. D. Cohen, and T. L. Griffiths, "People construct simplified mental representations to plan," Nature, vol. 606, no. 7912, pp. 129–136, Jun. 2022. [Online]. Available: https://www.nature.com/articles/s41586-022-04743-9
2022
-
[15]
Multi-agent manipulation via locomotion using hierarchical sim2real,
O. Nachum, M. Ahn, H. Ponte, S. Gu, and V. Kumar, "Multi-agent manipulation via locomotion using hierarchical sim2real," arXiv preprint arXiv:1908.05224, 2019
2019
-
[16]
Driving policy transfer via modularity and abstraction,
M. Müller, A. Dosovitskiy, B. Ghanem, and V. Koltun, "Driving policy transfer via modularity and abstraction," arXiv preprint arXiv:1804.09364, 2018
2018
-
[17]
GridToPix: Training embodied agents with minimal supervision,
U. Jain, I.-J. Liu, S. Lazebnik, A. Kembhavi, L. Weihs, and A. G. Schwing, "GridToPix: Training embodied agents with minimal supervision," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 15141–15151
2021
-
[18]
Reinforcement learning with multi-fidelity simulators,
M. Cutler, T. J. Walsh, and J. P. How, "Reinforcement learning with multi-fidelity simulators," in 2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014, pp. 3888–3895
2014
-
[19]
System identification—A survey,
K. J. Åström and P. Eykhoff, "System identification—A survey," Automatica, vol. 7, no. 2, pp. 123–162, Mar. 1971. [Online]. Available: https://www.sciencedirect.com/science/article/pii/0005109871900598
1971
-
[20]
On finding 'exciting' trajectories for identification experiments involving systems with non-linear dynamics,
B. Armstrong, "On finding 'exciting' trajectories for identification experiments involving systems with non-linear dynamics," in 1987 IEEE International Conference on Robotics and Automation Proceedings, vol. 4, Mar. 1987, pp. 1131–1139. [Online]. Available: https://ieeexplore.ieee.org/document/1087968
1987
-
[21]
Using simulation and domain adaptation to improve efficiency of deep robotic grasping,
K. Bousmalis, A. Irpan, P. Wohlhart, Y. Bai, M. Kelcey, M. Kalakrishnan, L. Downs, J. Ibarz, P. Pastor, K. Konolige et al., "Using simulation and domain adaptation to improve efficiency of deep robotic grasping," in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 4243–4250
2018
-
[22]
Sim-to-real transfer with neural-augmented robot simulation,
F. Golemo, A. A. Taiga, A. Courville, and P.-Y. Oudeyer, "Sim-to-real transfer with neural-augmented robot simulation," in Conference on Robot Learning. PMLR, 2018, pp. 817–828
2018
-
[23]
Grounded action transformation for sim-to-real reinforcement learning,
J. P. Hanna, S. Desai, H. Karnan, G. Warnell, and P. Stone, "Grounded action transformation for sim-to-real reinforcement learning," Machine Learning, vol. 110, no. 9, pp. 2469–2499, 2021
2021
-
[24]
Reinforced Grounded Action Transformation for Sim-to-Real Transfer,
H. Karnan, S. Desai, J. P. Hanna, G. Warnell, and P. Stone, "Reinforced Grounded Action Transformation for Sim-to-Real Transfer," in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Las Vegas, NV, USA: IEEE, Oct. 2020, pp. 4397–4402. [Online]. Available: https://ieeexplore.ieee.org/document/9341149/
2020
-
[25]
A Theory of Abstraction in Reinforcement Learning,
D. Abel, "A Theory of Abstraction in Reinforcement Learning," Mar. 2022, arXiv:2203.00397 [cs]. [Online]. Available: http://arxiv.org/abs/2203.00397
2022
-
[26]
Abstraction Selection in Model-based Reinforcement Learning,
N. Jiang, A. Kulesza, and S. Singh, "Abstraction Selection in Model-based Reinforcement Learning," in Proceedings of the 32nd International Conference on Machine Learning. PMLR, Jun. 2015, pp. 179–188. [Online]. Available: https://proceedings.mlr.press/v37/jiang15.html
2015
-
[27]
Abstract Reward Processes: Leveraging State Abstraction for Consistent Off-Policy Evaluation,
S. Chaudhari, A. Deshpande, B. C. d. Silva, and P. S. Thomas, "Abstract Reward Processes: Leveraging State Abstraction for Consistent Off-Policy Evaluation," Oct. 2024, arXiv:2410.02172 [cs]. [Online]. Available: http://arxiv.org/abs/2410.02172
2024
-
[28]
Learning Markov State Abstractions for Deep Reinforcement Learning,
C. Allen, N. Parikh, O. Gottesman, and G. Konidaris, "Learning Markov State Abstractions for Deep Reinforcement Learning," 2021
2021
-
[29]
Predictive representations of state,
M. Littman and R. S. Sutton, "Predictive representations of state," Advances in Neural Information Processing Systems, vol. 14, 2001
2001
-
[30]
Deep recurrent Q-learning for partially observable MDPs,
M. Hausknecht and P. Stone, "Deep recurrent Q-learning for partially observable MDPs," in 2015 AAAI Fall Symposium Series, 2015
2015
-
[31]
Sim-to-Real: Learning Agile Locomotion For Quadruped Robots,
J. Tan, T. Zhang, E. Coumans, A. Iscen, Y. Bai, D. Hafner, S. Bohez, and V. Vanhoucke, "Sim-to-Real: Learning Agile Locomotion For Quadruped Robots," in Proceedings of Robotics: Science and Systems, 2018. [Online]. Available: http://arxiv.org/abs/1804.10332
2018
-
[32]
Approximate information state for approximate planning and reinforcement learning in partially observed systems,
J. Subramanian, A. Sinha, R. Seraj, and A. Mahajan, "Approximate information state for approximate planning and reinforcement learning in partially observed systems," Journal of Machine Learning Research, vol. 23, no. 12, pp. 1–83, 2022
2022
-
[33]
On learning history-based policies for controlling Markov decision processes,
G. Patil, A. Mahajan, and D. Precup, "On learning history-based policies for controlling Markov decision processes," in International Conference on Artificial Intelligence and Statistics. PMLR, 2024, pp. 3511–3519
2024
-
[34]
BYOL-Explore: Exploration by Bootstrapped Prediction,
Z. D. Guo, S. Thakoor, M. Pîslar, B. A. Pires, F. Altché, C. Tallec, A. Saade, D. Calandriello, J.-B. Grill, Y. Tang, M. Valko, R. Munos, M. G. Azar, and B. Piot, "BYOL-Explore: Exploration by Bootstrapped Prediction," Jun. 2022, arXiv:2206.08332 [cs, stat]. [Online]. Available: http://arxiv.org/abs/2206.08332
2022
-
[35]
Data-Efficient Reinforcement Learning with Self-Predictive Representations,
M. Schwarzer, A. Anand, R. Goel, R. D. Hjelm, A. Courville, and P. Bachman, "Data-Efficient Reinforcement Learning with Self-Predictive Representations," May 2021, arXiv:2007.05929 [cs, stat]. [Online]. Available: http://arxiv.org/abs/2007.05929
2021
-
[36]
RMA: Rapid motor adaptation for legged robots,
A. Kumar, Z. Fu, D. Pathak, and J. Malik, "RMA: Rapid motor adaptation for legged robots," arXiv preprint arXiv:2107.04034, 2021
2021
-
[37]
Offline Reinforcement Learning with Implicit Q-Learning,
I. Kostrikov, A. Nair, and S. Levine, "Offline reinforcement learning with implicit Q-learning," arXiv preprint arXiv:2110.06169, 2021
2021
-
[38]
D4RL: Datasets for deep data-driven reinforcement learning,
J. Fu, A. Kumar, O. Nachum, G. Tucker, and S. Levine, "D4RL: Datasets for deep data-driven reinforcement learning," arXiv preprint, 2020
2020
-
[39]
Benchmarking deep reinforcement learning for continuous control,
Y. Duan, X. Chen, R. Houthooft, J. Schulman, and P. Abbeel, "Benchmarking deep reinforcement learning for continuous control," in International Conference on Machine Learning. PMLR, 2016
2016
-
[40]
Real-world humanoid locomotion with reinforcement learning,
I. Radosavovic, T. Xiao, B. Zhang, T. Darrell, J. Malik, and K. Sreenath, "Real-world humanoid locomotion with reinforcement learning," Science Robotics, vol. 9, no. 89, p. eadi9579, 2024
2024