pith. machine review for the scientific record.

arxiv: 2605.12084 · v1 · submitted 2026-05-12 · 💻 cs.RO · cs.AI · cs.IT · cs.LG · cs.SY · eess.SY · math.IT

Recognition: 1 theorem link · Lean Theorem

Learning What Matters: Adaptive Information-Theoretic Objectives for Robot Exploration

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 05:02 UTC · model grok-4.3

classification 💻 cs.RO · cs.AI · cs.IT · cs.LG · cs.SY · eess.SY · math.IT
keywords robot exploration · information-theoretic objectives · optimal experimental design · Fisher information matrix · parameter identifiability · nuisance parameters · reinforcement learning

The pith

Quasi-Optimal Experimental Design lets robots focus exploration on identifiable model parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes Quasi-Optimal Experimental Design (QOED) to guide robot exploration using information objectives that target parameters whose values can actually be learned from data. The method analyzes the Fisher information matrix to find observable directions in parameter space and adjusts the objective to reduce distortion from unidentifiable nuisance parameters. A reader would care because in complex robots many parameters are weakly observable, so standard objectives waste effort on data that cannot reduce uncertainty in the right places. If correct, this leads to more efficient data collection that improves downstream policy learning in navigation and manipulation tasks.

Core claim

QOED performs eigenspace analysis of the Fisher information matrix to identify an observable subspace and select identifiable parameter directions. It modifies the exploration objective to emphasize these directions while suppressing nuisance effects from non-critical parameters. Under bounded nuisance influence and limited coupling between critical and nuisance directions, QOED provides a constant-factor approximation to the ideal information objective that explores all parameters.

What carries the argument

Quasi-Optimal Experimental Design (QOED), which uses eigenspace analysis of the Fisher information matrix to select identifiable parameter directions and suppress nuisance-parameter effects in the objective.
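To make the selection step concrete, here is a minimal sketch of the kind of eigenspace analysis the abstract describes: decompose an empirical Fisher information matrix, keep directions whose eigenvalues clear a threshold, and score candidate data by the information projected onto those directions only. The threshold, the trace-style objective, and all names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def observable_subspace(F, eig_threshold=1e-3):
    """Eigendecompose a symmetric PSD Fisher information matrix and split its
    eigenvectors into an observable (identifiable) basis and a nuisance basis.
    The fixed threshold is an illustrative choice, not the paper's rule."""
    eigvals, eigvecs = np.linalg.eigh(F)   # ascending eigenvalues
    keep = eigvals > eig_threshold         # identifiable directions
    return eigvecs[:, keep], eigvecs[:, ~keep]

def projected_information(F_candidate, W_o):
    """A-optimal-style score restricted to the observable subspace: trace of the
    candidate Fisher matrix projected onto the identifiable directions."""
    return np.trace(W_o.T @ F_candidate @ W_o)

# Toy usage: one direction carries almost no information, so the projected
# objective ignores it instead of rewarding data that cannot identify it.
F = np.diag([5.0, 2.0, 1e-6])
W_o, W_n = observable_subspace(F)
print(projected_information(F, W_o))   # ~7.0, counting only the identifiable mass
```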

If this is right

  • Selection of identifiable directions improves performance by 35.23 percent in navigation and manipulation tasks.
  • Nuisance suppression adds 21.98 percent further improvement.
  • Integration as an exploration objective in model-based policy optimization beats established RL baselines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could help in other high-dimensional learning settings like vision-based control where many latent factors are unobservable.
  • Relaxing the bounded coupling assumption might require new analysis to maintain the approximation guarantee.
  • Testing on a wider range of robotic platforms would show how broadly the constant-factor result applies.

Load-bearing premise

The influence of nuisance parameters is bounded and their coupling to critical directions is limited.

What would settle it

An experiment in which nuisance parameters are allowed to strongly couple with critical ones, to check if the performance improvements from QOED disappear.
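A hedged sketch of what that stress test could look like: generate synthetic per-transition scores in which the nuisance components are mixed with the critical ones by an adjustable coupling factor, then track how much of the total Fisher information a critical-coordinates-only objective still captures. The score model, coupling sweep, and thresholds are assumed for illustration; they are not the paper's protocol.

```python
import numpy as np

def empirical_fisher(coupling, n_crit=2, n_nuis=3, n_samples=5000, seed=0):
    """Empirical Fisher matrix from synthetic per-step scores in which nuisance
    scores mix independent noise with the critical scores; `coupling` controls
    the mix. Purely an assumed setup for illustration."""
    rng = np.random.default_rng(seed)
    g_crit = rng.standard_normal((n_samples, n_crit)) * np.array([2.0, 1.5])
    noise = 0.05 * rng.standard_normal((n_samples, n_nuis))
    mix = rng.standard_normal((n_crit, n_nuis))
    g_nuis = noise + coupling * g_crit @ mix
    g = np.hstack([g_crit, g_nuis])
    return g.T @ g / n_samples

for coupling in [0.0, 0.1, 0.5, 1.0]:
    F = empirical_fisher(coupling)
    crit = slice(0, 2)
    restricted = np.trace(F[crit, crit])    # objective over critical coordinates only
    full = np.trace(F)                      # ideal objective over all parameters
    cross = np.linalg.norm(F[crit, 2:], 2)  # critical-nuisance cross term
    print(f"coupling={coupling:3.1f}  cross={cross:6.3f}  restricted/full={restricted/full:.3f}")
```

As the cross term grows, the restricted objective accounts for a shrinking share of the total information, which is exactly the regime in which the constant-factor claim would be under strain.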

Figures

Figures reproduced from arXiv: 2605.12084 by Jionghao Wang, Lantao Liu, Wenping Wang, Youwei Yu, Zhengming Yu.

Figure 1
Figure 1: Fisher information matrix values for physical parameters of 26 representative robots. Color denotes parameter type; size indicates information content. The large variation in information distribution indicates that information can be concentrated in only a few critical physical parameters. view at source ↗
Figure 2
Figure 2: Information gain trajectories. Top: Fisher information landscape over mass θ1 and friction θ2 with induced trajectories for the box pushing task. Bottom: local geometry of the dynamics. Optimizing information in both parameters can drift off the identifiable ridge. Restricting the objective to selected coordinates can stay closer to the ridge but may still excite non-identifiable directions. view at source ↗
Figure 3
Figure 3: Policy performance across diverse robot environments. QOED-PHYSICS with ground-truth physics consistently outperforms baselines, validating our adaptive information objective. QOED with learned dynamics also performs well, highlighting the promise of learned models for exploration. view at source ↗
Figure 4
Figure 4: Rod balancing demonstration and parameter-estimation error bars. Our QOED identifies the parameters quickly and accurately. view at source ↗
Figure 5
Figure 5: Real-world snapshots with success rates shown in the text boxes. QOED achieves the highest success rate and the lowest dynamics prediction RMSE. By explicitly suppressing nuisance directions, it attains the highest cumulative information gain across environments. view at source ↗
Figure 6
Figure 6: Critical parameter identification miss rate via our learned dynamics with respect to learning iterations. view at source ↗
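For orientation, the quantity these figures are built on is the trajectory score, which the paper (its equation (4)) decomposes additively over per-transition scores; the Fisher information matrix is then its second moment via the standard score outer-product identity:

```latex
\nabla_{\phi} \log p(\tau_t \mid \phi, \pi)
  = \sum_{k=0}^{t-1} \nabla_{\phi} \log p(s_{k+1} \mid s_k, a_k, \phi),
\qquad
F_{\phi} = \mathbb{E}\!\left[ g\, g^{\top} \right],
\quad g = \nabla_{\phi} \log p(\tau_t \mid \phi, \pi).
```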
read the original abstract

Designing learnable information-theoretic objectives for robot exploration remains challenging. Such objectives aim to guide exploration toward data that reduces uncertainty in model parameters, yet it is often unclear what information the collected data can actually reveal. Although reinforcement learning (RL) can optimize a given objective, constructing objectives that reflect parametric learnability is difficult in high-dimensional robotic systems. Many parameter directions are weakly observable or unidentifiable, and even when identifiable directions are selected, omitted directions can still influence exploration and distort information measures. To address this challenge, we propose Quasi-Optimal Experimental Design (QOED), an adaptive information objective grounded in optimal experimental design. QOED (i) performs eigenspace analysis of the Fisher information matrix to identify an observable subspace and select identifiable parameter directions, and (ii) modifies the exploration objective to emphasize these directions while suppressing nuisance effects from non-critical parameters. Under bounded nuisance influence and limited coupling between critical and nuisance directions, QOED provides a constant-factor approximation to the ideal information objective that explores all parameters. We evaluate QOED on simulated and real-world navigation and manipulation tasks, where identifiable-direction selection and nuisance suppression yield performance improvements of 35.23% and 21.98%, respectively. When integrated as an exploration objective in model-based policy optimization, QOED further improves policy performance over established RL baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Quasi-Optimal Experimental Design (QOED) for robot exploration. QOED performs eigenspace analysis on the Fisher information matrix to isolate an observable subspace of identifiable parameters and re-weights the information objective to suppress nuisance directions. Under the assumptions of bounded nuisance influence and limited coupling between critical and nuisance directions, the method is claimed to deliver a constant-factor approximation to the ideal information objective that considers all parameters. Evaluations on simulated and real navigation and manipulation tasks report gains of 35.23% and 21.98%, respectively, and integration into model-based policy optimization yields improved policy performance over RL baselines.

Significance. If the constant-factor guarantee can be shown to apply in the evaluated domains, the approach would offer a principled way to focus exploration on learnable parameters in high-dimensional robotic systems, potentially improving sample efficiency. The empirical results on both simulation and hardware, together with the RL integration, constitute a concrete strength; however, the absence of verification for the key bounding assumptions limits the immediate impact of the theoretical claim.

major comments (2)
  1. [Abstract / theoretical section] The central claim that QOED yields a constant-factor approximation is explicitly conditioned on 'bounded nuisance influence and limited coupling between critical and nuisance directions,' yet no derivation of the required bounds on nuisance norms or cross-term magnitudes is supplied, nor are these quantities measured on the Fisher information matrices arising in the navigation and manipulation tasks.
  2. [Experimental results] Performance tables: The reported 35.23% and 21.98% improvements are presented as support for the method, but without evidence that the nuisance bounds hold in the evaluated domains, these gains are consistent with the claim yet do not confirm the approximation guarantee.
minor comments (2)
  1. [Method] The precise definition of the re-weighted objective (the modified information measure) would benefit from an explicit equation immediately following the eigenspace analysis description.
  2. [Experiments] Baseline details (exact RL algorithms, hyper-parameter ranges, and data-exclusion rules) are only summarized; a supplementary table listing these would improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We agree that the theoretical claims regarding the constant-factor approximation would benefit from explicit derivations of the bounding assumptions and empirical verification in the evaluated domains. Below we address each major comment and describe the revisions we will incorporate.

read point-by-point responses
  1. Referee: [Abstract / theoretical section] The central claim that QOED yields a constant-factor approximation is explicitly conditioned on 'bounded nuisance influence and limited coupling between critical and nuisance directions,' yet no derivation of the required bounds on nuisance norms or cross-term magnitudes is supplied, nor are these quantities measured on the Fisher information matrices arising in the navigation and manipulation tasks.

    Authors: We acknowledge that the current manuscript does not supply an explicit derivation of the bounds on nuisance norms or cross-term magnitudes, nor does it report measurements of these quantities for the Fisher information matrices in the navigation and manipulation tasks. In the revised manuscript we will add a dedicated subsection in the theoretical development that derives the required bounds on nuisance influence and coupling terms under the stated assumptions. We will also compute and report the nuisance norms and cross-term magnitudes directly from the Fisher matrices obtained during the simulated and real-world experiments to verify that the assumptions hold in the evaluated domains. revision: yes

  2. Referee: [Experimental results] Performance tables: The reported 35.23% and 21.98% improvements are presented as support for the method, but without evidence that the nuisance bounds hold in the evaluated domains, these gains are consistent with the claim yet do not confirm the approximation guarantee.

    Authors: The reported performance gains illustrate the practical utility of QOED for focusing exploration on identifiable parameters. We agree, however, that these empirical results alone do not confirm the constant-factor approximation guarantee without verification that the nuisance bounds hold. As described in our response to the first comment, the revised manuscript will include explicit measurements of the nuisance norms and coupling terms from the experimental Fisher matrices; this will enable a direct assessment of whether the observed gains are consistent with the theoretical guarantee. revision: yes
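A minimal sketch of the diagnostics promised here: given a freshly estimated Fisher matrix and the identifiable basis selected earlier, report the nuisance information mass and the magnitude of the critical-nuisance cross term. The projector-based definitions and all names are illustrative assumptions, not the authors' measurements.

```python
import numpy as np

def nuisance_diagnostics(F_new, W_o):
    """Check a freshly estimated Fisher matrix `F_new` against a previously
    selected identifiable basis `W_o` (orthonormal columns). Reports nuisance
    information mass and critical-nuisance coupling; illustrative definitions."""
    d = F_new.shape[0]
    P_o = W_o @ W_o.T                      # projector onto selected subspace
    P_n = np.eye(d) - P_o                  # projector onto nuisance complement
    critical_mass = np.trace(P_o @ F_new @ P_o)
    nuisance_mass = np.trace(P_n @ F_new @ P_n)
    coupling = np.linalg.norm(P_o @ F_new @ P_n, 2)   # cross-term magnitude
    return {
        "critical_mass": critical_mass,
        "nuisance_mass": nuisance_mass,
        "cross_coupling": coupling,
        "nuisance_ratio": nuisance_mass / max(critical_mass, 1e-12),
    }
```

Reporting these three numbers alongside the performance tables would let a reader judge whether the bounded-nuisance and limited-coupling conditions plausibly held in each environment.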

Circularity Check

0 steps flagged

No significant circularity; derivation remains self-contained

full rationale

The paper's core construction applies standard eigenspace decomposition to the Fisher information matrix to isolate identifiable directions, then reweights the information objective to down-weight nuisance components. The constant-factor approximation claim is explicitly conditional on external assumptions (bounded nuisance norms and limited cross-coupling) rather than derived from or fitted to the same quantities. Reported performance deltas are empirical outcomes measured on navigation and manipulation tasks, not algebraic identities. No equation reduces a prediction to a fitted input by construction, no load-bearing self-citation chain is invoked, and the method does not rename a known result under new coordinates. The chain therefore retains independent grounding from classical optimal experimental design.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on a single domain assumption about nuisance effects to obtain the constant-factor approximation guarantee; no free parameters or new invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: bounded nuisance influence and limited coupling between critical and nuisance directions
    Invoked explicitly to guarantee that QOED provides a constant-factor approximation to the ideal information objective.

pith-pipeline@v0.9.0 · 5591 in / 1305 out tokens · 73622 ms · 2026-05-13T05:02:38.527047+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages · 1 internal anchor
