A Path-Space Formulation of Prediction in World Models: From a Single Action to Prediction, Planning, and Irreversibility

Gunn Kim

arxiv: 2606.28751 · v1 · pith:TYLGOZM7new · submitted 2026-06-27 · 💻 cs.LG · cond-mat.stat-mech

Pith reviewed 2026-06-30 09:25 UTC · model grok-4.3

keywords: world models, path space, irreversibility, entropy production, attention models, trajectory prediction, Onsager-Machlup functional

The pith

World models define probability measures over future trajectories rather than sequences of states.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that prediction in world models is best understood as specifying a probability distribution over entire future paths. Under this path measure, prediction is the most probable trajectory, planning is constrained optimization over paths, and uncertainty is the fluctuation around the most probable path, all derived from one action functional. In the local Markov regime, the measure takes the Onsager-Machlup form. Experiments with small attention models show that attention becomes asymmetric during training in proportion to the irreversibility of the training data. Symmetrizing the attention reduces entropy production and degrades long-horizon prediction only for irreversible dynamics.

Core claim

The central claim is that the fundamental predictive object in a world model is a distribution over future paths. In controlled attention-based models, attention asymmetry is acquired during training in proportion to the irreversibility of the data. Symmetrizing the learned attention suppresses entropy production and selectively degrades long-horizon prediction of irreversible dynamics while preserving relaxational prediction. This suggests that irreversibility may serve as a computational resource for predictive world models.
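The symmetrization intervention described above can be illustrated with a minimal sketch. This page does not give the paper's definition of attention asymmetry, so everything here is an assumption: the asymmetry is measured on a hypothetical query-key bilinear form `M = W_q @ W_k.T`, and the names `asymmetry_index` and `symmetrize` are illustrative only.

```python
import numpy as np

def asymmetry_index(M):
    """Share of the Frobenius norm carried by the antisymmetric part of M.

    Returns 0 for a symmetric matrix and 1 for a purely antisymmetric one.
    This index is an illustrative choice, not the paper's definition.
    """
    M = np.asarray(M, dtype=float)
    antisym = 0.5 * (M - M.T)
    norm = np.linalg.norm(M)
    return float(np.linalg.norm(antisym) / norm) if norm > 0 else 0.0

def symmetrize(M):
    """Keep only the symmetric part of M, discarding the antisymmetric part."""
    M = np.asarray(M, dtype=float)
    return 0.5 * (M + M.T)

# Hypothetical query/key projections standing in for trained weights.
rng = np.random.default_rng(0)
d = 8
W_q = rng.normal(size=(d, d))
W_k = rng.normal(size=(d, d))

# With row-vector tokens x_i, the attention score is x_i @ M @ x_j.
M = W_q @ W_k.T
M_sym = symmetrize(M)

print("asymmetry before:", asymmetry_index(M))
print("asymmetry after: ", asymmetry_index(M_sym))
```

In an experiment of the kind the paper reports, one would presumably substitute the symmetrized form back into the trained model and compare long-horizon rollout error on reversible and irreversible data; that step depends on model details not available here.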

What carries the argument

The path-space probability measure, which takes the Onsager-Machlup form under local Markovian dynamics. Prediction, planning, and uncertainty are all expressed as operations on this one action functional.
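For reference, the textbook Onsager-Machlup form for an overdamped diffusion with additive Gaussian noise is sketched below. The paper's own equations are not reproduced on this page, so the symbols (latent state z, drift f, diffusion constant D, horizon T, constraint set C) are assumed notation rather than the author's.

```latex
% Assumed latent dynamics (illustrative notation, not taken from the paper):
%   dz_t = f(z_t)\,dt + \sqrt{2D}\,dW_t
\begin{align}
  P[z(\cdot)] &\propto \exp\bigl(-S[z]\bigr), \\
  S[z] &= \int_0^T \left[ \frac{1}{4D}\,\bigl\lVert \dot z_t - f(z_t) \bigr\rVert^2
          + \frac{1}{2}\,\nabla\!\cdot f(z_t) \right] dt, \\
  z^\star &= \arg\min_{z(\cdot)} S[z]
          && \text{(prediction: most probable path)}, \\
  z^\star_{\mathcal{C}} &= \arg\min_{z(\cdot)\in\mathcal{C}} S[z]
          && \text{(planning: minimization under constraints } \mathcal{C}\text{)}.
\end{align}
```

In this reading, uncertainty corresponds to the second-order expansion of S around the most probable path. The divergence term is subleading in the weak-noise limit.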

Load-bearing premise

Latent dynamics admit an effective Markovian description in the local regime, allowing the path measure to take the Onsager-Machlup form.

What would settle it

Train attention models on data with varying degrees of irreversibility, symmetrize the attention, and measure whether the increase in long-horizon prediction error correlates with the degree of irreversibility.

Figures

Figures reproduced from arXiv: 2606.28751 by Gunn Kim.

**Figure 2.** [image: figures/full_fig_p010_2.png]

**Figure 3.** [image: figures/full_fig_p011_3.png]

Original abstract

We propose a path-space formulation of prediction in AI world models. Rather than sequences of one-step conditional distributions, we argue that a world model implicitly defines a probability measure over future trajectories. In the local regime where latent dynamics admit an effective Markovian description, this path measure takes the Onsager-Machlup form. Within this framework, prediction (most probable trajectory), planning (constrained optimization), and uncertainty (fluctuations) emerge as operations on a single action functional. We decompose the latent dynamics into reversible and irreversible components and introduce operational measures of entropy production from model rollouts. In controlled small-scale attention-based models, we find that attention asymmetry is acquired during training in proportion to the irreversibility of the data. Symmetrizing the learned attention suppresses entropy production and selectively degrades long-horizon prediction of irreversible dynamics while preserving relaxational prediction. These results suggest that irreversibility may serve as a computational resource for predictive world models. More generally, the fundamental predictive object is a distribution over future paths rather than states.
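The abstract mentions operational measures of entropy production computed from model rollouts but does not define them here. As a hedged illustration, the sketch below uses a standard plug-in estimator for a discretized trajectory: the divergence between forward and time-reversed transition statistics. The discretization into a small number of states, the function names, and the toy three-state chains are all assumptions made for the example, not the paper's procedure.

```python
import numpy as np

def entropy_production_rate(states, n_states, pseudocount=1e-3):
    """Plug-in estimate of sum_ij p(i->j) * log(p(i->j) / p(j->i)), in nats per step.

    p(i->j) is the empirical joint probability of observing state i followed by j.
    The pseudocount avoids division by zero for transitions never observed.
    """
    states = np.asarray(states)
    counts = np.full((n_states, n_states), pseudocount, dtype=float)
    np.add.at(counts, (states[:-1], states[1:]), 1.0)
    joint = counts / counts.sum()
    return float(np.sum(joint * np.log(joint / joint.T)))

def sample_chain(P, n_steps, rng):
    """Sample a trajectory from a Markov chain with row-stochastic matrix P."""
    n = P.shape[0]
    x = np.empty(n_steps, dtype=int)
    x[0] = 0
    for t in range(1, n_steps):
        x[t] = rng.choice(n, p=P[x[t - 1]])
    return x

rng = np.random.default_rng(0)

# Toy irreversible dynamics: a three-state ring driven mostly in one direction.
P_irrev = np.array([[0.1, 0.8, 0.1],
                    [0.1, 0.1, 0.8],
                    [0.8, 0.1, 0.1]])

# Toy reversible dynamics: a symmetric matrix, which satisfies detailed balance.
P_rev = np.array([[0.10, 0.45, 0.45],
                  [0.45, 0.10, 0.45],
                  [0.45, 0.45, 0.10]])

sigma_irrev = entropy_production_rate(sample_chain(P_irrev, 5000, rng), 3)
sigma_rev = entropy_production_rate(sample_chain(P_rev, 5000, rng), 3)
print("driven ring:     ", sigma_irrev)
print("detailed balance:", sigma_rev)
```

The estimator is a Kullback-Leibler divergence, so it is non-negative and vanishes when the forward and reversed statistics coincide. For continuous latent states the result depends on how the trajectory is binned, which is one reason the experimental details requested in the referee report matter.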

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The empirical link between attention asymmetry and data irreversibility in small models is the concrete new piece, but the Onsager-Machlup identification rests on an unverified Markov assumption.

The paper reframes prediction in world models as operations on a path measure rather than one-step conditionals. In the local Markov regime it invokes the Onsager-Machlup functional to split reversible and irreversible dynamics and then defines entropy production from model rollouts. The main empirical result is that attention asymmetry in small controlled models grows with the irreversibility of the training sequences, and symmetrizing attention selectively impairs long-horizon prediction on irreversible data while leaving relaxational cases intact.

That observation is useful. It gives a direct, testable handle on how a common architectural bias interacts with time asymmetry, which matters for anyone building world models that must handle non-equilibrium dynamics.

The weak point is the central theoretical step. The decomposition and the entropy-production measure both depend on the latent dynamics satisfying the conditions for the Onsager-Machlup form, yet the manuscript supplies no check that the trained attention rollouts are effectively Markovian, continuous-time, or driven by additive Gaussian noise. Without that verification the reported correlation could arise from generic sequence-model biases instead of the claimed path-space structure. The abstract also omits error bars, dataset sizes, and rollout lengths, so the quantitative strength of the finding is difficult to assess.

This is aimed at people working on physics-motivated losses or long-horizon planning in generative models. A reader looking for new empirical angles on irreversibility will find something worth testing. The work shows clear thinking and honest engagement with its own framing, so it deserves a serious referee once the Markov check and experimental details are added.

Referee Report

2 major / 2 minor

Summary. The paper proposes a path-space formulation of prediction in world models, arguing that the fundamental object is a probability measure over future trajectories rather than sequences of state distributions. In the local regime where latent dynamics admit an effective Markovian description, this path measure is identified with the Onsager-Machlup functional. The framework decomposes dynamics into reversible and irreversible components, introduces operational entropy-production measures from model rollouts, and reports that in small-scale attention-based models, attention asymmetry is acquired during training in proportion to data irreversibility; symmetrizing attention suppresses entropy production and selectively degrades long-horizon prediction of irreversible dynamics while preserving relaxational prediction.

Significance. If the central identification and experimental correlation hold, the work supplies a unified action-functional view of prediction, planning, and uncertainty, together with an empirical demonstration that irreversibility can function as a computational resource for world models. The reported link between learned attention asymmetry and data irreversibility, plus the selective degradation result after symmetrization, would be a concrete contribution to understanding why asymmetric mechanisms emerge in predictive sequence models.

major comments (2)

[Abstract / theoretical development] The identification of the induced path measure with the Onsager-Machlup functional (and the consequent decomposition into reversible/irreversible parts plus entropy-production measures) is asserted once the latent dynamics are assumed to admit an effective Markovian description, yet the manuscript supplies no explicit verification that the trained attention rollouts satisfy the required conditions (continuous-time diffusion limit, additive Gaussian noise structure, or a well-defined local Markov property in latent space). This single identification underpins both the theoretical claims and the operational interpretation of the attention-asymmetry experiments; without it the reported correlation could arise from generic sequence-model biases.
[Experimental section (attention-asymmetry results)] The claim that attention asymmetry scales with irreversibility and that symmetrization selectively degrades long-horizon irreversible prediction rests on the entropy-production measure derived from the Onsager-Machlup identification. Because that identification is unverified, the experimental interpretation remains conditional; an independent check (e.g., direct estimation of the Markov property or noise structure from the latent trajectories) is needed before the selective-degradation result can be attributed to path-space irreversibility rather than to other architectural biases.

minor comments (2)

[Theoretical development] Notation for the path measure and the reversible/irreversible decomposition should be introduced with explicit equations rather than descriptive prose, to allow direct comparison with standard Onsager-Machlup literature.
[Experiments] The manuscript should report the precise dataset sizes, training hyperparameters, and number of independent runs with error bars for the attention-asymmetry and symmetrization experiments.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and precise comments. The two major points correctly identify a gap in verification of the key assumptions underlying the Onsager-Machlup identification. We address each below and commit to revisions that directly strengthen the manuscript.

Referee: [Abstract / theoretical development] The identification of the induced path measure with the Onsager-Machlup functional (and the consequent decomposition into reversible/irreversible parts plus entropy-production measures) is asserted once the latent dynamics are assumed to admit an effective Markovian description, yet the manuscript supplies no explicit verification that the trained attention rollouts satisfy the required conditions (continuous-time diffusion limit, additive Gaussian noise structure, or a well-defined local Markov property in latent space). This single identification underpins both the theoretical claims and the operational interpretation of the attention-asymmetry experiments; without it the reported correlation could arise from generic sequence-model biases.

Authors: We agree that the manuscript states the effective Markovian assumption in the local regime but does not supply explicit verification of the continuous-time diffusion limit, additive Gaussian noise structure, or local Markov property for the trained attention rollouts. This verification is necessary to ground the identification and the subsequent entropy-production interpretation. In revision we will add an appendix that reports direct empirical checks on the latent trajectories, including tests for approximate Markovianity (e.g., via conditional independence statistics) and noise structure (e.g., residual analysis). These additions will make the theoretical claims and experimental interpretation more robust.
revision: yes

Referee: [Experimental section (attention-asymmetry results)] The claim that attention asymmetry scales with irreversibility and that symmetrization selectively degrades long-horizon irreversible prediction rests on the entropy-production measure derived from the Onsager-Machlup identification. Because that identification is unverified, the experimental interpretation remains conditional; an independent check (e.g., direct estimation of the Markov property or noise structure from the latent trajectories) is needed before the selective-degradation result can be attributed to path-space irreversibility rather than to other architectural biases.

Authors: The symmetrization experiment provides an architecture-level test showing that attention asymmetry is functionally relevant for long-horizon irreversible prediction. Nevertheless, we concur that confident attribution to path-space irreversibility requires verification of the underlying Markov and noise assumptions. The same appendix described above will include these independent checks on the latent trajectories, allowing readers to assess whether the selective degradation is better explained by the proposed path measure or by other model biases.
revision: yes
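The first response proposes testing approximate Markovianity via conditional independence statistics without naming one. A minimal sketch of one such statistic, under the assumption that latent trajectories have been discretized into a few states, is the conditional mutual information between the next and previous state given the current one. It is zero for a first-order Markov process. The function name and the toy binary processes are hypothetical.

```python
import numpy as np

def conditional_mutual_information(states, n_states):
    """Plug-in estimate of I(X_{t+1}; X_{t-1} | X_t) in nats.

    Values near zero are consistent with first-order Markov dynamics;
    clearly positive values indicate memory beyond the current state.
    """
    s = np.asarray(states)
    counts = np.zeros((n_states, n_states, n_states))
    # Axis order: (previous, current, next).
    np.add.at(counts, (s[:-2], s[1:-1], s[2:]), 1.0)
    p_abc = counts / counts.sum()
    p_b = p_abc.sum(axis=(0, 2))
    p_ab = p_abc.sum(axis=2)
    p_bc = p_abc.sum(axis=0)
    cmi = 0.0
    for a, b, c in np.argwhere(p_abc > 0):
        cmi += p_abc[a, b, c] * np.log(
            p_abc[a, b, c] * p_b[b] / (p_ab[a, b] * p_bc[b, c])
        )
    return float(cmi)

rng = np.random.default_rng(0)
n = 5000
flips = (rng.random(n) < 0.1).astype(int)

# First-order binary chain: each state copies the previous one, flipped 10% of the time.
markov = np.zeros(n, dtype=int)
for t in range(1, n):
    markov[t] = markov[t - 1] ^ flips[t]

# Second-order binary process: each state copies the state two steps back.
second = np.zeros(n, dtype=int)
for t in range(2, n):
    second[t] = second[t - 2] ^ flips[t]

cmi_markov = conditional_mutual_information(markov, 2)
cmi_second = conditional_mutual_information(second, 2)
print("first-order chain:   ", cmi_markov)
print("second-order process:", cmi_second)
```

The plug-in estimate carries a small positive bias at finite sample size, so in practice it would be compared against a shuffled or surrogate baseline rather than against exactly zero.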

Circularity Check

0 steps flagged

No circularity; derivation applies standard path-measure result under stated assumption

The manuscript states that 'in the local regime where latent dynamics admit an effective Markovian description, this path measure takes the Onsager-Machlup form' and then decomposes into reversible/irreversible parts. This is an invocation of a known functional from stochastic process theory rather than a self-definition, fitted parameter renamed as prediction, or load-bearing self-citation. No equations are shown reducing the claimed path measure or entropy-production measures to the model's fitted outputs by construction. The attention-asymmetry experiments are described as empirical observations on trained models versus data irreversibility, with no quoted reduction showing that irreversibility itself is computed from the same rollouts in a circular loop. The central claims therefore remain independent of the inputs they purport to explain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review; the ledger is populated from the abstract alone. The Markovian assumption is the main structural premise.

axioms (1)

Domain assumption: Latent dynamics admit an effective Markovian description in the local regime.
Stated explicitly in the abstract as the regime where the path measure takes the Onsager-Machlup form.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Path-Measure Dynamics of Attention-Driven World Models: A Nonlocal Onsager-Machlup Approach
cond-mat.stat-mech · 2026-07 · unverdicted · novelty 5.0

Derives that attention-induced non-Markovian dynamics yield a nonlocal Onsager-Machlup action whose short-memory expansion recovers the local action of a companion paper.

Reference graph

Works this paper leans on

30 extracted references · 4 canonical work pages · cited by 1 Pith paper · 3 internal anchors

[1] D. Ha and J. Schmidhuber, “World Models,” arXiv:1803.10122 (2018)
[2] D. Hafner, J. Pasukonis, J. Ba, and T. Lillicrap, “Mastering Diverse Domains through World Models,” arXiv:2301.04104 (2023)
[3] K. Friston, “The free-energy principle: a unified brain theory?” Nat. Rev. Neurosci. 11, 127 (2010)
[4] A. Vaswani et al., “Attention Is All You Need,” in Advances in Neural Information Processing Systems 30 (2017)
[5] H. Mori, “Transport, Collective Motion, and Brownian Motion,” Prog. Theor. Phys. 33, 423 (1965)
[6] R. Zwanzig, Nonequilibrium Statistical Mechanics (Oxford Univ. Press, 2001)
[7] A. J. Chorin, O. H. Hald, and R. Kupferman, “Optimal prediction and the Mori–Zwanzig representation of irreversible processes,” Proc. Natl. Acad. Sci. USA 97, 2968 (2000)
[8] L. Onsager and S. Machlup, “Fluctuations and Irreversible Processes,” Phys. Rev. 91, 1505 (1953)
[9] U. Seifert, “Stochastic thermodynamics, fluctuation theorems and molecular machines,” Rep. Prog. Phys. 75, 126001 (2012)
[10] H. Ramsauer et al., “Hopfield Networks is All You Need,” in Int. Conf. on Learning Representations (ICLR) (2021)
[11] S. Levine, “Reinforcement Learning and Control as Probabilistic Inference,” arXiv:1805.00909 (2018)
[12] Y. Song et al., “Score-Based Generative Modeling through Stochastic Differential Equations,” in Int. Conf. on Learning Representations (ICLR) (2021)
[13] Y. Lipman, R. T. Q. Chen, H. Ben-Hamu, M. Nickel, and M. Le, “Flow Matching for Generative Modeling,” in Int. Conf. on Learning Representations (ICLR) (2023)
[14] S. Raja et al., “Action-Minimization Meets Generative Modeling: Efficient Transition Path Sampling with the Onsager–Machlup Functional,” in Proc. 42nd Int. Conf. on Machine Learning (ICML), PMLR 267, 50972 (2025)
[15] C. Scheibner et al., “Odd elasticity,” Nat. Phys. 16, 475 (2020)
[16] D. Sussillo and O. Barak, “Opening the Black Box: Low-Dimensional Dynamics in High-Dimensional Recurrent Neural Networks,” Neural Comput. 25, 626 (2013)
[17] B. Geshkovski, C. Letrouit, Y. Polyanskiy, and P. Rigollet, “A mathematical perspective on Transformers,” arXiv:2312.10794 (2023)
[18] C. W. Lynn et al., “Broken detailed balance and entropy production in the human brain,” Proc. Natl. Acad. Sci. USA 118, e2109889118 (2021)
[19] D. Sekizawa, S. Ito, and M. Oizumi, “Decomposing thermodynamic dissipation of linear Langevin systems via oscillatory modes and its application to neural dynamics,” Phys. Rev. X 14, 041003 (2024)
[20] A. Frishman and P. Ronceray, “Learning Force Fields from Stochastic Trajectories,” Phys. Rev. X 10, 021009 (2020)
[21] C. Jarzynski, “Nonequilibrium Equality for Free Energy Differences,” Phys. Rev. Lett. 78, 2690 (1997)
[22] G. E. Crooks, “Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences,” Phys. Rev. E 60, 2721 (1999)
[23] C. Battle et al., “Broken detailed balance at mesoscopic scales in active biological systems,” Science 352, 604 (2016)
[24] F. S. Gnesotto, F. Mura, J. Gladrow, and C. P. Broedersz, “Broken detailed balance and non-equilibrium dynamics in living systems: a review,” Rep. Prog. Phys. 81, 066601 (2018)
[25] J. M. R. Parrondo, J. M. Horowitz, and T. Sagawa, “Thermodynamics of information,” Nat. Phys. 11, 131 (2015)
[26] A. Hyvärinen, “Estimation of Non-Normalized Statistical Models by Score Matching,” J. Mach. Learn. Res. 6, 695 (2005)
[27] J. Ho, A. Jain, and P. Abbeel, “Denoising Diffusion Probabilistic Models,” in Advances in Neural Information Processing Systems 33 (2020)
[28] A. C. Barato and U. Seifert, “Thermodynamic Uncertainty Relation for Biomolecular Processes,” Phys. Rev. Lett. 114, 158101 (2015)
[29] S. Otsubo, S. Ito, A. Dechant, and T. Sagawa, “Estimating entropy production by machine learning of short-time fluctuating currents,” Phys. Rev. E 101, 062106 (2020)
[30] D.-K. Kim, Y. Bae, S. Lee, and H. Jeong, “Learning Entropy Production via Neural Networks,” Phys. Rev. Lett. 125, 140604 (2020)
