Equilibrium World Models

Andreas Schaab; Simon Scheidegger

arXiv: 2606.23463 · v1 · pith:AHYGK2ZAnew · submitted 2026-06-22 · econ.GN · q-fin.EC

Pith reviewed 2026-06-26 05:52 UTC · model grok-4.3

keywords: equilibrium world models · deep learning solvers · dynamic stochastic models · rare disasters · rational expectations · neural network solvers · heterogeneous agents · binding constraints

The pith

Equilibrium World Models enforce exact rational-expectations conditions on ordinary, rare, stressed, and counterfactual states using a certified learned surrogate for continuations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Equilibrium World Models to globally solve dynamic stochastic models featuring rare disasters, binding constraints, and counterfactual states. Standard neural-network solvers impose equilibrium conditions only on states generated by their own simulated policy, which can yield self-confirming solutions accurate on the path but untested off it. EWMs instead generate a broader distribution of states and enforce the model's exact equilibrium conditions there, carrying continuations via a learned surrogate while certifying the policy strictly against the true conditions. The approach supplies an error decomposition, an off-path residual bound, and a convergence result that connects self-confirming solutions to rational-expectations equilibria. A reader would care because it promises reliable global solutions without repeated expensive expectation evaluations at each step.
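
To make the on-path versus off-path distinction concrete, here is a minimal sketch in a textbook Brock–Mirman economy with log utility and full depreciation, where the exact savings rate is αβ. This is not the paper's rare-disaster laboratory: the two-point productivity process, the parameter values, the capital cutoff of 0.10, and the names `savings_rate`, `euler_residual`, and `simulate_path` are all illustrative assumptions. The candidate policy is exact wherever its own simulation travels and distorted only at low capital, so residuals checked on its simulated path cannot detect the error.

```python
import random

ALPHA, BETA = 0.36, 0.96      # assumed capital share and discount factor
Z_STATES = (0.9, 1.1)         # assumed iid productivity states, equal probability
S_TRUE = ALPHA * BETA         # exact savings rate under log utility, full depreciation


def savings_rate(k, z):
    """Hypothetical candidate policy: exact for k >= 0.10, distorted below it."""
    return S_TRUE if k >= 0.10 else S_TRUE + 0.10


def next_capital(k, z):
    """Capital chosen today when output is z * k^alpha."""
    return savings_rate(k, z) * z * k ** ALPHA


def euler_residual(k, z):
    """1 - beta * E[(c/c') * alpha * z' * k'^(alpha-1)], which simplifies to
    1 - alpha*beta*(1-s)/s * E[1/(1-s')] when output is z * k^alpha."""
    s = savings_rate(k, z)
    k_next = next_capital(k, z)
    expect = sum(1.0 / (1.0 - savings_rate(k_next, zn)) for zn in Z_STATES) / len(Z_STATES)
    return 1.0 - ALPHA * BETA * (1.0 - s) / s * expect


def simulate_path(k0, steps, rng):
    """States visited by the candidate policy itself (the on-path set)."""
    k, path = k0, []
    for _ in range(steps):
        z = rng.choice(Z_STATES)
        path.append((k, z))
        k = next_capital(k, z)
    return path


rng = random.Random(0)
on_path = simulate_path(k0=0.19, steps=1000, rng=rng)
off_path = [(k, z) for k in (0.02, 0.05, 0.08) for z in Z_STATES]   # low-capital states

on_path_max = max(abs(euler_residual(k, z)) for k, z in on_path)
off_path_max = max(abs(euler_residual(k, z)) for k, z in off_path)
print(f"max |residual| on path:  {on_path_max:.2e}")
print(f"max |residual| off path: {off_path_max:.2e}")
```

In this toy setting the simulated path never leaves the region where the policy is exact, which is the sense in which a solution can be accurate on the path but untested off it.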

Core claim

Equilibrium World Models enforce the model's exact equilibrium conditions on a broader, model-generated distribution of ordinary, rare, stressed, and counterfactual states. They carry the continuation with a learned surrogate, but certify the resulting policy strictly against the true equilibrium conditions. We provide an error decomposition, an off-path residual bound, and a convergence result linking self-confirming solutions to rational-expectations equilibria.
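
This page does not reproduce the paper's error decomposition. As a hedged illustration of why certifying against the exact conditions matters, the generic triangle-inequality split below separates the quantity that training on a surrogate drives down from the quantity the surrogate could hide. The symbols $R$, $\hat{R}$, $\theta$, and the norm $\|\cdot\|_{\mu}$ are notation assumed here, not taken from the paper.

```latex
% R(x;\theta): exact equilibrium residual of policy \theta at state x
% \hat{R}(x;\theta): same residual with the continuation replaced by a learned surrogate
% \|\cdot\|_{\mu}: any norm over states weighted by a sampling measure \mu
\| R(\cdot;\theta) \|_{\mu}
  \;\le\;
  \underbrace{\| \hat{R}(\cdot;\theta) \|_{\mu}}_{\text{surrogate-based residual}}
  \;+\;
  \underbrace{\| R(\cdot;\theta) - \hat{R}(\cdot;\theta) \|_{\mu}}_{\text{surrogate error}}
```

Only the left-hand side is a statement about the model itself, so a certification step that evaluates it directly on held-out states does not depend on the surrogate being accurate.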

What carries the argument

Enforcement of exact equilibrium conditions on a broad model-generated state distribution, with a learned surrogate for continuation values that is certified against the true conditions.

If this is right

In a rare-disaster Brock-Mirman laboratory, coverage reduces disaster-region residuals by an order of magnitude.
In a high-dimensional international real-business-cycle model, EWMs converge from nearly all random starts while classical solvers fail from all.
When actions move transition measures, action-conditioned continuations recover the relevant policy margin.
In a heterogeneous-agent economy with aggregate risk, EWMs compress the numerical representation of the wealth distribution by at least 25x while imposing exact full-distribution conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The certification of surrogates against true conditions on expanded distributions could be adapted to verify approximate solutions in other classes of dynamic models with uncertainty.
Lower frequency of continuation evaluations may support faster evaluation of policy counterfactuals in large-scale economies.
The convergence result from self-confirming to full rational-expectations solutions suggests an iterative refinement procedure that starts from classical neural outputs.

Load-bearing premise

The learned surrogate for continuation values combined with certification against true equilibrium conditions on the broader state distribution produces policies that satisfy the model's rational-expectations equilibrium without material approximation error from the surrogate.

What would settle it

Simulating an EWM-certified policy on states outside the certified distribution and observing equilibrium residuals that exceed the stated off-path bound would falsify the claim of reliable global solutions without material surrogate error.
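
A check of this kind can be sketched as follows. The function name `exceeds_bound`, the toy residual, the held-out states, and the bound value are hypothetical placeholders; the paper's actual bound and residual definition are not given on this page.

```python
def exceeds_bound(residual_fn, held_out_states, bound):
    """Return (violated, worst_abs_residual) over states outside the certified set."""
    worst = max(abs(residual_fn(state)) for state in held_out_states)
    return worst > bound, worst


def toy_residual(state):
    """Toy stand-in residual: zero at k = 0.2, growing linearly away from it."""
    k, _z = state
    return 0.05 * abs(k - 0.2)


# Hypothetical (capital, productivity) states outside the certified distribution.
held_out = [(0.05, 0.6), (0.10, 0.6), (0.40, 1.2)]
violated, worst = exceeds_bound(toy_residual, held_out, bound=1e-3)
print(violated, worst)
```

The test is one-sided: a violation on any held-out state counts against the bound, while passing on a finite sample does not prove it.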

Figures

Figures reproduced from arXiv: 2606.23463 by Andreas Schaab, Simon Scheidegger.

**Figure 2.** Per-iteration structure of an unsupervised residual solver (left) versus EWM (right). The …

**Figure 3.** The 𝜅-homotopy as a bridge from a self-confirming equilibrium to rational expectations on the coverage axis. Each stage enlarges the imagined support 𝜇_𝜅, from the ergodic path to a neighborhood, the rare regime, and post-shock cross-sections, imposing the same exact residual on strictly more of the reachable set. At every finite 𝜅 the fixed point is a coverage-confirmed fixed point on 𝜇_𝜅, self-confirming i…

**Figure 4.** How the coverage measure 𝜇_𝜅 of (7) is built in Brock–Mirman, on the state (𝑘, 𝑧) of capital and productivity, and how the continuation is amortized on it. Building 𝜇_𝜅 (the three-step coverage sampling of this section): (1) ergodic 𝜇_{𝜋_𝜃}, the policy’s own path, obtained by simulating the exact transition Γ forward (blue), the set DEQN trains on; (2) stress, seeds drawn off the ergodic set, a low-productivity…

**Figure 5.** Brock–Mirman with a rare disaster: held-out exact disaster residuals by arm. Panel (a) …

**Figure 6.** Seed-basin certification at 𝑁=2 and 𝑁=4 (ten seeds per arm). Each point is one seed: the on-path exact residual against the disaster-region exact residual, with the 45° line shown and filled markers for seeds that pass verified stationarity. The pathwise baseline sits an order of magnitude above the diagonal and never verifies; the coverage and surrogate arms collapse onto the diagonal and verify in eight …

**Figure 7.** What is approximated, and where the encoder enters. The Bewley solve approximates …

**Figure 8.** The learned embedding keeps the decision-relevant cross-section. From the trained …

**Figure 9.** Network architecture and training, Brock–Mirman setting (Table …

**Figure 10.** Endogenous protection. Left: converged normal-regime protection by arm, with the …

**Figure 11.** Normal-times price of the one-period disaster Arrow claim (implied risk-neutral disaster …

**Figure 12.** The warm-started surrogate-capacity homotopy on the international real business cycle …

**Figure 13.** The world model’s encoder, drawn for the heterogeneous-agent economy. The state …
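
The coverage construction described in the captions of Figures 3 and 4 can be loosely sketched as a mixture between the policy's own simulated states and deliberately stressed seeds. Treating 𝜅 as a mixture weight in [0, 1], the stress box over capital, and the single low-productivity value are assumptions made here for illustration; the captions are truncated and do not pin down the paper's actual three-step sampler. The name `sample_coverage` is likewise hypothetical.

```python
import random


def sample_coverage(ergodic_states, kappa, n, rng,
                    k_stress=(0.02, 0.10), z_stress=0.6):
    """Draw n (k, z) states: with probability kappa a stressed seed,
    otherwise a state resampled from the policy's simulated path."""
    draws = []
    for _ in range(n):
        if rng.random() < kappa:
            draws.append((rng.uniform(*k_stress), z_stress))   # off the ergodic set
        else:
            draws.append(rng.choice(ergodic_states))           # on the ergodic set
    return draws


rng = random.Random(1)
ergodic = [(0.17, 0.9), (0.19, 1.1), (0.21, 1.1)]   # hypothetical simulated states
on_path_only = sample_coverage(ergodic, kappa=0.0, n=200, rng=rng)
mixed = sample_coverage(ergodic, kappa=0.5, n=200, rng=rng)
share_stressed = sum(z == 0.6 for _, z in mixed) / len(mixed)
print(f"stressed share at kappa=0.5: {share_stressed:.2f}")
```

With `kappa=0.0` the sampler reduces to the pathwise case, so raising `kappa` in stages is one simple way to picture the move from on-path training toward broader coverage.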

Original abstract

We introduce Equilibrium World Models (EWMs), a deep-learning method for globally solving dynamic stochastic models that feature rare disasters, binding constraints, and counterfactual states. Standard unsupervised neural-network-based solvers impose equilibrium conditions only on states generated by their own simulated policy. Their solutions can therefore be self-confirming: accurate on the simulated path, but untested off it, sensitive to initialization, and costly when expectations must be recomputed at each step. EWMs change the computational representation, not the economics. They enforce the model's exact equilibrium conditions on a broader, model-generated distribution of ordinary, rare, stressed, and counterfactual states. They carry the continuation with a learned surrogate, but certify the resulting policy strictly against the true equilibrium conditions. We provide an error decomposition, an off-path residual bound, and a convergence result linking self-confirming solutions to rational-expectations equilibria. We demonstrate EWMs through a sequence of test cases that isolate the main pathologies of classical deep-learning solvers and then scale them to richer economies. In a rare-disaster Brock–Mirman laboratory, coverage reduces disaster-region residuals by an order of magnitude. In a high-dimensional international real-business-cycle model, classical deep-learning solvers fail from all random starts, whereas EWMs converge from nearly all and evaluate continuations up to two orders of magnitude less often. When actions move transition measures, EWMs use action-conditioned continuations to recover the relevant policy margin. In a heterogeneous-agent economy with aggregate risk, EWMs compress the numerical representation of the wealth distribution by at least 25x while imposing exact full-distribution rational-expectations conditions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

EWMs fix self-confirming solutions in neural solvers by enforcing exact equilibrium on a wider model-generated state distribution and certifying the policy against true conditions.

The main advance here is the change in computational representation: instead of checking equilibrium only on paths the solver itself generates, EWMs build a broader distribution that includes ordinary, rare, stressed, and counterfactual states, enforce the model's exact conditions there, and certify the final policy against those true conditions even while using a learned surrogate for continuations. They supply an error decomposition, off-path residual bound, and convergence result that links self-confirming solutions to rational-expectations equilibria.

The paper does this cleanly. The test cases isolate the known weaknesses of classical deep-learning solvers—initialization sensitivity, off-path failures, and high evaluation costs—and show EWMs cutting disaster-region residuals by an order of magnitude, converging from nearly all random starts in a high-dimensional international RBC model, and compressing the wealth distribution representation by at least 25x in a heterogeneous-agent economy while still imposing full-distribution rational-expectations conditions. These are practical gains.

The soft spots are modest but worth noting. The surrogate for continuation values is central, and while the certification step is meant to protect against approximation error, it is not obvious how robust the bounds remain when the broader distribution must be generated in high dimensions or when actions shift transition measures. More detail on distribution construction and sensitivity to surrogate quality would strengthen the claims.

This is for computational macroeconomists who already use or are considering neural solvers for models with tail risks, constraints, or heterogeneity. A reader working on those methods will get concrete value from the pathology-isolating experiments and the stated guarantees.

It deserves peer review. The core idea targets a real limitation with explicit theoretical backing and measurable improvements.
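
The letter's concern about sensitivity to surrogate quality can be probed by comparing the residual computed with a surrogate continuation against the residual computed with the exact expectation, on the same states. The sketch below does this in a toy Brock–Mirman setting with log utility and full depreciation. The smooth policy `savings_rate`, the piecewise-linear interpolant standing in for a learned surrogate, the grid sizes, and all parameter values are assumptions for illustration, not the paper's setup.

```python
import math

ALPHA, BETA = 0.36, 0.96
Z_STATES = (0.9, 1.1)            # assumed iid productivity states, equal probability
K_MIN, K_MAX = 0.02, 0.40        # assumed capital range covered by the surrogate


def savings_rate(k, z):
    """Hypothetical smooth policy, used only to give the continuation some curvature."""
    return ALPHA * BETA + 0.05 * z * math.exp(-k / 0.1)


def exact_continuation(k_next):
    """Exact expectation E[1 / (1 - s(k', z'))] over the two productivity states."""
    return sum(1.0 / (1.0 - savings_rate(k_next, zn)) for zn in Z_STATES) / len(Z_STATES)


def make_surrogate(n_nodes):
    """Piecewise-linear interpolant of the exact continuation on a uniform grid."""
    step = (K_MAX - K_MIN) / (n_nodes - 1)
    values = [exact_continuation(K_MIN + i * step) for i in range(n_nodes)]

    def surrogate(k_next):
        pos = min(max((k_next - K_MIN) / step, 0.0), n_nodes - 1)
        i = min(int(pos), n_nodes - 2)      # index of the left grid node
        w = pos - i                         # weight on the right grid node
        return (1.0 - w) * values[i] + w * values[i + 1]

    return surrogate


def residual(k, z, continuation):
    """Euler residual 1 - alpha*beta*(1-s)/s * continuation(k')."""
    s = savings_rate(k, z)
    k_next = s * z * k ** ALPHA
    return 1.0 - ALPHA * BETA * (1.0 - s) / s * continuation(k_next)


def surrogate_gap(n_nodes, states):
    """Largest difference between surrogate-based and exact residuals."""
    surrogate = make_surrogate(n_nodes)
    return max(abs(residual(k, z, surrogate) - residual(k, z, exact_continuation))
               for k, z in states)


test_states = [(0.03 + 0.01 * i, z) for i in range(30) for z in Z_STATES]
gap_coarse = surrogate_gap(5, test_states)
gap_fine = surrogate_gap(41, test_states)
print(f"surrogate gap, 5 nodes:  {gap_coarse:.2e}")
print(f"surrogate gap, 41 nodes: {gap_fine:.2e}")
```

The gap is a diagnostic for the surrogate, not for the policy: a policy can be certified only by the residual computed with `exact_continuation`, whatever the surrogate reports.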

Referee Report

0 major / 2 minor

Summary. The paper introduces Equilibrium World Models (EWMs), a deep-learning method for globally solving dynamic stochastic models featuring rare disasters, binding constraints, and counterfactual states. Unlike standard neural-network solvers that impose equilibrium conditions only on states generated by their own policy (risking self-confirming solutions), EWMs enforce the model's exact equilibrium conditions on a broader model-generated distribution of ordinary, rare, stressed, and counterfactual states. They use a learned surrogate for continuation values but certify the resulting policy strictly against true equilibrium conditions, supported by an error decomposition, an off-path residual bound, and a convergence result linking self-confirming solutions to rational-expectations equilibria. Numerical demonstrations include a rare-disaster Brock–Mirman model (order-of-magnitude residual reduction in disaster regions), a high-dimensional international RBC model (EWMs converge from nearly all random starts where classical solvers fail from all, with continuations evaluated up to two orders of magnitude less often), and a heterogeneous-agent economy with aggregate risk (at least 25x compression of the wealth-distribution representation while imposing exact full-distribution conditions).

Significance. If the stated guarantees and empirical results hold, EWMs would address a central limitation of unsupervised neural solvers for DSGE models by reducing sensitivity to initialization and off-path errors, enabling more reliable solutions in settings with rare events and high dimensionality. The explicit error decomposition, residual bound, and convergence result are notable strengths, as is the reproducible demonstration across isolated test cases and scaled applications. This could meaningfully advance computational methods in macroeconomics and related fields.

minor comments (2)

The abstract refers to 'a sequence of test cases' and specific models (Brock-Mirman, international RBC, heterogeneous-agent); the main text should include explicit section references or table numbers for each demonstration to allow readers to locate the corresponding error metrics and convergence statistics.
Notation for the surrogate continuation and the certification step should be introduced with a clear equation or definition early in the methods section to distinguish the learned component from the exact equilibrium conditions being enforced.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the detailed and positive summary of our work on Equilibrium World Models, as well as the recommendation for minor revision. No specific major comments were provided in the report, so we have no points to address point-by-point at this stage. We will make minor revisions to enhance clarity and presentation as appropriate.

Circularity Check

0 steps flagged

No significant circularity identified

The paper's central approach enforces the model's exact equilibrium conditions on a broader, model-generated distribution of states (ordinary, rare, stressed, and counterfactual) while using a learned surrogate only for carrying the continuation; the final policy is certified strictly against the true equilibrium conditions via an explicit error decomposition, off-path residual bound, and convergence result that connects self-confirming solutions to rational-expectations equilibria. This structure is self-contained against external model conditions rather than reducing any load-bearing claim to a fitted parameter, self-definition, or self-citation chain. No instances of the enumerated circularity patterns appear in the provided description or abstract.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract-only review yields no explicit free parameters, axioms, or invented entities beyond the method name itself; the central contribution is algorithmic rather than resting on new economic assumptions.

invented entities (1)

Equilibrium World Models · no independent evidence
purpose: Deep-learning solver that enforces equilibrium conditions on broad model-generated state distributions
Newly proposed method whose properties are asserted in the abstract.

pith-pipeline@v0.9.1-grok · 5814 in / 1224 out tokens · 26667 ms · 2026-06-26T05:52:35.504062+00:00 · methodology

[1] [1]

Achdou, Y., Han, J., Lasry, J.-M., Lions, P.-L., and Moll, B. (2022). Income and wealth distribution in macroeconomics: A continuous-time approach.The Review of Economic Studies, 89(1):45–86

2022

[2] [2]

Adam, K., Marcet, A., and Nicolini, J. P. (2016). Stock market volatility and learning.Journal of Finance, 71(1):33–82

2016

[3] [3]

Aiyagari, R. (1994). Uninsured idiosyncratic risk and aggregate saving.The Quarterly Journal of Economics, 109(3):659–684

1994

[4] [4]

Aliprantis, C. D. and Border, K. C. (2006).Infinite Dimensional Analysis: A Hitchhiker’s Guide. Springer, 3rd edition

2006

[5] [5]

Azinovic, M., Gaegauf, L., and Scheidegger, S. (2022). DEEP EQUILIBRIUM NETS.International Economic Review, 63(4):1471–1525. Azinovic-Yang,M.andŽemlička,J.(2024). Intergenerationalconsequencesofraredisasters.Avail- able at SSRN 4386477. Azinovic-Yang,M.andŽemlička,J.(2025). Deeplearninginthesequencespace. arXiv:2509.13623

arXiv 2022

[6] [6]

and LeCun, Y

Balestriero, R. and LeCun, Y. (2025). SIGReg: Sketched isotropic gaussian regularization. arXiv:2511.08544

Pith/arXiv arXiv 2025

[7] [7]

Bauschke, H. H. and Combettes, P. L. (2011).Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer. Bellman,R.(1961).AdaptiveControlProcesses: AGuidedTour. ’RandCorporation.Researchstudies. Princeton University Press. Bewley,T.(1986). Stationarymonetaryequilibriumwithacontinuumofindependentlyfluctuating consumers.Contributions to Mathema...

2011

[8] [8]

(1999).Convergence of Probability Measures

Billingsley, P. (1999).Convergence of Probability Measures. Wiley, 2nd edition

1999

[9] [9]

Branch, W. A. and Evans, G. W. (2006). Intrinsic heterogeneity in expectation formation.Journal of Economic Theory, 127(1):264–295

2006

[10] [10]

Bray, M. M. (1982). Learning, estimation, and the stability of rational expectations.Journal of Economic Theory, 26(2):318–339

1982

[11] [11]

and Scheidegger, S

Brumm, J. and Scheidegger, S. (2017). Using adaptive sparse grids to solve high-dimensional dynamic models.Econometrica, 85(5):1575–1612

2017

[12] [12]

(2019).The Master Equation and the Convergence Problem in Mean Field Games

Cardaliaguet, P., Delarue, F., Lasry, J.-M., and Lions, P.-L. (2019).The Master Equation and the Convergence Problem in Mean Field Games. Annals of Mathematics Studies. Princeton University Press

2019

[13] [13]

M., Covarrubias, M., and Nuno, G

Carvalho, V. M., Covarrubias, M., and Nuno, G. (2025). Planning against disasters in dynamic production networks. Technical report, Working Paper. Chen,H.,Didisheim,A.,andScheidegger,S.(2026). Deepsurrogatesforfinance: Withanapplica- tion to option pricing.Journal of Financial Economics, 177:104222. 73

2025

[14] [14]

Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function.Mathematics of

1989

[15] [15]

Den Haan, W

Control, Signals and Systems, 2(4):303–314. Den Haan, W. J. (2010). Comparison of solutions to the incomplete markets model with aggregate uncertainty.Journal of Economic Dynamics and Control, 34(1):4–27. Den Haan, W. J. and Marcet, A. (1990). Solving the stochastic growth model by parameterizing expectations.Journal of Business and Economic Statistics, 8...

2010

[16] [16]

Duarte, V., Duarte, D., and Silva, D. (2024). Machine learning for continuous-time finance.Review of Financial Studies, 37(11):3217–3271

2024

[17] [17]

and McNelis, P

Duffy, J. and McNelis, P. D. (2001). Approximating and simulating the stochastic growth model: Parameterized expectations, neural networks, and the genetic algorithm.Journal of Economic Dynamics and Control, 25(9):1273–1303

2001

[18] [18]

and Pouzo, D

Esponda, I. and Pouzo, D. (2016). Berk–nash equilibrium: A framework for modeling agents with misspecified models.Econometrica, 84(3):1093–1130. Eusepi,S.andPreston,B.(2011). Expectations,learning,andbusinesscyclefluctuations.American Economic Review, 101(6):2844–2872

2016

[19] [19]

Evans, G. W. and Honkapohja, S. (2001).Learning and Expectations in Macroeconomics. Princeton University Press. Fernández-Villaverde, J., Hurtado, S., and Nuño, G. (2023). Financial frictions and the wealth distribution.Econometrica, 91(3):869–901. Fernández-Villaverde, J., Nuño, G., and Perla, J. (2024). Taming the curse of dimensionality: Quantitativeec...

2001

[20] [20]

Fischer, A. (1992). A special Newton-type optimization method.Optimization, 24(3–4):269–284. Folini,D.,Friedl,A.,Kübler,F.,andScheidegger,S.(2024). TheClimateinClimateEconomics.The Review of Economic Studies, forthcoming

1992

[21] [21]

Friedl, A., Kübler, F., Scheidegger, S., and Usui, T. (2023). Deep uncertainty quantification: With an application to integrated assessment models. Working paper, University of Lausanne

2023

[22] [22]

Fudenberg, D. and Levine, D. K. (1993). Self-confirming equilibrium. Econometrica, 61(3):523–545

1993

[23] [23]

Gopalakrishna, G. (2024). ALIENs and continuous time economies. Available at SSRN

2024

[24] [24]

Gu, Z., Lauriere, M., Merkel, S., and Payne, J. (2024). Global solutions to master equations for continuous time heterogeneous agent macroeconomic models. arXiv preprint arXiv:2406.13726

arXiv 2024

[25] [25]

Ha, D. and Schmidhuber, J. (2018). World models. arXiv:1803.10122

Pith/arXiv arXiv 2018

[26] [26]

Hafner, D., Lillicrap, T., Ba, J., and Norouzi, M. (2020). Dream to control: Learning behaviors by latent imagination. In International Conference on Learning Representations

2020

[27] [27]

Hafner, D., Pasukonis, J., Ba, J., and Lillicrap, T. (2023). Mastering diverse domains through world models. arXiv:2301.04104.

Pith/arXiv arXiv 2023

[28] [28]

Han, J., Yang, Y., and E, W. (2024). DeepHAM: A global solution method for heterogeneous agent models with aggregate shocks. Quantitative Economics. Forthcoming; preprint arXiv:2112.14377 (first version December 2021). Hornik, K., Stinchcombe, M., and White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366

arXiv 2024

[29] [29]

Huang, H., Gao, T., Gui, Y., Guo, J., and Zhang, P. (2022). Stock trading optimization through model-based reinforcement learning with resistance support relative strength. arXiv:2205.15056

arXiv 2022

[30] [30]

Kahou, M. E., Fernández-Villaverde, J., Perla, J., and Sood, A. (2021). Exploiting symmetry in high-dimensional dynamic programming. NBER Working Paper, (28981)

2021

[31] [31]

Kase, H., Melosi, L., and Rottner, M. (2022). Estimating nonlinear heterogeneous agents models with neural networks. Centre for Economic Policy Research. Kingma, D. P. and Ba, J. (2015). Adam: A method for stochastic optimization. Proceedings of the 3rd International Conference on Learning Representations (ICLR)

2022

[32] [32]

Koop, G., Pesaran, M. H., and Potter, S. M. (1996). Impulse response analysis in nonlinear multivariate models. Journal of Econometrics, 74(1):119–147

1996

[33] [33]

Krusell, P. and Smith, Jr, A. A. (1998). Income and wealth heterogeneity in the macroeconomy. Journal of Political Economy, 106(5):867–896. Kubler, F. and Scheidegger, S. (2023). Uniformly self-justified equilibria. Journal of Economic Theory, 212:105707. Kübler, F. and Scheidegger, S. (2025). Self-justified equilibria: Existence and computation. Journal of the Europ...

arXiv 1998

[34] [34]

LeCun, Y. (2022). A path towards autonomous machine intelligence. OpenReview

2022

[35] [35]

Li, J., Liu, Y., Liu, W., Fang, S., Wang, L., Xu, C., and Bian, J. (2025). MarS: a financial market simulation engine powered by generative foundation model. arXiv:2409.07486. Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2016). Continuous control with deep reinforcement learning. In International Conference on Lear...

arXiv 2025

[36] [36]

Lucas, R. E. (1976). Econometric policy evaluation: A critique. In Brunner, K. and Meltzer, A. H., editors, The Phillips Curve and Labor Markets, volume 1 of Carnegie-Rochester Conference Series on Public Policy, pages 19–46. North-Holland

1976

[37] [37]

MacKay, D. J. C. (1992). Information-based objective functions for active data selection. Neural Computation, 4(4):590–604

1992

[38] [38]

Maes, L., Le Lidec, Q., Scieur, D., LeCun, Y., and Balestriero, R. (2026). LeWorldModel: Stable end-to-end joint-embedding predictive architecture from pixels. arXiv:2603.19312.

Pith/arXiv arXiv 2026

[39] [39]

Maliar, L., Maliar, S., and Winant, P. (2021). Deep learning for solving dynamic economic models. Journal of Monetary Economics, 122:76–101

2021

[40] [40]

Marcet, A. (1988). Solution of nonlinear models by parameterizing expectations. Technical report, Carnegie Mellon University

1988

[41] [41]

Marcet, A. and Sargent, T. J. (1989). Convergence of least-squares learning mechanisms in self-referential linear stochastic models. Journal of Economic Theory, 48(2):337–368. Moll, B. (2026). Heterogeneous agent macroeconomics: Eight lessons and a challenge. The Economic Journal, 136(676):1173–1205. Economic Journal Lecture, Royal Economic Society. Nuño, G., Renne...

1989

[42] [42]

Payne, J., Rebei, A., and Yang, Y. (2025). Deep learning for search and matching models. Technical Report 25-05, Swiss Finance Institute

2025

[43] [43]

Polyak, B. T. and Juditsky, A. B. (1992). Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, 30(4):838–855

1992

[44] [44]

Rasmussen, C. E. and Williams, C. K. I. (2005). Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press. Renner, P. and Scheidegger, S. (2018). Machine learning for dynamic incentive problems. Working paper. Available at SSRN: http://dx.doi.org/10.2139/ssrn.3282487

work page doi:10.2139/ssrn.3282487 2005

[45] [45]

Rudin, W. (1976). Principles of Mathematical Analysis. McGraw-Hill, 3rd edition

1976

[46] [46]

Sargent, T. J. (1993). Bounded Rationality in Macroeconomics. Oxford University Press

1993

[47] [47]

Sargent, T. J. (1999). The Conquest of American Inflation. Princeton University Press

1999

[48] [48]

Sargent, T. J. (2024). Macroeconomics after Lucas. Sequel to Lucas and Sargent (1978)

2024

[49] [49]

Scheidegger, S. (2026). Deep learning for solving and estimating dynamic models in economics and finance. arXiv:2605.14493

Pith/arXiv arXiv 2026

[50] [50]

Scheidegger, S. and Bilionis, I. (2019). Machine learning for high-dimensional dynamic stochastic economies. Journal of Computational Science, 33:68–82

2019

[51] [51]

Schmidhuber, J. (1990). Making the world differentiable: On using self-supervised fully recurrent neural networks for dynamic reinforcement learning and planning in non-stationary environments. Technical Report FKI-126-90, Technische Universität München

1990

[52] [52]

Snoek, J., Larochelle, H., and Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems (NeurIPS 25)

2012

[53] [53]

Stokey, N. L., Lucas, R. E., and Prescott, E. C. (1989). Recursive Methods in Economic Dynamics. Harvard University Press, Cambridge, MA. Valaitis, V. and Villa, A. T. (2024). A machine learning projection method for macro-finance models. Quantitative Economics, 15(1):145–173.

1989

[54] [54]

Yang, Y., Wang, C., Schaab, A., and Moll, B. (2026). Structural reinforcement learning for heterogeneous agent macroeconomics. arXiv:2512.18892

arXiv 2026

[55] [55]

Young, E. R. (2010). Solving the incomplete markets model with aggregate uncertainty using the Krusell–Smith algorithm and non-stochastic simulations. Journal of Economic Dynamics and Control, 34(1):36–41.

2010