Equilibrium World Models
Pith reviewed 2026-06-26 05:52 UTC · model grok-4.3
The pith
Equilibrium World Models enforce exact rational-expectations conditions on ordinary, rare, stressed, and counterfactual states. A learned surrogate carries the continuation, and the resulting policy is certified against the true equilibrium conditions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Equilibrium World Models enforce the model's exact equilibrium conditions on a broader, model-generated distribution of ordinary, rare, stressed, and counterfactual states. They carry the continuation with a learned surrogate, but certify the resulting policy strictly against the true equilibrium conditions. We provide an error decomposition, an off-path residual bound, and a convergence result linking self-confirming solutions to rational-expectations equilibria.
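To make the certification step concrete, here is a minimal sketch of checking a candidate policy against the true Euler equation on a state pool that mixes ordinary and stressed states. It is not the paper's implementation. It assumes a textbook Brock-Mirman economy (log utility, full depreciation) whose closed-form policy is known, and every name, parameter value, and shock probability below is a hypothetical choice made for illustration.

```python
# Brock-Mirman growth model: log utility, full depreciation, output z * k**ALPHA.
# All parameter values and the shock distribution are illustrative assumptions.
ALPHA, BETA = 0.36, 0.96
SHOCKS = [(1.0, 0.98), (0.6, 0.02)]  # (productivity, probability); 0.6 is a rare disaster


def exact_policy(k, z):
    """Closed-form savings rule k' = ALPHA * BETA * z * k**ALPHA."""
    return ALPHA * BETA * z * k ** ALPHA


def undersaving_policy(k, z):
    """Hypothetical candidate policy that saves 10% less than the exact rule."""
    return 0.9 * exact_policy(k, z)


def euler_residual(policy, k, z):
    """Unit-free residual of the true Euler equation; zero at an exact solution."""
    k_next = policy(k, z)
    c = z * k ** ALPHA - k_next
    expectation = 0.0
    for z_next, prob in SHOCKS:  # exact expectation over the discrete shock
        c_next = z_next * k_next ** ALPHA - policy(k_next, z_next)
        expectation += prob * ALPHA * z_next * k_next ** (ALPHA - 1.0) / c_next
    return BETA * c * expectation - 1.0


def certify(policy, states, tol=1e-8):
    """Return the worst absolute residual on `states` and whether it passes `tol`."""
    worst = max(abs(euler_residual(policy, k, z)) for k, z in states)
    return worst, worst <= tol


# State pool mixing ordinary states with stressed, low-capital disaster states.
ordinary = [(k, 1.0) for k in (0.15, 0.19, 0.25)]
stressed = [(k, 0.6) for k in (0.02, 0.05, 0.10)]
pool = ordinary + stressed

exact_worst, exact_ok = certify(exact_policy, pool)
cand_worst, cand_ok = certify(undersaving_policy, pool)
print(f"exact policy:     worst residual {exact_worst:.2e}, certified={exact_ok}")
print(f"candidate policy: worst residual {cand_worst:.2e}, certified={cand_ok}")
```

The check never consults a surrogate: whatever produced the policy, the residual is evaluated with the model's own expectation operator. In this toy setting the expectation is an exact sum over two shock values; a richer model would need quadrature or another integration rule at this step.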
What carries the argument
Enforcement of exact equilibrium conditions on a broad model-generated state distribution, with a learned surrogate carrying the continuation and the resulting policy certified against the true conditions.
If this is right
- In a rare-disaster Brock-Mirman laboratory, coverage reduces disaster-region residuals by an order of magnitude.
- In a high-dimensional international real-business-cycle model, EWMs converge from nearly all random starts, whereas classical deep-learning solvers fail from all of them.
- When actions move transition measures, action-conditioned continuations recover the relevant policy margin.
- In a heterogeneous-agent economy with aggregate risk, EWMs compress the numerical representation of the wealth distribution by at least 25x while imposing exact full-distribution conditions.
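The coverage idea behind the first bullet can be illustrated with a small sketch that compares the states visited by simulating a policy with a deliberately broad, designed state pool. This is an assumption-laden toy, not the paper's sampler: the Brock-Mirman parameters, the disaster probability, the 25% disaster share, and the capital box are all hypothetical.

```python
import random

ALPHA, BETA = 0.36, 0.96          # illustrative Brock-Mirman parameters
Z_NORMAL, Z_DISASTER = 1.0, 0.6   # hypothetical productivity levels
P_DISASTER = 0.02                 # hypothetical per-period disaster probability


def simulate_on_path(n, rng):
    """States visited when simulating the closed-form policy k' = ALPHA*BETA*z*k**ALPHA."""
    k = (ALPHA * BETA) ** (1.0 / (1.0 - ALPHA))  # steady state when z = 1
    states = []
    for _ in range(n):
        z = Z_DISASTER if rng.random() < P_DISASTER else Z_NORMAL
        states.append((k, z))
        k = ALPHA * BETA * z * k ** ALPHA
    return states


def sample_broad(n, rng, disaster_fraction=0.25, k_range=(0.02, 0.60)):
    """Draw states from a designed mixture: fixed disaster fraction, wide capital box."""
    n_disaster = int(n * disaster_fraction)
    states = []
    for i in range(n):
        z = Z_DISASTER if i < n_disaster else Z_NORMAL
        states.append((rng.uniform(*k_range), z))
    return states


def share_in_disaster(states):
    """Fraction of states whose productivity is at the disaster level."""
    return sum(1 for _, z in states if z == Z_DISASTER) / len(states)


rng = random.Random(0)
on_path = simulate_on_path(2000, rng)
broad = sample_broad(2000, rng)

print("disaster share on path:", share_in_disaster(on_path))
print("disaster share in pool:", share_in_disaster(broad))
print("capital range on path: ", min(k for k, _ in on_path), max(k for k, _ in on_path))
print("capital range in pool: ", min(k for k, _ in broad), max(k for k, _ in broad))
```

On-path simulation visits disaster states only about as often as they occur and stays in a narrow capital band, so equilibrium conditions imposed there say little about stressed regions. The designed pool fixes the disaster fraction and widens the capital range by construction.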
Where Pith is reading between the lines
- Certifying surrogate-based policies against the true conditions on expanded state distributions could be adapted to verify approximate solutions in other classes of dynamic models with uncertainty.
- Lower frequency of continuation evaluations may support faster evaluation of policy counterfactuals in large-scale economies.
- The convergence result from self-confirming to full rational-expectations solutions suggests an iterative refinement procedure that starts from classical neural outputs.
Load-bearing premise
The learned surrogate for continuation values combined with certification against true equilibrium conditions on the broader state distribution produces policies that satisfy the model's rational-expectations equilibrium without material approximation error from the surrogate.
What would settle it
Simulating an EWM-certified policy on states outside the certified distribution and observing equilibrium residuals that exceed the stated off-path bound would falsify the claim of reliable global solutions without material surrogate error.
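The falsification test described above can be sketched as follows, under loud assumptions: the economy is again a toy Brock-Mirman model, the "certified" region is a hypothetical capital box, and `BOUND` is a made-up stand-in for the paper's off-path bound, whose actual form is not given here. The policy is built to be exact inside the box and wrong outside it, so that the test has something to detect.

```python
ALPHA, BETA = 0.36, 0.96             # illustrative Brock-Mirman parameters
SHOCKS = [(1.0, 0.98), (0.6, 0.02)]  # (productivity, probability), hypothetical
K_LO, K_HI = 0.05, 0.40              # hypothetical certified capital box
BOUND = 1e-3                         # hypothetical stand-in for the off-path bound


def boxed_policy(k, z):
    """Exact inside the box, but blind to capital outside [K_LO, K_HI]."""
    k_seen = min(max(k, K_LO), K_HI)
    return ALPHA * BETA * z * k_seen ** ALPHA


def euler_residual(policy, k, z):
    """Unit-free residual of the true Euler equation; zero at an exact solution."""
    k_next = policy(k, z)
    c = z * k ** ALPHA - k_next
    expectation = 0.0
    for z_next, prob in SHOCKS:
        c_next = z_next * k_next ** ALPHA - policy(k_next, z_next)
        expectation += prob * ALPHA * z_next * k_next ** (ALPHA - 1.0) / c_next
    return BETA * c * expectation - 1.0


def worst_residual(policy, states):
    """Largest absolute Euler residual over a list of (capital, productivity) states."""
    return max(abs(euler_residual(policy, k, z)) for k, z in states)


inside = [(k, z) for k in (0.06, 0.10, 0.19, 0.35) for z, _ in SHOCKS]
outside = [(k, z) for k in (0.01, 0.03, 0.60, 0.80) for z, _ in SHOCKS]

worst_inside = worst_residual(boxed_policy, inside)
worst_outside = worst_residual(boxed_policy, outside)
falsified = worst_outside > BOUND

print(f"worst residual inside box:  {worst_inside:.2e}")
print(f"worst residual outside box: {worst_outside:.2e}")
print("claimed bound violated:", falsified)
```

A policy that passes inside the box and fails outside it is the self-confirming pattern the abstract warns about. Applied to an actual EWM policy, the same comparison would use the paper's stated bound in place of `BOUND`.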
Original abstract
We introduce Equilibrium World Models (EWMs), a deep-learning method for globally solving dynamic stochastic models that feature rare disasters, binding constraints, and counterfactual states. Standard unsupervised neural-network-based solvers impose equilibrium conditions only on states generated by their own simulated policy. Their solutions can therefore be self-confirming: accurate on the simulated path, but untested off it, sensitive to initialization, and costly when expectations must be recomputed at each step. EWMs change the computational representation, not the economics. They enforce the model's exact equilibrium conditions on a broader, model-generated distribution of ordinary, rare, stressed, and counterfactual states. They carry the continuation with a learned surrogate, but certify the resulting policy strictly against the true equilibrium conditions. We provide an error decomposition, an off-path residual bound, and a convergence result linking self-confirming solutions to rational-expectations equilibria. We demonstrate EWMs through a sequence of test cases that isolate the main pathologies of classical deep-learning solvers and then scale them to richer economies. In a rare-disaster Brock-Mirman laboratory, coverage reduces disaster-region residuals by an order of magnitude. In a high-dimensional international real-business-cycle model, classical deep-learning solvers fail from all random starts, whereas EWMs converge from nearly all and evaluate continuations up to two orders of magnitude less often. When actions move transition measures, EWMs use action-conditioned continuations to recover the relevant policy margin. In a heterogeneous-agent economy with aggregate risk, EWMs compress the numerical representation of the wealth distribution by at least 25x while imposing exact full-distribution rational-expectations conditions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Equilibrium World Models (EWMs), a deep-learning method for globally solving dynamic stochastic models featuring rare disasters, binding constraints, and counterfactual states. Unlike standard neural-network solvers that impose equilibrium conditions only on states generated by their own policy (risking self-confirming solutions), EWMs enforce the model's exact equilibrium conditions on a broader model-generated distribution of ordinary, rare, stressed, and counterfactual states. They use a learned surrogate to carry the continuation but certify the resulting policy strictly against the true equilibrium conditions, supported by an error decomposition, an off-path residual bound, and a convergence result linking self-confirming solutions to rational-expectations equilibria. Empirical demonstrations include a rare-disaster Brock-Mirman model (order-of-magnitude residual reduction in disaster regions), a high-dimensional international RBC model (convergence from nearly all random starts and up to two orders of magnitude fewer continuation evaluations), and a heterogeneous-agent economy with aggregate risk (at least 25x compression of the wealth-distribution representation while imposing exact full-distribution conditions).
Significance. If the stated guarantees and empirical results hold, EWMs would address a central limitation of unsupervised neural solvers for DSGE models by reducing sensitivity to initialization and off-path errors, enabling more reliable solutions in settings with rare events and high dimensionality. The explicit error decomposition, residual bound, and convergence result are notable strengths, as is the reproducible demonstration across isolated test cases and scaled applications. This could meaningfully advance computational methods in macroeconomics and related fields.
Minor comments (2)
- The abstract refers to 'a sequence of test cases' and specific models (Brock-Mirman, international RBC, heterogeneous-agent); the main text should include explicit section references or table numbers for each demonstration to allow readers to locate the corresponding error metrics and convergence statistics.
- Notation for the surrogate continuation and the certification step should be introduced with a clear equation or definition early in the methods section to distinguish the learned component from the exact equilibrium conditions being enforced.
Simulated Author's Rebuttal
We thank the referee for the detailed and positive summary of our work on Equilibrium World Models and for recommending minor revision. The report raises no major comments, so there is nothing to address point by point at this stage. We will make minor revisions to improve clarity and presentation.
Circularity Check
No significant circularity identified
Full rationale
The paper's central approach enforces the model's exact equilibrium conditions on a broader, model-generated distribution of states (ordinary, rare, stressed, and counterfactual) while using a learned surrogate only for carrying the continuation; the final policy is certified strictly against the true equilibrium conditions via an explicit error decomposition, off-path residual bound, and convergence result that connects self-confirming solutions to rational-expectations equilibria. This structure is self-contained against external model conditions rather than reducing any load-bearing claim to a fitted parameter, self-definition, or self-citation chain. No instances of the enumerated circularity patterns appear in the provided description or abstract.
Axiom & Free-Parameter Ledger
Invented entities (1)
- Equilibrium World Models: no independent evidence