Learning more physically realistic dynamics in machine-learning based weather forecasting with latent-space constraints
Pith reviewed 2026-05-21 22:00 UTC · model grok-4.3
The pith
Training ML weather models with losses in an autoencoder latent space improves long-term forecast skill and physical realism by capturing cross-variable couplings.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors show that rollout training with latent-space constraints improves long-term forecast skill, while better preserving fine-scale structures and physical realism than the widely used model-space loss. They achieve this by reformulating model training as a four-dimensional variational data assimilation problem that treats reanalysis data as imperfect observations, allowing the loss to incorporate cross-variable error covariance structures. In practice, computing the loss in an autoencoder-learned latent space of global atmospheric states encodes the complex nonlinear couplings among variables, so that the high-dimensional error covariance matrix in model space can be approximated as nearly diagonal in latent space.
What carries the argument
Autoencoder-learned latent space that approximates the multivariate error covariance matrix as nearly diagonal, enabling simplified incorporation of physical couplings into the rollout training loss.
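The approximation described above can be written out as a rough sketch. The notation here is assumed for illustration and is not taken from the paper: $M_\theta$ is the forecast model, $y_k$ the reanalysis state at rollout step $k$, $R$ the model-space error covariance, $E$ the autoencoder encoder, and $\sigma_i^2$ a per-dimension latent error variance.

```latex
% Model-space objective with a full error covariance R (4DVar-style observation term)
J(\theta) = \sum_{k=1}^{K} \bigl(x_k^f - y_k\bigr)^{\top} R^{-1} \bigl(x_k^f - y_k\bigr),
\qquad x_k^f = M_\theta^{k}(x_0)

% Latent-space approximation with a diagonal covariance D = diag(sigma_1^2, ..., sigma_d^2)
J(\theta) \approx \sum_{k=1}^{K} \sum_{i=1}^{d}
  \frac{\bigl(E(x_k^f)_i - E(y_k)_i\bigr)^{2}}{\sigma_i^{2}}

% Under a local linearization E(x) - E(y) \approx E'(x - y), this corresponds to
R^{-1} \approx E'^{\top} D^{-1} E'
```

Read this way, the cross-variable structure that $R^{-1}$ would carry in model space is delegated to the encoder Jacobian $E'$, which is why the quality of the autoencoder matters so much for the argument.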
If this is right
- Longer-range forecasts maintain higher accuracy because multivariate dependencies are respected during training.
- Fine-scale atmospheric structures such as fronts and convective cells remain sharper instead of diffusing.
- Forecasts exhibit greater physical realism with fewer unphysical artifacts like negative moisture or inconsistent pressure fields.
- The same framework allows joint training on reanalysis fields and heterogeneous observational datasets within one consistent objective.
Where Pith is reading between the lines
- Operational centers could adopt this training change to reduce post-processing corrections currently needed for model bias.
- The latent-space approach might extend naturally to other chaotic dynamical systems such as ocean or climate models where similar covariance issues arise.
- If the autoencoder is trained on additional variables, the method could enforce consistency across an even broader set of physical constraints.
Load-bearing premise
The autoencoder-learned latent space encodes complex nonlinear couplings among atmospheric variables so that the high-dimensional error covariance matrix in model space can be approximated as nearly diagonal.
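A linear toy case may help show what this premise buys. The sketch below is an illustration under stated assumptions, not the paper's method: the paper's autoencoder is nonlinear, whereas here the hypothetical "encoder" is the inverse Cholesky factor of a known 2x2 error covariance. In that setting a plain squared norm in latent space reproduces the full-covariance loss in model space, while a variable-wise loss cannot tell a physically coupled error from a decoupled one.

```python
import math

def cholesky_2x2(c):
    """Lower-triangular L with L @ L.T == c for a symmetric positive-definite 2x2 matrix."""
    l11 = math.sqrt(c[0][0])
    l21 = c[1][0] / l11
    l22 = math.sqrt(c[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def encode(err, chol):
    """Hypothetical linear 'encoder': solve chol @ z = err by forward substitution."""
    z1 = err[0] / chol[0][0]
    z2 = (err[1] - chol[1][0] * z1) / chol[1][1]
    return [z1, z2]

def mahalanobis_sq(err, c):
    """Full-covariance loss err^T C^{-1} err, using the closed-form 2x2 inverse."""
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    inv = [[c[1][1] / det, -c[0][1] / det],
           [-c[1][0] / det, c[0][0] / det]]
    return sum(err[i] * inv[i][j] * err[j] for i in range(2) for j in range(2))

def diagonal_latent_loss(err, c):
    """Identity-weighted squared norm of the encoded error."""
    z = encode(err, cholesky_2x2(c))
    return sum(v * v for v in z)

def variable_wise_loss(err, c):
    """Model-space loss that ignores off-diagonal covariance terms."""
    return sum(err[i] ** 2 / c[i][i] for i in range(2))

# Assumed toy covariance: two variables whose errors are strongly positively coupled.
COV = [[1.0, 0.8], [0.8, 1.0]]
coupled_error = [1.0, 1.0]     # consistent with the coupling
decoupled_error = [1.0, -1.0]  # violates the coupling

for name, err in [("coupled", coupled_error), ("decoupled", decoupled_error)]:
    print(name,
          round(variable_wise_loss(err, COV), 3),
          round(mahalanobis_sq(err, COV), 3),
          round(diagonal_latent_loss(err, COV), 3))
```

The variable-wise loss assigns both errors the same penalty, while the latent loss penalizes the error that breaks the coupling much more heavily. Whether a learned nonlinear encoder achieves anything close to this is exactly what the premise assumes.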
What would settle it
Run parallel rollout forecasts with the latent-space loss and the standard model-space loss on the same initial conditions, then compare both against independent high-resolution observations at lead times beyond 5 days for metrics of small-scale feature preservation such as front sharpness or precipitation localization.
Original abstract
Data-driven machine learning (ML) models are reshaping weather forecasting and have shown the potential to accelerate and surpass traditional physics-based approaches, leading to a second revolution in the field after data assimilation. However, most ML forecast models are trained with weighted variable-wise losses on rollout forecasts that neglect cross-variable and spatial error covariance induced by physical coupling, often yielding overly smooth and physically unrealistic long-range forecasts. To address this, we reformulate model training as a four-dimensional variational data assimilation (4DVar) problem that treats reanalysis data as imperfect observations. This enables the loss function to incorporate cross-variable error covariance structures that capture multivariate dependencies and their associated errors. In practice, we approximate this objective by computing the loss in an autoencoder-learned latent space of global atmospheric states. By encoding complex nonlinear couplings among atmospheric variables, this representation allows the high-dimensional, complex error covariance matrix in model space to be approximated as nearly diagonal in latent space, substantially simplifying implementation. We show that rollout training with latent-space constraints improves long-term forecast skill, while better preserving fine-scale structures and physical realism than the widely used model-space loss. Finally, we extend this framework to accommodate heterogeneous data sources, enabling the forecast model to be trained jointly on reanalysis and multi-source observations within a unified theoretical formulation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reformulates training of ML-based weather forecast models as a 4DVar data assimilation problem that treats reanalysis as imperfect observations. It approximates the resulting multivariate loss by computing it in the latent space of a trained autoencoder, under the assumption that this encoding of nonlinear cross-variable couplings renders the high-dimensional model-space error covariance nearly diagonal. The central empirical claim is that rollout training with this latent-space loss yields improved long-term forecast skill and better preservation of fine-scale structures and physical realism relative to standard model-space weighted losses; the framework is also extended to heterogeneous data sources.
Significance. If the latent-space diagonal approximation is valid and the reported gains are reproducible, the work would supply a theoretically grounded alternative to ad-hoc variable-wise losses, potentially improving physical consistency in data-driven weather models without requiring explicit covariance estimation in the original high-dimensional space.
major comments (2)
- [Abstract and §3.2] Abstract (latent-space approximation paragraph) and §3.2: the claim that the autoencoder latent space 'allows the high-dimensional, complex error covariance matrix in model space to be approximated as nearly diagonal' is load-bearing for attributing any skill or realism gains to covariance-aware training rather than to a generic regularizer. No quantitative diagnostic (e.g., average off-diagonal magnitude of the sample covariance in latent space, or comparison of full vs. diagonal 4DVar objectives) is presented to verify that off-diagonal terms are negligible; without this, the method reduces to an unverified heuristic.
- [§4] §4 (experimental results): the reported improvements in long-term skill and physical realism are presented without ablation of the diagonal assumption itself (e.g., comparison against a non-diagonal latent loss or against a model-space loss with explicit covariance). This makes it impossible to isolate whether gains stem from the 4DVar reformulation or from other implementation choices.
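The diagnostic requested in the first major comment could look roughly like the sketch below. It is a minimal illustration, not the authors' code: the function names are hypothetical, the data are synthetic Gaussian samples standing in for latent vectors, and whether the statistic should be computed on latent encodings or on latent forecast errors is left open.

```python
import math
import random

def sample_covariance(samples):
    """Unbiased sample covariance of a list of equal-length vectors."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for s in samples:
        for i in range(d):
            for j in range(d):
                cov[i][j] += (s[i] - mean[i]) * (s[j] - mean[j])
    return [[v / (n - 1) for v in row] for row in cov]

def mean_offdiag_ratio(cov):
    """Mean of |C_ij| / sqrt(C_ii * C_jj) over i != j; 0 means exactly diagonal."""
    d = len(cov)
    total = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                total += abs(cov[i][j]) / math.sqrt(cov[i][i] * cov[j][j])
    return total / (d * (d - 1))

# Synthetic stand-ins (assumed data, for illustration only).
rng = random.Random(0)
coupled = []
for _ in range(2000):
    a = rng.gauss(0.0, 1.0)
    # The second component is tightly tied to the first; the third is independent.
    coupled.append([a, 0.9 * a + 0.1 * rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
decorrelated = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(2000)]

ratio_coupled = mean_offdiag_ratio(sample_covariance(coupled))
ratio_decorrelated = mean_offdiag_ratio(sample_covariance(decorrelated))
print(ratio_coupled, ratio_decorrelated)
```

Reporting this ratio for both model-space and latent-space samples would give a single number against which the "nearly diagonal" claim can be judged.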
minor comments (2)
- [Eq. (7)] Notation for the latent-space loss (Eq. 7) should explicitly state the assumed form of the latent covariance (identity or learned diagonal) to avoid ambiguity with standard autoencoder reconstruction losses.
- [Figure 3] Figure 3 (forecast examples) would benefit from quantitative insets showing power spectra or gradient magnitudes to support the 'fine-scale structures' claim.
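One simple way to quantify the 'fine-scale structures' claim is a gradient-based sharpness score. The sketch below is an assumed illustration rather than a metric taken from the paper: it computes the root-mean-square gradient magnitude of a 2D field with centered differences and compares a sharp synthetic front with a box-blurred copy that stands in for an overly smooth forecast.

```python
import math

def rms_gradient(field):
    """Root-mean-square centered-difference gradient magnitude over interior grid points."""
    rows, cols = len(field), len(field[0])
    total, count = 0.0, 0
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            dy = (field[i + 1][j] - field[i - 1][j]) / 2.0
            dx = (field[i][j + 1] - field[i][j - 1]) / 2.0
            total += dx * dx + dy * dy
            count += 1
    return math.sqrt(total / count)

def box_blur(field):
    """3x3 mean filter with edge clamping, used here to mimic excessive smoothing."""
    rows, cols = len(field), len(field[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            vals = [field[min(max(i + di, 0), rows - 1)][min(max(j + dj, 0), cols - 1)]
                    for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            out[i][j] = sum(vals) / 9.0
    return out

# Synthetic 8x8 field with a sharp north-south front between columns 3 and 4.
sharp = [[1.0 if j >= 4 else 0.0 for j in range(8)] for _ in range(8)]
blurred = box_blur(sharp)

print(rms_gradient(sharp), rms_gradient(blurred))
```

The RMS form is chosen deliberately: the mean absolute gradient of a monotonic front is unchanged by smoothing, so it would not separate the two fields, whereas the squared gradient drops as the front widens.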
Simulated Author's Rebuttal
We thank the referee for their constructive and insightful comments on our manuscript. We address each major comment point by point below, with a focus on strengthening the attribution of our results to the proposed latent-space approximation.
- Referee: [Abstract and §3.2] Abstract (latent-space approximation paragraph) and §3.2: the claim that the autoencoder latent space 'allows the high-dimensional, complex error covariance matrix in model space to be approximated as nearly diagonal' is load-bearing for attributing any skill or realism gains to covariance-aware training rather than to a generic regularizer. No quantitative diagnostic (e.g., average off-diagonal magnitude of the sample covariance in latent space, or comparison of full vs. diagonal 4DVar objectives) is presented to verify that off-diagonal terms are negligible; without this, the method reduces to an unverified heuristic.
Authors: We agree that a quantitative verification of the near-diagonality assumption would strengthen the central claim. The autoencoder is trained to capture nonlinear cross-variable couplings in its latent representation, which we expect to reduce the magnitude of off-diagonal error covariances relative to model space. In the revised manuscript we will add a diagnostic: we will compute and report the average absolute off-diagonal element (normalized by the diagonal) of the sample covariance matrix estimated from a large set of latent encodings drawn from reanalysis data. We will also include a brief comparison of forecast skill when using the diagonal latent loss versus a version that retains a small number of leading off-diagonal terms (via a low-rank update) to quantify the approximation error. revision: yes
- Referee: [§4] §4 (experimental results): the reported improvements in long-term skill and physical realism are presented without ablation of the diagonal assumption itself (e.g., comparison against a non-diagonal latent loss or against a model-space loss with explicit covariance). This makes it impossible to isolate whether gains stem from the 4DVar reformulation or from other implementation choices.
Authors: We concur that additional ablations would help isolate the contribution of the latent-space diagonal approximation. We will expand §4 to include (i) a direct comparison against the standard model-space weighted loss (equivalent to a diagonal covariance in model space) and (ii) a discussion of how the latent-space formulation implicitly accounts for cross-variable structure that a simple model-space diagonal loss cannot. A full non-diagonal latent-space loss or an explicit covariance in the original model space remains computationally intractable; the former would require storing and inverting a dense latent covariance at each training step, while the latter is infeasible given the millions of variables in model space. We will therefore clarify these practical constraints in the text and add the feasible ablations described above. revision: partial
- A direct experimental comparison against a 4DVar objective that uses an explicit full covariance matrix in the original high-dimensional model space, which is computationally prohibitive.
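A back-of-the-envelope calculation shows why the explicit model-space covariance is described as prohibitive. The figures below are assumptions for illustration: the state size is set to one million variables as a lower bound for "millions of variables", and entries are stored as 4-byte floats.

```python
def dense_covariance_bytes(n_variables, bytes_per_entry=4):
    """Storage for a dense n x n covariance matrix (float32 by default)."""
    return n_variables * n_variables * bytes_per_entry

def diagonal_covariance_bytes(n_variables, bytes_per_entry=4):
    """Storage for only the n diagonal variances."""
    return n_variables * bytes_per_entry

N_MODEL = 1_000_000  # assumed lower bound on the model-space state size
TB = 10 ** 12
MB = 10 ** 6

dense_tb = dense_covariance_bytes(N_MODEL) / TB
diagonal_mb = diagonal_covariance_bytes(N_MODEL) / MB
print(f"dense: {dense_tb} TB, diagonal: {diagonal_mb} MB")
```

Storage alone reaches terabytes before any inversion is attempted, and the cost grows quadratically with the state size, so larger grids or more variables make the gap wider.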
Circularity Check
No significant circularity; derivation rests on explicit approximation assumption
The paper's core step reformulates rollout training as a 4DVar objective treating reanalysis as imperfect observations, then approximates the full error covariance by computing loss in an autoencoder latent space under the stated assumption that this space encodes nonlinear couplings sufficiently to render the covariance nearly diagonal. This assumption is presented explicitly as a modeling choice that simplifies implementation rather than being derived by construction from the forecast model equations or reduced to a fitted parameter. No self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations appear in the provided derivation chain; the claimed improvements in long-term skill and physical realism are positioned as empirical outcomes of the latent-space loss, not tautological consequences of the inputs. The overlap between autoencoder training data and forecast distribution is noted but does not collapse the objective into its own inputs per the paper's equations.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The autoencoder latent space encodes complex nonlinear couplings among atmospheric variables, allowing the error covariance to be treated as nearly diagonal.