arxiv: 2604.20467 · v1 · submitted 2026-04-22 · ⚛️ physics.ao-ph · cs.LG · physics.comp-ph

Recognition: unknown

Mechanistic Interpretability Tool for AI Weather Models

George C. Craig, Kirsten I. Tempest, Matthias Beylich


Pith reviewed 2026-05-09 22:59 UTC · model grok-4.3

classification ⚛️ physics.ao-ph · cs.LG · physics.comp-ph

keywords mechanistic interpretability · AI weather models · GraphCast · latent space · PCA · cosine similarity · meteorological features


The pith

An open-source tool uses PCA and cosine similarity to find interpretable meteorological directions inside the latent space of AI weather models like GraphCast.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a tool that takes the internal latent representations produced by AI weather models and organizes them for analysis. It applies principal component analysis and cosine similarity to locate linear combinations of latent channels that align with recognizable weather phenomena. Case studies on the GraphCast model illustrate this for mid-latitude synoptic waves and specific humidity fields. A sympathetic reader would care because AI forecasts now rival traditional numerical models yet remain opaque, so any method that exposes their internal logic could increase trust and guide improvements. The work positions the tool as adaptable and open-source so others can test it on additional variables and models.

Core claim

The paper claims that its tool can identify linear combinations of latent channels in AI weather models that appear to correspond to interpretable meteorological features, shown through preliminary case studies on GraphCast where directions extracted via PCA and cosine similarity align with mid-latitude synoptic-scale waves and specific humidity.

What carries the argument

A tool that organizes the internal latent representations from the model processor and applies cosine similarity and principal component analysis (PCA) to extract directions in latent space that may be tied to meteorological features.
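A minimal sketch of the PCA step, assuming the latent activations for one processor step have already been extracted into a NumPy array of shape (n_nodes, n_channels) and that the analysis region is given as a boolean node mask. Both the data layout and the function name `fit_region_pca` are assumptions for illustration, not the tool's actual interface. PCA is fitted on the nodes inside the region and every node is then projected onto the resulting directions, so each direction is a unit-norm linear combination of latent channels and each PC score forms a global map. scikit-learn's `PCA` would return the same directions up to sign.

```python
import numpy as np


def fit_region_pca(latents, region_mask, n_components=3):
    """Fit PCA to the latent vectors inside a region and project all nodes.

    latents:     (n_nodes, n_channels) activations at one processor step
    region_mask: (n_nodes,) boolean array selecting the analysis region
    Returns the regional mean, the principal directions (one unit-norm row
    per component, i.e. a linear combination of latent channels) and the
    PC scores of every node, shape (n_nodes, n_components).
    """
    sample = latents[region_mask]
    mean = sample.mean(axis=0)
    # The right singular vectors of the centred sample are the PCA directions,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(sample - mean, full_matrices=False)
    components = vt[:n_components]
    # Project all nodes, not just the region, to obtain global PC maps.
    scores = (latents - mean) @ components.T
    return mean, components, scores


# Usage on synthetic data in which one planted direction dominates the variance.
rng = np.random.default_rng(0)
planted = rng.normal(size=64)
planted /= np.linalg.norm(planted)
latents = 10 * rng.normal(size=(800, 1)) * planted + 0.1 * rng.normal(size=(800, 64))
mask = np.zeros(800, dtype=bool)
mask[:400] = True
mean, components, scores = fit_region_pca(latents, mask)
print(abs(components[0] @ planted))  # close to 1: PC1 recovers the planted direction
```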

If this is right

The tool can be applied to other AI weather models to examine how they represent different atmospheric variables.
It supplies a practical starting point for mechanistic analyses that could help explain why specific forecasts are generated.
Users can extend the same PCA and similarity methods to new case studies beyond waves and humidity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the identified directions prove stable, they could support targeted interventions in the model's latent space to adjust behavior for particular weather regimes.
This style of analysis might eventually connect data-driven AI predictions more directly to physical principles used in traditional weather models.

Load-bearing premise

That the linear combinations of latent channels identified by PCA and cosine similarity actually represent real meteorological features rather than coincidental patterns.

What would settle it

A controlled test that quantifies how accurately the extracted latent directions predict the strength or location of the corresponding weather features across a large set of independent forecast cases, or a counterexample where the alignment fails systematically.
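One way such a test could be run, sketched under assumptions: for each forecast case, reduce the latent direction to a single number (for example the mean PC score inside the analysis region) and pair it with an independently diagnosed feature index (for example a trough amplitude taken from reanalysis). Both quantities and the function name `held_out_skill` are hypothetical. A linear fit on one subset of cases is scored on the remaining cases, so the reported correlation is not inflated by fitting and evaluating on the same data.

```python
import numpy as np


def held_out_skill(projection, feature_index, train_fraction=0.5):
    """Fit feature_index ~ a * projection + b on the first cases and score
    the fit on the remaining, held-out cases.

    Cases should be ordered so that the train and test sets are independent,
    e.g. split by date rather than interleaving nearby forecast times.
    Returns (Pearson r, RMSE) on the held-out cases.
    """
    n_train = int(len(projection) * train_fraction)
    a, b = np.polyfit(projection[:n_train], feature_index[:n_train], deg=1)
    predicted = a * projection[n_train:] + b
    observed = feature_index[n_train:]
    r = np.corrcoef(predicted, observed)[0, 1]
    rmse = np.sqrt(np.mean((predicted - observed) ** 2))
    return r, rmse
```

A direction that carries real information about the feature should keep a high held-out correlation, while an unrelated direction should give a correlation near zero.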

Figures

Figures reproduced from arXiv: 2604.20467 by George C. Craig, Kirsten I. Tempest, Matthias Beylich.

**Figure 1.** Global fields for three forecast times, t: (a–c) 2016-03-09 18:00 UTC, (d–f) 2016-03-10 18:00 UTC, and (g–i) 2017-06-01 12:00 UTC. The left column (a,d,g) shows the 500 hPa geopotential (m² s⁻²) at the forecast initialisation time, t_init. The middle column (b,e,h) shows the corresponding residual, f(t) − f(t_init), and the right column (c,f,i) the first principal component. The circled region highlights the…
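The residual in the middle column is a plain difference of two fields. Comparing it, or the initial field, with the first principal component needs one extra step, because a principal direction is only defined up to sign. A minimal sketch, assuming both fields have already been placed on the same grid (the latents live on the model's internal mesh, so some regridding is assumed); the function names are hypothetical.

```python
import numpy as np


def residual(field_t, field_init):
    """Residual as in the Figure 1 caption: f(t) - f(t_init)."""
    return field_t - field_init


def align_sign(pc_map, reference):
    """Flip a PC map so that it is positively correlated with `reference`.

    PCA fixes a direction only up to a factor of -1, so the sign of a PC
    map carries no meaning until it is tied to a physical field.
    """
    pc_anom = pc_map - pc_map.mean()
    ref_anom = reference - reference.mean()
    return pc_map if np.sum(pc_anom * ref_anom) >= 0 else -pc_map
```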

**Figure 2.** Spatial structure of selected latent channels for three forecast times. Rows (a–c), (g–i) and (j–l) show the activation of channels 464, 360 and 30, respectively, at the last (16th) processor step for forecast times 2016-03-09 18:00 UTC, 2016-03-10 18:00 UTC and 2017-06-01 12:00 UTC. The second row from the top (d–f) corresponds to the same forecast time as (a–c) but shows an earlier processor step (P = 4). Circl…
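The caption indexes activations by processor step and channel. A sketch of that indexing, assuming the activations of all processor steps are stacked into an array of shape (n_steps, n_nodes, n_channels) with 1-based step numbers as in the caption. How channels 464, 360 and 30 were chosen is not stated here; ranking channels by the magnitude of their loading in a principal direction is shown only as one plausible, hypothetical criterion. Both function names are hypothetical.

```python
import numpy as np


def channel_map(latent_stack, channel, step):
    """Activation of one channel at every mesh node for processor step `step`.

    latent_stack: (n_steps, n_nodes, n_channels); `step` is 1-based, so the
    last of 16 steps is step=16 and the earlier panel in Figure 2 is step=4.
    """
    n_steps = latent_stack.shape[0]
    if not 1 <= step <= n_steps:
        raise ValueError(f"step must be in 1..{n_steps}, got {step}")
    return latent_stack[step - 1, :, channel]


def top_loading_channels(component, k=3):
    """Indices of the k channels with the largest |loading| in a direction."""
    return np.argsort(np.abs(component))[::-1][:k]
```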

**Figure 3.** Cosine similarity of the latent feature vectors for the forecast at 2016-03-09 18:00 UTC, evaluated using two regions indicated by black circles; in (a–b) the analysis region is as in the top two rows of …
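A cosine-similarity map of this kind can be sketched as follows, assuming each node's latent feature vector is a row of an (n_nodes, n_channels) array and that a circled region is summarised by the mean latent vector of its nodes (an assumption about how the reference is formed). `sklearn.metrics.pairwise.cosine_similarity(latents, reference[None, :])` returns the same values as an (n_nodes, 1) column; the NumPy version below makes the formula explicit. The function name is hypothetical.

```python
import numpy as np


def cosine_similarity_map(latents, region_mask, eps=1e-12):
    """Cosine similarity between every node's latent vector and the mean
    latent vector of the nodes selected by `region_mask`.

    Values near +1 mark nodes whose latent vector points the same way as the
    regional mean, values near -1 the opposite way, and 0 an orthogonal one.
    """
    reference = latents[region_mask].mean(axis=0)
    dots = latents @ reference
    norms = np.linalg.norm(latents, axis=1) * np.linalg.norm(reference)
    return dots / (norms + eps)  # eps avoids division by zero for null vectors
```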

**Figure 4.** Global fields for two forecast times, t: (a–b) 2017-06-02 06:00 UTC and (c–d) 2016-12-02 18:00 UTC. The left column (a,c) shows the 1000 hPa specific humidity (kg kg⁻¹) at the forecast initialisation time, t_init, and the right column (b,d) the first principal component. The circled region highlights the area used for analysis. For both rows, the circle is centred at 15°N, 15°E and has a radius of 20°. A c…
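A sketch of how a circular analysis region like this one could be selected, assuming "radius of 20°" means great-circle (angular) distance on the sphere and that node latitudes and longitudes are available in degrees. The function name is hypothetical. Because the distance uses the cosine of the longitude difference, longitudes given as 0–360 or −180–180 both work.

```python
import numpy as np


def circular_region_mask(lat, lon, centre_lat, centre_lon, radius_deg):
    """True where the great-circle distance from the centre is <= radius_deg."""
    lat1, lon1 = np.radians(centre_lat), np.radians(centre_lon)
    lat2, lon2 = np.radians(lat), np.radians(lon)
    # Spherical law of cosines; clip guards against rounding just outside [-1, 1].
    cos_d = np.sin(lat1) * np.sin(lat2) + np.cos(lat1) * np.cos(lat2) * np.cos(lon2 - lon1)
    distance_deg = np.degrees(np.arccos(np.clip(cos_d, -1.0, 1.0)))
    return distance_deg <= radius_deg


# The Figure 4 region: centred at 15°N, 15°E with a 20° radius.
lat = np.array([15.0, 15.0, 15.0, 40.0, 15.0])
lon = np.array([15.0, 34.0, 36.0, 15.0, 375.0])
print(circular_region_mask(lat, lon, 15.0, 15.0, 20.0))
```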

**Figure 5.** Spatial structure of selected latent channels for two forecast times. Rows (a–c) and (d–f) show the activation of channels 426, 33 and 172, respectively, at the last (16th) processor step for forecast times 2017-06-02 06:00 UTC and 2016-12-02 18:00 UTC. Circles correspond to the same location as in …

Original abstract

Artificial Intelligence (AI) weather models are improving rapidly, and their forecasts are already competitive with long-established traditional Numerical Weather Prediction (NWP). To build confidence in this new methodology, it is critical that we understand how these predictions are generated. This is a huge challenge as these AI weather models remain largely black boxes. In other areas of Machine Learning (ML), mechanistic interpretability has emerged as a framework for understanding ML predictions by analysing the building blocks responsible for them. Here we present an open-source, highly adaptable tool which incorporates concepts from mechanistic interpretability. The tool organises internal latent representations from the model processor and allows for initial analyses, including cosine similarity and Principal Component Analysis (PCA), enabling the user to identify directions in latent space potentially associated with meteorological features. Applying our tool to the graph neural network GraphCast, we present preliminary case studies for mid-latitude synoptic-scale waves and specific humidity. These demonstrate the tool's ability to identify linear combinations of latent channels that appear to correspond to interpretable features.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper releases an open-source tool applying PCA and cosine similarity to GraphCast latents with preliminary weather feature examples, but the evidence stays visual and correlational.


The main takeaway from this paper is an open-source tool for mechanistic interpretability in AI weather models. It applies techniques like PCA and cosine similarity to the latent representations of models such as GraphCast and includes preliminary case studies on mid-latitude synoptic waves and specific humidity. What is new is the adaptation of these methods to the weather domain in a user-friendly package. The authors show examples where linear combinations of channels appear to align with interpretable features, which extends existing ML interpretability work into a new application area with real-world stakes.

The paper does well by focusing on practicality and openness. Releasing the tool allows others to experiment with it on their own models, and GraphCast is a relevant test case because AI models of this kind already match traditional forecasts.

Where it falls short is the strength of the evidence for the tool's effectiveness. The case studies rely on visual comparisons without quantitative validation, such as measuring how much intervening on the identified directions changes forecast skill, or checking against held-out events. This makes it hard to be confident that the directions are mechanistically meaningful rather than artifacts of the data structure. The stress-test note is on point about the need for causal interventions.

Overall, this paper is for researchers in AI meteorology and interpretability who are looking for accessible starting tools. A reader who wants to better understand black-box weather models would get value from the code and the basic examples. It deserves a serious referee because the contribution is a concrete tool in a growing field, and feedback could help strengthen the validation. I recommend sending it to peer review rather than desk-rejecting it.

Referee Report

3 major / 3 minor

Summary. The manuscript introduces an open-source tool for mechanistic interpretability of AI weather models. It organizes latent representations from models such as the graph neural network GraphCast and applies standard techniques including cosine similarity and PCA to identify linear combinations of latent channels that may correspond to meteorological features. Preliminary case studies are presented for mid-latitude synoptic-scale waves and specific humidity, claiming these demonstrate the tool's ability to surface interpretable directions in latent space.

Significance. An adaptable open-source tool for probing internal representations in AI weather models could help build scientific confidence in forecasts that are already competitive with traditional NWP. The work correctly identifies the need for interpretability methods beyond black-box evaluation. However, because the central demonstrations remain qualitative and lack quantitative or causal validation, the immediate significance to atmospheric physics is modest and would increase substantially with stronger evidence that identified directions are mechanistically relevant rather than spurious.

major comments (3)

[Case studies] The claim that PCA directions 'appear to correspond to interpretable features' for synoptic waves and specific humidity rests entirely on visual similarity. No quantitative metrics (e.g., spatial correlation coefficients with reanalysis fields, explained variance relative to random baselines, or forecast-error correlations) are reported to establish that these directions are not simply high-variance directions unrelated to the model's computation.
[Tool description and results] No activation patching, ablation, or intervention experiments are performed to test whether modifying the identified latent directions produces physically consistent changes in the model's output fields. Without such causal tests, the mapping from latent direction to meteorological feature remains correlational and could arise from shared variance structure in the training data.
[Abstract and conclusions] The assertion that the tool enables identification of 'directions in latent space potentially associated with meteorological features' is not supported by controls for false positives (e.g., statistical significance of cosine similarities or comparison against shuffled or random directions). This weakens the central claim that the tool provides mechanistic insight rather than post-hoc pattern matching.

minor comments (3)

[Figures] Figure captions should specify the exact PCA components plotted, the percentage of variance explained by each, and the units/color scales of the meteorological fields shown for reproducibility.
[Introduction] The manuscript would benefit from a brief comparison to existing interpretability methods already applied to weather models (e.g., saliency maps or attention visualization) to clarify the incremental contribution of the new tool.
[Tool description] Notation for latent channels and PCA directions should be defined consistently in the text and equations to avoid ambiguity when users adapt the code.
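For the first minor comment, the percentage of variance explained by each component follows directly from the singular values of the centred data. A sketch, assuming the same (n_samples, n_channels) layout as the PCA input; the function name is hypothetical, and the result should match scikit-learn's `PCA().explained_variance_ratio_` multiplied by 100.

```python
import numpy as np


def explained_variance_percent(data, n_components=None):
    """Percentage of total variance captured by each principal component."""
    centred = data - data.mean(axis=0)
    singular_values = np.linalg.svd(centred, compute_uv=False)
    # Each component's variance is proportional to its squared singular value.
    variance = singular_values ** 2
    percent = 100.0 * variance / variance.sum()
    return percent if n_components is None else percent[:n_components]
```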

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and insightful review. We appreciate the acknowledgment of the tool's potential value for building scientific confidence in AI weather models. We address each major comment below and outline revisions to strengthen the manuscript while remaining honest about the preliminary nature of the current demonstrations.


Referee: [Case studies] The claim that PCA directions 'appear to correspond to interpretable features' for synoptic waves and specific humidity rests entirely on visual similarity. No quantitative metrics (e.g., spatial correlation coefficients with reanalysis fields, explained variance relative to random baselines, or forecast-error correlations) are reported to establish that these directions are not simply high-variance directions unrelated to the model's computation.

Authors: We agree that the case studies currently rely on visual inspection, which limits the strength of the claims. In the revised manuscript, we will add quantitative metrics including spatial correlation coefficients with ERA5 reanalysis fields for the identified directions, as well as comparisons of explained variance against random and shuffled direction baselines. These additions will help demonstrate that the directions capture more than generic high-variance structure. (Revision: yes.)
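Spatial correlation coefficients of this kind are usually computed with area weights on a regular latitude-longitude grid, since grid cells shrink towards the poles. A minimal sketch, assuming both fields (for example a PC map regridded from the mesh and an ERA5 field) are on the same (n_lat, n_lon) grid and weighting by cos(latitude); the function name is hypothetical.

```python
import numpy as np


def weighted_pattern_correlation(a, b, lat):
    """Area-weighted Pearson correlation of two (n_lat, n_lon) fields.

    lat: (n_lat,) latitudes in degrees; weights are proportional to cos(lat).
    """
    w = np.cos(np.radians(lat))[:, None] * np.ones_like(a, dtype=float)
    w = w / w.sum()
    a_anom = a - np.sum(w * a)
    b_anom = b - np.sum(w * b)
    cov = np.sum(w * a_anom * b_anom)
    return cov / np.sqrt(np.sum(w * a_anom ** 2) * np.sum(w * b_anom ** 2))
```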
Referee: [Tool description and results] No activation patching, ablation, or intervention experiments are performed to test whether modifying the identified latent directions produces physically consistent changes in the model's output fields. Without such causal tests, the mapping from latent direction to meteorological feature remains correlational and could arise from shared variance structure in the training data.

Authors: The referee is correct that the present work provides only correlational evidence. Performing activation patching or ablation on GraphCast would require substantial new methodological development and compute, which is beyond the scope of this initial tool paper. We will revise the manuscript to explicitly discuss this as a limitation and to position causal interventions as an important direction for future extensions of the tool, while clarifying that the current analyses are intended as an exploratory first step for identifying candidate directions. (Revision: partial.)
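The interventions discussed here reduce to simple vector operations once a direction has been found; the hard part the authors point to is running the modified latents through the rest of GraphCast, which requires hooking into the model and is not shown. A sketch of the two most common edits on an (n_nodes, n_channels) latent array: directional ablation projects the direction out of every latent vector, and steering adds a scaled copy of it, optionally only inside a region. Both function names are hypothetical.

```python
import numpy as np


def ablate_direction(latents, direction):
    """Remove the component of every latent vector along `direction`."""
    d = direction / np.linalg.norm(direction)
    return latents - np.outer(latents @ d, d)


def steer_direction(latents, direction, alpha, mask=None):
    """Add alpha times the unit direction, at all nodes or only where mask is True."""
    d = direction / np.linalg.norm(direction)
    steered = latents.astype(float, copy=True)
    if mask is None:
        steered += alpha * d
    else:
        steered[mask] += alpha * d
    return steered
```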
Referee: [Abstract and conclusions] The assertion that the tool enables identification of 'directions in latent space potentially associated with meteorological features' is not supported by controls for false positives (e.g., statistical significance of cosine similarities or comparison against shuffled or random directions). This weakens the central claim that the tool provides mechanistic insight rather than post-hoc pattern matching.

Authors: We acknowledge the absence of explicit false-positive controls. In the revised version, we will add comparisons of cosine similarities against those obtained from shuffled latent channels and random directions, along with basic statistical significance assessments. These controls will be incorporated into the abstract, results, and conclusions to better support the claims of potential mechanistic relevance. (Revision: yes.)
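A sketch of the random-direction control, assuming the observed statistic is the absolute cosine similarity between a candidate latent direction and some target vector in the same channel space (for example a channel-space pattern associated with a meteorological field; hypothetical). Random unit directions are drawn by normalising Gaussian vectors, which samples the sphere uniformly, and the empirical p-value counts how often a random direction does at least as well. A shuffled-channel control works the same way if each null sample is a random permutation of `direction` instead of a Gaussian vector. The function name is hypothetical.

```python
import numpy as np


def random_direction_pvalue(direction, target, n_samples=10_000, seed=0):
    """Empirical p-value of |cos(direction, target)| against random directions.

    Returns (observed |cosine|, p-value). The +1 terms give the standard
    Monte Carlo estimate, which never reports p = 0.
    """
    rng = np.random.default_rng(seed)
    target_norm = np.linalg.norm(target)
    observed = abs(direction @ target) / (np.linalg.norm(direction) * target_norm)
    # Normalised Gaussian vectors are uniformly distributed on the unit sphere.
    random_dirs = rng.normal(size=(n_samples, direction.size))
    null = np.abs(random_dirs @ target) / (np.linalg.norm(random_dirs, axis=1) * target_norm)
    p_value = (1 + np.sum(null >= observed)) / (1 + n_samples)
    return observed, p_value
```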

Circularity Check

0 steps flagged

No circularity; standard tool using PCA and cosine similarity on latent space with no derivations or self-referential fits.


The paper presents a methodological tool for mechanistic interpretability of AI weather models, applying off-the-shelf techniques (cosine similarity, PCA) to GraphCast's latent channels. Preliminary case studies for synoptic waves and specific humidity are described as qualitative demonstrations that linear combinations 'appear to correspond' to features. No equations, parameter fits, predictions derived from inputs, or load-bearing self-citations are present. The central contribution is tool development and empirical illustration rather than a closed derivation chain, making the analysis self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper focuses on developing a software tool for analysis rather than introducing new physical parameters, axioms, or entities. The analyses rely on standard ML techniques like PCA and cosine similarity applied to existing model internals.

pith-pipeline@v0.9.0 · 5479 in / 1166 out tokens · 23121 ms · 2026-05-09T22:59:28.541329+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Toward a Scientific Discovery Engine for Weather and Climate Data: A Visual Analytics Workbench for Embedding-Based Exploration
physics.data-an 2026-05 conditional novelty 5.0

A visual analytics workbench enables scientists to explore, query, and verify embedding-based similarity searches on weather and climate data by tracing results back to physical evidence.
