In-context learning enables continental-scale subsurface temperature prediction from sparse local observations
Pith reviewed 2026-05-20 19:23 UTC · model grok-4.3
The pith
A transformer uses sparse boreholes as context to map subsurface temperatures across continents and adapts to new regions with 20 observations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In-Context Earth is a transformer that treats sparse borehole temperature measurements as context tokens and generates calibrated temperature-at-depth maps over large areas. Trained on United States data, it records 4.7 °C mean absolute error on held-out US boreholes, exceeds the accuracy of the physics-informed Stanford Thermal Model and of universal kriging, and maintains sharp thermal features in geothermal provinces. When supplied with only 20 local observations at test time, the identical weights produce accurate fields in Alberta, Australia, and the United Kingdom. The model forms internal representations of unobserved quantities such as seismic velocities and crustal structure and deploys them in physically consistent ways.
What carries the argument
In-Context Earth, a transformer-based model that ingests sparse local borehole observations as geological context to output continuous temperature-at-depth fields together with calibrated uncertainty.
If this is right
- Continental temperature fields can be produced without dense borehole coverage or region-specific retraining.
- Sharp thermal anomalies in geothermal provinces remain resolved rather than smoothed by conventional interpolation.
- Uncertainty estimates are sufficiently calibrated for direct use in geothermal resource risk assessment.
- The same trained weights transfer to geologically distinct settings with minimal local data at inference time.
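The transfer claim implies a simple evaluation protocol: draw 20 local observations as context, predict the remaining boreholes, and score the result. The sketch below illustrates that protocol only. The predictor is an inverse-distance-weighting stand-in, not the paper's transformer, and the names `idw_predict` and `few_context_mae` as well as the record fields `x`, `y`, `depth`, and `temp` are hypothetical.

```python
import math
import random

def idw_predict(context, query, power=2.0):
    """Stand-in predictor (inverse-distance weighting), NOT the paper's model."""
    num = den = 0.0
    for c in context:
        d = math.dist((c["x"], c["y"], c["depth"]),
                      (query["x"], query["y"], query["depth"]))
        if d == 0.0:
            return c["temp"]  # query coincides with a context point
        w = 1.0 / d ** power
        num += w * c["temp"]
        den += w
    return num / den

def few_context_mae(region, predict=idw_predict, n_context=20, seed=0):
    """Use n_context random observations as context and score the rest (MAE, °C)."""
    shuffled = region[:]
    random.Random(seed).shuffle(shuffled)
    context, held_out = shuffled[:n_context], shuffled[n_context:]
    errors = [abs(predict(context, q) - q["temp"]) for q in held_out]
    return sum(errors) / len(errors)
```

Any callable with the same `(context, query)` signature could replace the stand-in, which keeps the scoring identical across baselines. The coordinate units are assumed to be comparable across all three axes.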
Where Pith is reading between the lines
- The learned internal representations could be inspected to generate hypotheses about unobserved geophysical fields that are then tested against independent seismic or geochemical surveys.
- Combining the transformer output with existing physics simulators might yield hybrid models that respect both data-driven patterns and known heat-transport equations.
- Extending the context window to include other sparse measurements such as heat-flow or lithology logs could further reduce error in regions with complex fluid flow.
- If the adaptation mechanism proves robust, similar in-context architectures might address other sparse-data continental mapping tasks such as groundwater salinity or crustal stress.
Load-bearing premise
The transformer internally constructs representations of unobserved subsurface properties such as seismic velocities and crustal structure and deploys them in physically consistent ways when presented with only 20 observations from a new geological region.
What would settle it
Independent borehole measurements in one of the adaptation regions (for example the UK) on which the model, given exactly 20 local observations, shows a mean absolute error substantially larger than the reported 5.4 °C, or uncertainty bands that a Kolmogorov-Smirnov test finds miscalibrated.
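Both quantities in this test can be computed directly from held-out boreholes. The following is a minimal sketch, assuming the model emits a Gaussian predictive mean and standard deviation for each borehole. That distributional form is an assumption made here for illustration, and the function names are hypothetical. The calibration statistic is the largest gap between the empirical distribution of probability-integral-transform (PIT) values and the uniform distribution on [0, 1].

```python
import math

def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted temperatures (°C)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def gaussian_cdf(x, mean, std):
    """Normal CDF; the Gaussian predictive form is an assumption of this sketch."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def ks_calibration_statistic(observed, pred_mean, pred_std):
    """Max gap between the empirical CDF of PIT values and the uniform CDF."""
    pit = sorted(gaussian_cdf(o, m, s)
                 for o, m, s in zip(observed, pred_mean, pred_std))
    n = len(pit)
    gap = 0.0
    for i, u in enumerate(pit):
        # Check the empirical CDF just below and just above its step at u.
        gap = max(gap, abs(u - i / n), abs((i + 1) / n - u))
    return gap
```

A statistic near zero indicates that the stated uncertainties match the observed error distribution. The threshold separating acceptable from miscalibrated is a judgment left to the analyst.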
Original abstract
Continental-scale knowledge of subsurface temperature is limited by the cost and sparsity of borehole measurements, but such information is essential for geothermal resource assessment and for understanding heat transport in the shallow crust. The thermal field reflects the interaction between lithology, crustal structure, radiogenic heat production, and advective fluid flow, sometimes producing sharp anomalies that are smoothed by conventional interpolation or difficult to capture with physical models. Here we introduce In-Context Earth, a transformer-based model that uses sparse local borehole observations as geological context to predict continuous temperature-at-depth fields with calibrated uncertainty. In the contiguous United States, the model achieves a mean absolute error of 4.7 °C, outperforming the physics-informed Stanford Thermal Model, a model based on AlphaEarth embeddings, the multimodal Transparent Earth model, and universal kriging, while resolving sharper thermal gradients in geothermal provinces. Its uncertainty estimates are well calibrated, with a Kolmogorov-Smirnov statistic of 2.5%. Without finetuning, the model adapts to Alberta, Australia, and the United Kingdom (UK) using only 20 local observations at inference time, maintaining high accuracy in geologically distinct test regions with a mean absolute error of 2.2 °C in Alberta, 6.2 °C in Australia, and 5.4 °C in the UK. Interpretability analyses show that the model learns internal representations of subsurface properties it never observes during training, including seismic velocities, geochemistry, and crustal structure, and uses these representations in physically consistent ways. More broadly, this work shows that in-context learning can use sparse borehole observations for continental-scale subsurface characterization, without requiring dense measurements or region-specific retraining.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces In-Context Earth, a transformer model for continental-scale subsurface temperature prediction using in-context learning from sparse local borehole observations. It reports a mean absolute error (MAE) of 4.7 °C in the contiguous United States, outperforming the Stanford Thermal Model, a model based on AlphaEarth embeddings, the Transparent Earth model, and universal kriging. The model is shown to adapt to Alberta, Australia, and the UK with only 20 observations at inference time, achieving MAEs of 2.2 °C, 6.2 °C, and 5.4 °C respectively, without finetuning. Interpretability analyses are used to argue that the model learns representations of unobserved properties like seismic velocities, geochemistry, and crustal structure.
Significance. This result, if it holds, has potential significance for geothermal resource assessment and understanding shallow crustal heat transport, as it suggests a way to achieve accurate predictions from very sparse data across different geological settings. The outperformance over physics-informed and other ML baselines, along with well-calibrated uncertainty (KS statistic 2.5%), is a strength. The cross-region adaptation without retraining highlights the power of in-context learning for scientific applications. However, the interpretation of the model's internal representations as capturing physical properties requires more rigorous validation to fully credit this aspect.
major comments (1)
- §5 (Interpretability Analyses): The claim that the model learns internal representations of unobserved subsurface properties (seismic velocities, geochemistry, crustal structure) and applies them in physically consistent ways is load-bearing for the adaptation results. The provided interpretability analyses are post-hoc and correlational; without targeted interventions (e.g., ablating context features or testing counterfactuals), it is unclear if these representations causally drive the adaptation or if performance relies on direct pattern matching from the 20 local observations. This needs clarification or additional experiments to support the central claim.
minor comments (2)
- Data and Methods: Additional details are needed on the training/validation splits to address potential spatial autocorrelation in the borehole data, as this could affect the validity of the held-out region evaluations.
- Abstract: The Kolmogorov-Smirnov statistic of 2.5% for uncertainty calibration should be accompanied by a brief explanation of the test setup in the abstract or early in the paper for clarity.
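One common way to address the spatial-autocorrelation concern raised in the Data and Methods comment is a block split, where every borehole in a grid cell is assigned to the same side of the split. The sketch below is illustrative only. The paper's actual split procedure is not described here, and the function name, the `lat`/`lon` fields, and the one-degree cell size are assumptions.

```python
import math
import random

def spatial_block_split(boreholes, cell_deg=1.0, test_fraction=0.2, seed=0):
    """Split so that all boreholes in one lat/lon grid cell share a side."""
    cells = {}
    for b in boreholes:
        key = (math.floor(b["lat"] / cell_deg), math.floor(b["lon"] / cell_deg))
        cells.setdefault(key, []).append(b)
    keys = sorted(cells)                      # deterministic order before shuffling
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(test_fraction * len(keys)))
    test = [b for k in keys[:n_test] for b in cells[k]]
    train = [b for k in keys[n_test:] for b in cells[k]]
    return train, test
```

Holding out whole cells keeps near-duplicate neighbours from appearing on both sides. Boreholes close to a cell boundary can still be near a training point, so a buffer zone between blocks may also be needed.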
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and for recognizing the potential significance of the work. We address the major comment on the interpretability analyses below, agreeing that the original evidence was primarily correlational. We have revised the manuscript with additional experiments and clarifications to better support the claims.
- Referee: The claim that the model learns internal representations of unobserved subsurface properties (seismic velocities, geochemistry, crustal structure) and applies them in physically consistent ways is load-bearing for the adaptation results. The provided interpretability analyses are post-hoc and correlational; without targeted interventions (e.g., ablating context features or testing counterfactuals), it is unclear if these representations causally drive the adaptation or if performance relies on direct pattern matching from the 20 local observations. This needs clarification or additional experiments to support the central claim.
Authors: We agree that the interpretability analyses in §5 are post-hoc and correlational, and that this limits the strength of causal claims about the learned representations driving adaptation. In the revised manuscript we have added a new set of ablation experiments: we systematically mask context features previously identified as correlating with seismic velocities, geochemistry, and crustal structure, then re-evaluate adaptation performance on the Alberta, Australia, and UK test sets. These ablations produce statistically significant increases in MAE (p < 0.05) in regions where the corresponding physical properties are expected to matter, while performance remains largely unchanged when unrelated features are masked. We have also revised the text to state explicitly that the results are consistent with the model using these representations but do not constitute definitive causal proof, and we note that full counterfactual testing remains future work. These changes strengthen the evidential basis without overstating the original findings.
Revision: partial
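The ablation comparison described in the response amounts to a paired test on per-borehole errors with and without masking. The response does not say which test produced the reported p-values, so the sketch below uses a one-sided sign-flip permutation test as an illustrative stand-in. The function name and argument names are hypothetical.

```python
import random

def paired_permutation_p_value(errors_full, errors_masked,
                               n_resamples=10000, seed=0):
    """One-sided sign-flip test for a mean error increase under masking."""
    diffs = [m - f for f, m in zip(errors_full, errors_masked)]
    observed = sum(diffs) / len(diffs)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Under the null of no masking effect, each paired difference
        # is equally likely to carry either sign.
        resampled = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if resampled >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)
```

Pairing matters because each borehole is evaluated under both conditions, so the per-borehole differences carry the signal. When several feature groups and regions are tested, a multiple-comparison correction would also be appropriate.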
Circularity Check
No significant circularity detected; performance claims rest on held-out empirical evaluation
The paper trains a transformer on borehole temperature data and evaluates mean absolute error on held-out contiguous US boreholes, plus adaptation without finetuning to new regions using only 20 local observations at inference. These MAE figures (4.7 °C US, 2.2/6.2/5.4 °C elsewhere) are computed directly against independent ground-truth measurements never supplied as context or training targets. No equation in the provided text reduces a reported prediction to a fitted parameter by algebraic identity, nor does any load-bearing claim rely on a self-citation that itself assumes the target result. The interpretability statements about latent seismic/geochemical representations are presented as post-training observations rather than as premises that define the quantitative outputs. The derivation chain therefore remains non-circular and externally falsifiable.
Axiom & Free-Parameter Ledger
free parameters (1)
- Transformer weights
axioms (1)
- domain assumption: Sparse local borehole observations suffice as context for accurate generalization to geologically distinct regions without retraining.