A Non-stationary, Amortized, Transfer Learning Approach for Modeling Italian Air Quality
Pith reviewed 2026-05-10 02:51 UTC · model grok-4.3
The pith
A neural network learns a non-stationary correlation structure from gridded chemical transport model (CTM) output and transfers it to sparse station data to improve air quality predictions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors present a framework that first trains an image-to-image neural network on gridded CTM outputs to learn millions of spatially varying parameters defining a non-stationary, anisotropic covariance. This covariance is embedded in the LatticeKrig basis-function model, which allows it to be transferred to point-level station data (a change of support) and projected onto a finer prediction grid. A final likelihood-based refinement adjusts the correlation range to capture fine-scale variability, and the resulting predictions are reported to outperform the stationary alternative on both data types.
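The refinement step can be pictured as a one-dimensional likelihood search. The sketch below is a toy stand-in, not the paper's LatticeKrig likelihood: it assumes a zero-mean process with unit-variance exponential covariance and a small nugget, and the helper names (`exp_cov`, `gaussian_loglik`, `refine_range`) are hypothetical. It picks, from a candidate grid, the range that maximizes the Gaussian log-likelihood of the station values.

```python
import math

def exp_cov(points, rng, nugget=1e-6):
    """Exponential covariance exp(-d / rng) plus a small nugget on the diagonal."""
    n = len(points)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(points[i], points[j])
            cov[i][j] = math.exp(-d / rng) + (nugget if i == j else 0.0)
    return cov

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def gaussian_loglik(z, cov):
    """Log-density of a zero-mean Gaussian vector z with covariance cov."""
    low = cholesky(cov)
    # Forward substitution solves low @ w = z, so z' cov^{-1} z = w' w.
    w = []
    for i in range(len(z)):
        w.append((z[i] - sum(low[i][k] * w[k] for k in range(i))) / low[i][i])
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(len(z)))
    quad = sum(v * v for v in w)
    return -0.5 * (quad + logdet + len(z) * math.log(2.0 * math.pi))

def refine_range(points, z, candidates):
    """Return the candidate range with the highest log-likelihood."""
    return max(candidates, key=lambda r: gaussian_loglik(z, exp_cov(points, r)))
```

With this toy likelihood, smoothly varying values favor a long range and rapidly alternating values favor a short one, which is the direction of adjustment the refinement step is meant to make when station data are rougher than the CTM grid.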
What carries the argument
An image-to-image neural network that amortizes the estimation of a spatially varying precision matrix for the LatticeKrig basis functions, enabling transfer of the non-stationary correlation structure across data sources with different supports.
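To make the role of the network's output concrete, the sketch below builds a LatticeKrig-style precision matrix Q = BᵀB on a small lattice, where each row of B places 4 + κ² on the node itself and −1 on its four nearest neighbors. In the stationary model κ² is a single number; here it is a per-node field, standing in for one of the parameter maps the network would emit. The anisotropy parameters of the paper's model are omitted, and the function name `sar_precision` and the toy κ² values are assumptions for illustration only.

```python
def sar_precision(kappa2):
    """Build Q = B^T B for a lattice SAR model with per-node kappa^2.

    kappa2 is a list of rows; entry [r][c] is kappa^2 at lattice node (r, c).
    Returns (B, Q) as dense lists of lists, which is fine for a toy lattice.
    """
    nrow, ncol = len(kappa2), len(kappa2[0])
    n = nrow * ncol
    B = [[0.0] * n for _ in range(n)]
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            B[i][i] = 4.0 + kappa2[r][c]          # center weight
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrow and 0 <= cc < ncol:
                    B[i][rr * ncol + cc] = -1.0   # first-order neighbors
    # Q = B^T B is symmetric; B is strictly diagonally dominant when
    # kappa^2 > 0, so B is invertible and Q is positive definite.
    Q = [[sum(B[k][i] * B[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    return B, Q

# Toy field: smaller kappa^2 (longer correlation range) toward the right.
kappa2 = [[1.0, 0.5, 0.1],
          [1.0, 0.5, 0.1],
          [1.0, 0.5, 0.1]]
B, Q = sar_precision(kappa2)
```

A production version would store B in a sparse format, since each row has at most five non-zero entries; the dense lists here only keep the example dependency-free.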
If this is right
- Enables production of daily NO2 concentration maps on a fine grid with associated uncertainty.
- Accommodates the complex, non-stationary spatial patterns induced by Italy's geography and terrain.
- Demonstrates improved accuracy over stationary geostatistical models when validated on held-out station data.
- Provides a scalable way to integrate gridded simulation outputs with point observations without requiring full re-estimation of the covariance.
Where Pith is reading between the lines
- The amortized neural estimation could be extended to incorporate temporal dynamics for forecasting future air quality.
- Similar transfer learning might apply to other environmental variables where high-resolution simulations exist alongside sparse observations.
- The method's speed in learning the correlation structure suggests potential for near-real-time updating of maps as new data becomes available.
Load-bearing premise
That the correlation structure estimated from the smoothed CTM grids can be directly transferred to the finer-scale station data via the basis projection without substantial mismatch or bias.
What would settle it
If a stationary model matched or beat the transferred non-stationary model on cross-validated station measurements, that would indicate that transferring the non-stationary structure does not deliver the claimed benefit.
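A minimal sketch of that comparison, assuming held-out station observations and the two models' predictions at the same stations are already available as equal-length lists. The function names and the win-rate summary are illustrative choices, not the paper's evaluation protocol.

```python
import math

def rmse(pred, obs):
    """Root mean squared error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def compare_holdout(obs, pred_stationary, pred_nonstationary):
    """Return both RMSEs and the share of stations where the
    non-stationary prediction has the smaller absolute error."""
    wins = sum(abs(pn - o) < abs(ps - o)
               for o, ps, pn in zip(obs, pred_stationary, pred_nonstationary))
    return {
        "rmse_stationary": rmse(pred_stationary, obs),
        "rmse_nonstationary": rmse(pred_nonstationary, obs),
        "nonstationary_win_rate": wins / len(obs),
    }
```

Reporting the per-station win rate alongside RMSE helps distinguish a broad improvement from one driven by a handful of stations.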
Original abstract
Air quality monitoring in Italy relies on sparse, irregular, ground-based stations that provide high-quality but incomplete measurements of pollution. Chemical transport models (CTMs) offer full spatial and temporal coverage but smooth over local variability. We develop a spatial transfer-learning framework that integrates these two data sources to produce daily, fine-grid predictions of nitrogen dioxide (NO$_2$) concentrations across Italy for 2023, with uncertainty quantification. The resulting maps provide a resource for decision making in downstream applications such as epidemiology and environmental policy. Our approach builds on the geostatistical LatticeKrig framework, which uses compactly supported basis functions and coefficients governed by a sparse precision matrix. We learn a nonstationary, anisotropic correlation structure from the gridded CTM outputs using an image-to-image neural architecture that estimates millions of spatially varying parameters in a matter of seconds. The basis-function representation enables this covariance structure to be transferred to the point-level station data and projected onto a finer prediction grid, a key extension for handling the change of support between data sources. A likelihood-based refinement step then adjusts the correlation range to recover fine-scale variability smoothed out by the gridded data. The proposed methodology results in a flexible, non-stationary, and anisotropic representation of the spatial process, better accommodating the complex geography of Italy. Performance is assessed through experiments on both gridded CTM outputs and point-level station measurements, demonstrating improvements over the stationary formulation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a non-stationary, amortized transfer-learning framework that extends the LatticeKrig model to integrate gridded chemical transport model (CTM) outputs with sparse point-level station measurements for daily NO2 concentration mapping across Italy in 2023. An image-to-image neural network learns millions of spatially varying parameters to capture anisotropic, non-stationary correlation structure from the CTM grids; this structure is projected onto irregular station locations and a finer prediction grid via compactly supported basis functions, followed by a likelihood-based refinement of the global correlation range to recover fine-scale variability, with uncertainty quantification. The central claim is that this yields a flexible representation better suited to Italy's complex geography and demonstrates improvements over stationary formulations in experiments on both gridded and point data.
Significance. If the transfer and refinement steps hold under scrutiny, the work offers a scalable, computationally efficient route to non-stationary spatial prediction that amortizes covariance estimation from large grids while explicitly handling change-of-support; this could be useful for environmental statistics applications requiring fine-scale maps with uncertainty, such as epidemiology or policy support. The combination of neural amortization with LatticeKrig basis projection is a technically interesting extension.
major comments (3)
- [Abstract] The assertion that experiments 'demonstrate improvements over the stationary formulation' is unsupported by any quantitative metrics, error bars, cross-validation scores, or specific comparison results, which is load-bearing for the central performance claim.
- [§3] Transfer and projection step: the non-stationary anisotropic structure learned from CTM grids is projected onto station locations via the compactly supported basis functions, but no quantitative diagnostic (e.g., comparison of projected local ranges or principal axes against empirical variograms computed directly from station data) is described to verify that CTM smoothing has not distorted the fine-scale structure; this assumption is central to the transfer-learning validity.
- [§4] Experiments on station data: the likelihood refinement step tunes only a global range parameter after transfer; without reported sensitivity checks or hold-out validation showing that the spatially varying components remain appropriate at point support, it is unclear whether fine-scale variability is recovered without bias or overfitting across Italy's terrain.
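The variogram diagnostic requested in the §3 comment could start from the classical (Matheron) empirical semivariogram of station values or residuals. The sketch below is one hedged way to compute it; it assumes station coordinates are already in a projected system where Euclidean distance is meaningful (for example kilometers), and the function name and binning scheme are illustrative.

```python
import math
from collections import defaultdict

def empirical_variogram(coords, values, bin_width, max_dist):
    """Matheron estimator: for each distance bin, average 0.5 * (z_i - z_j)^2.

    Returns a dict mapping bin midpoint -> (semivariance, pair count).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            if d >= max_dist:
                continue  # ignore pairs beyond the maximum lag
            b = int(d // bin_width)
            sums[b] += 0.5 * (values[i] - values[j]) ** 2
            counts[b] += 1
    return {(b + 0.5) * bin_width: (sums[b] / counts[b], counts[b])
            for b in sorted(counts)}
```

Restricting the pairs to a neighborhood of each station, or to a direction sector, would give the local and directional versions needed to compare against the projected ranges and anisotropy axes; bins with few pairs should be read with caution.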
minor comments (2)
- [Abstract] The term 'amortized' is used in the title and text but is never explicitly defined: it is unclear which computation is being amortized (neural network weights versus per-prediction inference).
- [§2] Notation: the description of the sparse precision matrix and its relation to the neural-network outputs would benefit from an early equation reference to avoid ambiguity when the basis-function projection is introduced.
Simulated Author's Rebuttal
We thank the referee for their insightful and constructive comments, which have helped us identify areas where the manuscript can be strengthened. We provide point-by-point responses to the major comments below, along with our plans for revision.
- Referee: [Abstract] The assertion that experiments 'demonstrate improvements over the stationary formulation' is unsupported by any quantitative metrics, error bars, cross-validation scores, or specific comparison results, which is load-bearing for the central performance claim.
  Authors: We agree that the abstract would be improved by including specific quantitative support for the performance claim. While Section 4 of the manuscript reports cross-validation results, RMSE reductions, and log-likelihood improvements from the station data experiments, these details were not summarized numerically in the abstract. We will revise the abstract to incorporate key quantitative metrics, such as the observed RMSE improvement and cross-validation scores, to make the claim fully supported.
  Revision: yes
- Referee: [§3] Transfer and projection step: the non-stationary anisotropic structure learned from CTM grids is projected onto station locations via the compactly supported basis functions, but no quantitative diagnostic (e.g., comparison of projected local ranges or principal axes against empirical variograms computed directly from station data) is described to verify that CTM smoothing has not distorted the fine-scale structure; this assumption is central to the transfer-learning validity.
  Authors: This is a fair observation on the need for explicit validation of the transfer step. Section 3 describes the projection of the learned non-stationary structure onto station locations using the LatticeKrig basis functions, but does not include direct quantitative comparisons to station-derived empirical variograms. In the revision we will add such diagnostics, for example by comparing the projected local ranges and anisotropy axes at station sites against variograms estimated from station residuals, to confirm that the transferred structure preserves relevant fine-scale features.
  Revision: yes
- Referee: [§4] Experiments on station data: the likelihood refinement step tunes only a global range parameter after transfer; without reported sensitivity checks or hold-out validation showing that the spatially varying components remain appropriate at point support, it is unclear whether fine-scale variability is recovered without bias or overfitting across Italy's terrain.
  Authors: We appreciate the referee drawing attention to the validation of the refinement step. The current procedure in Section 4 fixes the spatially varying parameters transferred from the CTM and optimizes only the global range parameter. To address concerns about suitability at point support, we will expand the experiments in the revised manuscript to include sensitivity analyses (e.g., perturbing the fixed parameters within plausible ranges) and additional hold-out validation on withheld stations, stratified by terrain type, to demonstrate that fine-scale variability is recovered without systematic bias or overfitting.
  Revision: yes
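The terrain-stratified hold-out mentioned in the last response could be set up along the following lines. This is a sketch under stated assumptions: each station carries a terrain label (the labels, the 20% default fraction, and the function name `stratified_holdout` are all hypothetical), and a fixed share of every terrain class is withheld for validation.

```python
import random
from collections import defaultdict

def stratified_holdout(station_ids, terrain, frac=0.2, seed=0):
    """Split stations into train/test, holding out `frac` of each terrain class.

    terrain maps station id -> class label. At least one station is held
    out from every class with two or more stations; single-station
    classes stay entirely in the training set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sid in station_ids:
        by_class[terrain[sid]].append(sid)
    train, test = [], []
    for label in sorted(by_class):
        members = sorted(by_class[label])
        rng.shuffle(members)
        k = round(frac * len(members))
        if len(members) >= 2:
            k = min(max(k, 1), len(members) - 1)
        else:
            k = 0
        test.extend(members[:k])
        train.extend(members[k:])
    return train, test
```

Scoring each terrain class separately on the held-out stations then shows whether any gain from the transferred structure is uniform or concentrated in particular terrain types.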
Circularity Check
No significant circularity; derivation remains self-contained
The paper trains an image-to-image network on external CTM gridded outputs to estimate spatially varying non-stationary parameters, projects the resulting covariance via LatticeKrig compactly supported basis functions onto irregular station locations, and performs a separate likelihood refinement on the point data to adjust the global range. None of these steps reduces the final predictions or uncertainty quantification to the inputs by construction: the neural network outputs are not redefined in terms of the station likelihood, and the basis projection is a fixed linear operation independent of the target variable. Prior LatticeKrig work is invoked for the basis representation, but it is not a load-bearing self-citation for the central claim, because the non-stationary structure itself is learned from CTM data and validated on held-out station measurements. Experiments explicitly compare against the stationary baseline on both data types, confirming independent content.
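The "fixed linear operation" can be illustrated with compactly supported radial basis functions evaluated at arbitrary locations. The sketch below uses a Wendland function of the kind LatticeKrig employs; the node layout, support radius, and helper names are assumptions, since this summary does not give the paper's settings. The same lattice coefficients, and hence the same precision matrix, can be evaluated at grid cells, station coordinates, or a finer prediction grid simply by changing the list of locations.

```python
import math

def wendland(d):
    """Wendland function of order 2: equals 1 at d = 0 and 0 for d >= 1."""
    if d >= 1.0:
        return 0.0
    return (1.0 - d) ** 6 * (35.0 * d * d + 18.0 * d + 3.0) / 3.0

def basis_matrix(locations, nodes, support):
    """Evaluate every basis function at every location.

    Row i holds the basis values at locations[i]; column j is the basis
    function centered at nodes[j] with support radius `support`.
    """
    return [[wendland(math.dist(s, u) / support) for u in nodes]
            for s in locations]

def project(phi, coeffs):
    """Linear map from lattice coefficients to process values."""
    return [sum(p * c for p, c in zip(row, coeffs)) for row in phi]
```

Because each basis function vanishes beyond its support radius, most entries of the basis matrix are exactly zero, which is what keeps the projection cheap for large lattices.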
Axiom & Free-Parameter Ledger
free parameters (2)
- neural network weights
- correlation range adjustment
axioms (2)
- domain assumption: CTM-derived correlation structure is transferable to station data via basis-function projection without introducing systematic bias from smoothing or change of support.
- standard math: The LatticeKrig sparse precision matrix representation remains valid when the correlation parameters are replaced by the neural-network output.