pith. machine review for the scientific record.

arxiv: 2604.03456 · v1 · submitted 2026-04-03 · 💻 cs.LG · cs.CY

Recognition: no theorem link

Earth Embeddings Reveal Diverse Urban Signals from Space

Pith reviewed 2026-05-13 19:25 UTC · model grok-4.3

classification 💻 cs.LG cs.CY
keywords earth embeddings · satellite imagery · urban indicators · remote sensing · neighborhood monitoring · supervised learning · built environment

The pith

Satellite-derived Earth embeddings predict neighborhood urban indicators such as health burdens and commuting modes across U.S. cities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates whether compact vector representations extracted from satellite images, called Earth embeddings, can serve as low-cost proxies for traditional census and survey data on urban conditions. Using a single supervised learning setup, it tests three families of embeddings on the task of predicting 14 neighborhood indicators spanning crime, income, health, and travel across six U.S. metropolitan areas from 2020 to 2023. The embeddings capture substantial variation, performing best on outcomes tied to physical structure such as chronic health burdens and dominant commuting modes, while showing weaker results for behaviors more shaped by policy or individual choice such as cycling rates. Performance remains stable from year to year yet differs noticeably between cities, and compact 64-dimensional versions retain more useful signal than dimensionality-reduced versions of larger embeddings. This points to a practical route for frequent, scalable neighborhood monitoring aligned with sustainable development goals.

Core claim

Earth embeddings from models such as AlphaEarth, Prithvi, and Clay, when fed into a unified supervised learning framework, predict 14 neighborhood-level urban indicators with meaningful accuracy; skill is highest for outcomes directly linked to built-environment structure including chronic health burdens and dominant commuting modes, remains comparatively stable across years, and varies across cities in ways associated with urban form.

What carries the argument

A unified supervised learning framework that treats Earth embeddings as input features to predict urban indicators, with systematic comparisons across embedding families, evaluation settings (global, city-wise, year-wise, city-year), and controlled dimensionality reductions.
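The four evaluation settings can be read as four ways of grouping neighborhoods before a model is fit and scored. The sketch below illustrates that reading on synthetic data. It is not the paper's pipeline: the closed-form ridge learner, the 70/30 hold-out split, the function names, and the synthetic features are all assumptions made for illustration.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept (assumed learner)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w

def holdout_r2(X, y, rng, test_frac=0.3):
    """Fit on a random split of one group and return R^2 on the held-out part."""
    idx = rng.permutation(len(y))
    n_test = int(len(y) * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    w, b = ridge_fit(X[train], y[train])
    resid = y[test] - (X[test] @ w + b)
    return float(1.0 - resid @ resid / np.sum((y[test] - y[test].mean()) ** 2))

def evaluate_settings(X, y, city, year, seed=0):
    """Score one indicator under global, city-wise, year-wise, and city-year groupings."""
    rng = np.random.default_rng(seed)
    cities, years = np.unique(city), np.unique(year)
    return {
        "global": holdout_r2(X, y, rng),
        "city-wise": {int(c): holdout_r2(X[city == c], y[city == c], rng) for c in cities},
        "year-wise": {int(t): holdout_r2(X[year == t], y[year == t], rng) for t in years},
        "city-year": {
            (int(c), int(t)): holdout_r2(
                X[(city == c) & (year == t)], y[(city == c) & (year == t)], rng
            )
            for c in cities
            for t in years
        },
    }

# Synthetic stand-in data: 6 cities x 4 years x 200 neighborhoods, 64-dim features.
rng = np.random.default_rng(42)
city = np.repeat(np.arange(6), 800)
year = np.tile(np.arange(2020, 2024), 1200)
X = rng.normal(size=(4800, 64))
y = X @ rng.normal(size=64) + 0.5 * rng.normal(size=4800)

results = evaluate_settings(X, y, city, year)
print(round(results["global"], 3), len(results["city-year"]))
```

Comparing the spread of the city-wise scores with the spread of the year-wise scores is one simple way to express the spatial-heterogeneity versus temporal-stability contrast described above.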

If this is right

  • Earth embeddings provide scalable, frequently updatable features for neighborhood-scale urban monitoring aligned with SDG targets.
  • Predictive performance is strongest for indicators most directly tied to visible built-environment structure.
  • Cross-city differences in accuracy track urban form in task-specific ways.
  • Compact 64-dimensional embeddings remain more informative than 64-dimensional reductions of larger models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could support low-cost tracking of urban change in regions lacking regular census coverage.
  • Task-specific performance gaps suggest opportunities to refine embeddings by incorporating more fine-scale behavioral cues.
  • Linking accuracy variation to measurable urban-form metrics could guide future embedding design.

Load-bearing premise

The chosen supervised learning framework and selected indicators accurately reflect transferable urban signals from the embeddings without substantial domain shift or unmeasured confounding across cities and years.

What would settle it

Finding that the embeddings lose all predictive power above simple baselines for health burdens and commuting modes when tested on a new city or future year outside the six metropolitan areas and 2020-2023 window.
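A leave-one-city-out check is one way to run this kind of test. The sketch below trains on all cities but one, predicts the held-out city, and reports skill relative to a baseline that predicts the training mean. The ridge learner, the skill definition, and the synthetic data are illustrative assumptions, not the paper's protocol.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept (assumed learner)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w

def leave_one_city_out(X, y, city, alpha=1.0):
    """For each city, train on the others and score against a mean-only baseline."""
    skill = {}
    for c in np.unique(city):
        held = city == c
        w, b = ridge_fit(X[~held], y[~held], alpha)
        mse_model = np.mean((y[held] - (X[held] @ w + b)) ** 2)
        mse_base = np.mean((y[held] - y[~held].mean()) ** 2)
        # Skill <= 0 means the embeddings add nothing over the baseline.
        skill[int(c)] = float(1.0 - mse_model / mse_base)
    return skill

# Synthetic stand-in: a shared linear signal plus a city-specific offset.
rng = np.random.default_rng(7)
n_cities, per_city, dim = 6, 150, 16
city = np.repeat(np.arange(n_cities), per_city)
X = rng.normal(size=(n_cities * per_city, dim))
offsets = rng.normal(scale=0.5, size=n_cities)
y = X @ rng.normal(size=dim) + offsets[city] + 0.5 * rng.normal(size=len(city))

skill = leave_one_city_out(X, y, city)
for c, s in skill.items():
    print(f"held-out city {c}: skill vs. mean baseline = {s:.3f}")
```

Passing year labels instead of city labels gives the matching held-out-year check. Skill at or below zero for the health and commuting indicators on a genuinely new city or year would be the falsifying outcome described above.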

Original abstract

Conventional urban indicators derived from censuses, surveys, and administrative records are often costly, spatially inconsistent, and slow to update. Recent geospatial foundation models enable Earth embeddings, compact satellite image representations transferable across downstream tasks, but their utility for neighborhood-scale urban monitoring remains unclear. Here, we benchmark three Earth embedding families, AlphaEarth, Prithvi, and Clay, for urban signal prediction across six U.S. metropolitan areas from 2020 to 2023. Using a unified supervised-learning framework, we predict 14 neighborhood-level indicators spanning crime, income, health, and travel behavior, and evaluate performance under four settings: global, city-wise, year-wise, and city-year. Results show that Earth embeddings capture substantial urban variation, with the highest predictive skill for outcomes more directly tied to built-environment structure, including chronic health burdens and dominant commuting modes. By contrast, indicators shaped more strongly by fine-scale behavior and local policy, such as cycling, remain difficult to infer. Predictive performance varies markedly across cities but remains comparatively stable across years, indicating strong spatial heterogeneity alongside temporal robustness. Exploratory analysis suggests that cross-city variation in predictive performance is associated with urban form in task-specific ways. Controlled dimensionality experiments show that representation efficiency is critical: compact 64-dimensional AlphaEarth embeddings remain more informative than 64-dimensional reductions of Prithvi and Clay. This study establishes a benchmark for evaluating Earth embeddings in urban remote sensing and demonstrates their potential as scalable, low-cost features for SDG-aligned neighborhood-scale urban monitoring.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript benchmarks three Earth embedding families (AlphaEarth, Prithvi, Clay) for predicting 14 neighborhood-level urban indicators spanning crime, income, health, and travel behavior across six U.S. metropolitan areas (2020–2023). Using a unified supervised-learning framework, performance is evaluated under global, city-wise, year-wise, and city-year settings. The central claims are that embeddings capture substantial urban variation (strongest for built-environment outcomes such as chronic health burdens and dominant commuting modes), that performance varies markedly across cities but is stable across years, and that compact 64-dimensional AlphaEarth embeddings outperform dimensionality-reduced versions of the other models.

Significance. If the empirical results hold after addressing methodological transparency, the work supplies a useful benchmark for geospatial foundation models in urban remote sensing. It demonstrates the feasibility of low-cost, scalable neighborhood-scale monitoring aligned with SDG indicators and isolates the practical importance of representation efficiency and temporal robustness versus spatial heterogeneity.

major comments (3)
  1. [Abstract / Methods] Abstract and Methods: The reported performance differences and exploratory associations are presented without details on data splits, error bars, statistical tests, or exact model specifications (e.g., base learner, regularization, hyperparameter search). These omissions are load-bearing for the claim that embeddings deliver transferable urban signals.
  2. [Results] Results: Marked cross-city performance variation is reported without city fixed effects, policy covariates, or explicit domain-adaptation steps beyond the four evaluation settings. This leaves open the possibility that predictive skill for health and commuting indicators is driven by unmeasured city-level confounders rather than generalizable features from the satellite embeddings.
  3. [Methods] Methods: The unified supervised-learning framework is not specified with respect to handling of domain shift across cities/years or the precise construction of the 64-dimensional reductions used in the controlled dimensionality experiments; without these details the efficiency claim for AlphaEarth cannot be fully evaluated.
minor comments (1)
  1. [Abstract] Abstract: Consider reporting the total number of neighborhoods and the exact temporal coverage per city to help readers gauge the scale and balance of the benchmark.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which have helped us improve the clarity and transparency of the manuscript. We have revised the paper to address the major concerns regarding methodological details, potential confounders, and domain shift. Our point-by-point responses follow.

  1. Referee: [Abstract / Methods] Abstract and Methods: The reported performance differences and exploratory associations are presented without details on data splits, error bars, statistical tests, or exact model specifications (e.g., base learner, regularization, hyperparameter search). These omissions are load-bearing for the claim that embeddings deliver transferable urban signals.

    Authors: We agree that these details are essential for reproducibility and to substantiate our claims. In the revised manuscript we have added a new subsection in Methods titled 'Supervised Learning Pipeline' that specifies: (i) data splits as random 70/15/15 train/validation/test partitions with city-year stratification to prevent leakage; (ii) error bars as standard deviations across 5-fold cross-validation; (iii) statistical comparisons via paired t-tests with Bonferroni correction, with p-values now reported in all result tables; and (iv) exact model specifications (ridge regression with L2 regularization, regularization strength selected by grid search over {0.01, 0.1, 1, 10, 100} on the validation fold). The abstract has been updated to reference the cross-validated evaluation protocol. revision: yes

  2. Referee: [Results] Results: Marked cross-city performance variation is reported without city fixed effects, policy covariates, or explicit domain-adaptation steps beyond the four evaluation settings. This leaves open the possibility that predictive skill for health and commuting indicators is driven by unmeasured city-level confounders rather than generalizable features from the satellite embeddings.

    Authors: We acknowledge the possibility of city-level confounders. Our design intentionally omits additional covariates to isolate the raw predictive signal contained in the embeddings. The city-wise and city-year settings already provide within-city estimates that are less affected by between-city differences, while the global setting quantifies transfer. We have added a paragraph in the Discussion section explicitly noting the absence of city fixed effects or policy covariates and discussing how unmeasured factors (e.g., local zoning or enforcement) may contribute to observed cross-city variation. We also report an exploratory correlation between performance gaps and urban-form metrics (density, land-use entropy) to partially address the concern. No further domain-adaptation steps were introduced, as that would alter the benchmark's focus on the transferability of frozen embeddings. revision: partial

  3. Referee: [Methods] Methods: The unified supervised-learning framework is not specified with respect to handling of domain shift across cities/years or the precise construction of the 64-dimensional reductions used in the controlled dimensionality experiments; without these details the efficiency claim for AlphaEarth cannot be fully evaluated.

    Authors: We have expanded the Methods section to clarify both points. Domain shift is addressed solely through the four evaluation settings (global for cross-city transfer, city-wise for local performance, year-wise for temporal stability, and city-year for joint generalization); no explicit adaptation techniques such as adversarial training or fine-tuning were applied. For the dimensionality-controlled experiments, the 64-dimensional versions of Prithvi and Clay were obtained by PCA on the original embeddings, retaining the top 64 principal components (cumulative explained variance now stated as 82% for Prithvi and 76% for Clay in the supplementary material). AlphaEarth embeddings were used in their native 64-dimensional form. These additions allow direct evaluation of the efficiency claim. revision: yes
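The dimensionality-controlled comparison and the regularization grid described in these responses can be sketched as follows. This is a minimal illustration on synthetic data. The 256-dimensional input is a placeholder rather than the true size of the Prithvi or Clay embeddings, the numpy PCA and ridge routines stand in for whatever library the authors used, and the 70/15/15 split and alpha grid are taken from the simulated rebuttal rather than confirmed from the paper.

```python
import numpy as np

ALPHAS = (0.01, 0.1, 1.0, 10.0, 100.0)  # grid quoted in the simulated rebuttal

def pca_fit(X_train, k=64):
    """Fit PCA on training rows only; return mean, top-k components, variance retained."""
    mean = X_train.mean(axis=0)
    _, s, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    explained = s ** 2
    return mean, vt[:k], float(explained[:k].sum() / explained.sum())

def pca_transform(X, mean, components):
    """Project rows of X onto previously fitted components."""
    return (X - mean) @ components.T

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w

def select_alpha(X_tr, y_tr, X_va, y_va, alphas=ALPHAS):
    """Pick the regularization strength with the lowest validation MSE."""
    def val_mse(alpha):
        w, b = ridge_fit(X_tr, y_tr, alpha)
        return np.mean((y_va - (X_va @ w + b)) ** 2)
    return min(alphas, key=val_mse)

# Synthetic stand-in for a larger embedding (placeholder width of 256).
rng = np.random.default_rng(0)
n, d_big = 1000, 256
latent = rng.normal(size=(n, 32))
X_big = latent @ rng.normal(size=(32, d_big)) + 0.1 * rng.normal(size=(n, d_big))
y = latent[:, 0] + 0.1 * rng.normal(size=n)

idx = rng.permutation(n)
train, val, test = idx[:700], idx[700:850], idx[850:]  # 70/15/15 split

mean, comps, retained = pca_fit(X_big[train], k=64)
Z_train, Z_val, Z_test = (pca_transform(X_big[s], mean, comps) for s in (train, val, test))

best_alpha = select_alpha(Z_train, y[train], Z_val, y[val])
w, b = ridge_fit(Z_train, y[train], best_alpha)
test_mse = float(np.mean((y[test] - (Z_test @ w + b)) ** 2))
print(f"variance retained: {retained:.3f}, alpha: {best_alpha}, test MSE: {test_mse:.4f}")
```

Fitting the projection on the training rows only keeps validation and test data out of the reduction step. The retained-variance figure computed here corresponds to the kind of cumulative explained variance the authors cite for the 64-component reductions.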

Circularity Check

0 steps flagged

No circularity: empirical benchmarking of pre-trained embeddings on external urban indicators

The paper conducts supervised benchmarking of three external pre-trained Earth embedding models (AlphaEarth, Prithvi, Clay) to predict 14 neighborhood indicators drawn from independent census, survey, and administrative sources across six U.S. cities and four years. Performance is measured via standard metrics under global, city-wise, year-wise, and city-year splits with no equations, fitted parameters renamed as predictions, or self-referential definitions. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from the authors' prior work appear; the embeddings are treated as fixed inputs and the indicators as external ground truth. The central claims rest on observable predictive skill differences (e.g., higher for built-environment outcomes) that remain testable against held-out data and do not reduce to the input embeddings by construction. This is a standard transfer-learning evaluation with no derivation chain that collapses to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Only the abstract is available; the central claim rests on the unstated assumption that the embeddings encode transferable urban signals and that the supervised setup isolates those signals from city-specific confounders.

axioms (1)
  • domain assumption: Earth embeddings are transferable across downstream urban prediction tasks without major retraining
    Invoked by the unified supervised-learning framework and cross-city evaluation

pith-pipeline@v0.9.0 · 5593 in / 1174 out tokens · 32103 ms · 2026-05-13T19:25:32.793150+00:00 · methodology
