Geospatial foundation-model embeddings improve population estimation unevenly across space and scale

Andrew J. Tatem; Eimear Cleary; Francisco Rowe; Maksym Bondarenko; Shengjie Lai; Somnath Chaudhuri; Wenbin Zhang

arxiv: 2605.01650 · v1 · submitted 2026-05-03 · 💻 cs.LG

Wenbin Zhang, Eimear Cleary, Francisco Rowe, Somnath Chaudhuri, Maksym Bondarenko, Shengjie Lai, Andrew J. Tatem

Pith reviewed 2026-05-10 14:44 UTC · model grok-4.3

classification 💻 cs.LG

keywords population estimation · geospatial foundation models · PDFM embeddings · subnational population · spatial scale mismatch · predictive fit · Kullback-Leibler divergence · settlement context

The pith

Geospatial foundation model embeddings improve subnational population estimates unevenly across space and scale.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper benchmarks learned embeddings from the Population Dynamics Foundation Model against traditional harmonized geospatial covariates such as settlement extent and night-time lights for estimating subnational populations in Brazil, Nigeria, and the United States. It finds a median 20.1 percent reduction in unexplained variance and a 23.2 percent drop in distribution divergence when using the embeddings, with the strongest gains appearing in larger and less-developed areas where the covariates perform weakly. The embeddings transfer less flexibly across different spatial scales than the hand-built data. Readers should care because reliable small-area population figures support planning and resource allocation where censuses are sparse or outdated.
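The two headline metrics can be made concrete with a short sketch. It assumes that "unexplained variance" means 1 − R² (residual over total sum of squares), that the Kullback-Leibler divergence is taken between observed and predicted population shares across units, and that each percentage is a relative reduction, 100 × (baseline − candidate) / baseline. This page does not spell out the paper's exact definitions, so these are assumptions. The function names and the toy numbers are invented for illustration.

```python
import math

def unexplained_variance(y_true, y_pred):
    """Return 1 - R^2: residual sum of squares over total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return ss_res / ss_tot

def kl_divergence(observed, predicted, eps=1e-12):
    """KL(P || Q), with P = observed population shares and Q = predicted shares."""
    p_total, q_total = sum(observed), sum(predicted)
    kl = 0.0
    for o, q in zip(observed, predicted):
        p_i = o / p_total
        q_i = max(q / q_total, eps)  # guard against log of zero
        if p_i > 0:
            kl += p_i * math.log(p_i / q_i)
    return kl

def relative_reduction(baseline, candidate):
    """Percent by which candidate lowers a lower-is-better metric versus baseline."""
    return 100.0 * (baseline - candidate) / baseline

# Toy values, invented for illustration only (not results from the paper).
observed = [100.0, 200.0, 300.0, 400.0]
pred_covariates = [150.0, 150.0, 350.0, 350.0]
pred_embeddings = [120.0, 180.0, 320.0, 380.0]

uv_gain = relative_reduction(
    unexplained_variance(observed, pred_covariates),
    unexplained_variance(observed, pred_embeddings),
)
kl_gain = relative_reduction(
    kl_divergence(observed, pred_covariates),
    kl_divergence(observed, pred_embeddings),
)
```

Under these assumptions, a reported median of 20.1 percent would be the median of `uv_gain`-style values across the country-model comparisons.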

Core claim

PDFM embeddings capture settlement context more effectively than harmonized geospatial covariates in many cases, yielding better population predictions under geographically structured validation, yet the advantage is geographically and scale-dependent, with performance degrading under spatial aggregation mismatches and providing less flexible transfer across scales.

What carries the argument

The PDFM embeddings, which are reusable representations learned from multifaceted and heterogeneous geospatial data sources, are benchmarked directly against hand-assembled covariates as predictors of population.

If this is right

PDFM is most advantageous where the geospatial covariates weakly characterise settlement context, such as larger and less-developed subnational areas.
Embeddings provide less flexible transfer across spatial aggregations than geospatial covariates.
Geospatial foundation-model representations can improve population estimation in data-poor settings.
Benefits break down predictably under spatial scale mismatch, revealing a limitation of current geospatial AI.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Hybrid models that combine embeddings with traditional covariates may be needed to handle varied geographies reliably.
The scale-coupling problem suggests developing multi-resolution training objectives for future geospatial foundation models.
Similar transfer limitations could appear in other spatial prediction tasks that rely on foundation-model embeddings.
Targeted collection of ground-truth population data could be prioritized in regions where embeddings currently underperform.

Load-bearing premise

The PDFM embeddings capture settlement context more informatively than harmonized geospatial covariates without scale-specific biases introduced by the foundation model's pretraining data or aggregation choices.

What would settle it

A new test set of large, less-developed subnational areas where PDFM embeddings produce zero or negative improvement in predictive fit, or where they transfer across mismatched spatial aggregations worse than the covariates.

Original abstract

Reliable subnational population estimates are essential for applications, yet remain difficult where censuses are sparse, outdated or spatially coarse. Existing population-mapping workflows rely on hand-built geospatial covariates, such as settlement extent, night-time lights, and environmental conditions, which must be assembled and harmonised across scales and geographies. Geospatial foundation models offer an alternative by learning reusable representations of place from more multifaceted and heterogeneous data sources. Here, we benchmark Population Dynamics Foundation Model (PDFM) embeddings against the harmonised geospatial covariates for subnational population estimation in Brazil, Nigeria and the United States. Under geographically structured validation, PDFM increased predictive fit by a median of 20.1% (IQR: 10.0-33.2%, across country-model comparisons) reduction in unexplained variance, and reduced Kullback-Leibler divergence by 23.2% (9.2-26.2%). However, these gains were uneven. PDFM was most advantageous where the geospatial covariates weakly characterised settlement context, such as larger and less-developed subnational areas. Moreover, PDFM performance was scale-coupled with embeddings providing less flexible transfer across spatial aggregations than geospatial covariates. These findings showed that geospatial foundation-model representations of place can improve population estimation in data poor settings, but their benefits break down predictably under spatial scale mismatch, revealing a fundamental limitation of current geospatial AI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

PDFM embeddings deliver a median 20% lift over covariates for subnational population estimates in three countries, but the gains are uneven and weaken when spatial scales do not match.

The paper's core result is that PDFM embeddings reduce unexplained variance by a median 20.1% and KL divergence by 23.2% compared with harmonized geospatial covariates under geographically structured validation in Brazil, Nigeria, and the US. The advantage appears largest in larger or less-developed subnational units where the standard covariates are weaker at capturing settlement patterns. It also documents that the embeddings transfer less flexibly across different spatial aggregations than the covariates do.

That combination of quantified median gains plus the scale-coupling limitation is the new empirical content here; prior work had not shown these specific patterns for this model and these countries. The geographically structured split and the focus on where gains occur give the findings some independent grounding rather than pure in-sample fit.

The main soft spot is the aggregation step. If the way PDFM outputs are summarized to the target administrative units differs in resolution or weighting from how the covariates are prepared, part of the reported edge could trace to that procedural difference rather than to richer features in the embeddings. The abstract already notes the scale-coupling, so the authors see the issue, but the exact matching of aggregation procedures needs to be verified in the methods to rule out an artifact. Methods details on embedding extraction and covariate harmonization are also thin in the summary, which limits quick reproduction.

This work is aimed at researchers who build or apply geospatial models for population mapping in data-poor regions. A reader who needs concrete benchmarks on when foundation-model representations help versus when they hit scale limits will get usable numbers and a clear caveat. It is solid enough on the empirical side to deserve a serious referee, though the review should press on the aggregation comparison and full reproducibility of the pipeline.

Referee Report

2 major / 1 minor

Summary. The paper benchmarks embeddings from the Population Dynamics Foundation Model (PDFM) against harmonized geospatial covariates for subnational population estimation in Brazil, Nigeria, and the United States. Under geographically structured validation, it reports a median 20.1% reduction in unexplained variance (IQR 10.0-33.2%) and a median 23.2% reduction in Kullback-Leibler divergence. The improvements are uneven across space and scale: they are largest in larger, less-developed areas, and the embeddings transfer less flexibly across spatial aggregations than traditional covariates.

Significance. If the results are confirmed, this work demonstrates that geospatial foundation models can offer advantages over conventional covariates in population mapping, especially in data-poor contexts, while also identifying key limitations related to spatial scale that must be addressed for broader applicability. This has practical significance for improving demographic estimates used in policy and humanitarian efforts.

major comments (2)

[Methods] Detailed procedures for extracting and aggregating PDFM embeddings to subnational units, as well as the exact harmonization steps for geospatial covariates, are not described. This omission makes it difficult to determine whether the reported performance gains stem from the embeddings themselves or from differences in aggregation methods, directly impacting the validity of the central claim of uneven improvements.
[Results] The manuscript should provide more explicit evidence that aggregation procedures were matched exactly between PDFM embeddings and geospatial covariates. Without this, the 20.1% median improvement could be partly attributable to scale-specific summarization choices rather than superior settlement context capture, as suggested by the noted scale-coupling and the abstract's own qualification on transfer flexibility.

minor comments (1)

[Abstract] The abstract is clear but could specify the number of subnational units or models compared to give context to the IQR ranges reported for the improvements.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which highlight important aspects of methodological transparency. We have revised the manuscript to address the concerns about aggregation procedures and have added the requested details and evidence. Below we respond point by point.

Referee: [Methods] Detailed procedures for extracting and aggregating PDFM embeddings to subnational units, as well as the exact harmonization steps for geospatial covariates, are not described. This omission makes it difficult to determine whether the reported performance gains stem from the embeddings themselves or from differences in aggregation methods, directly impacting the validity of the central claim of uneven improvements.

Authors: We agree that the original manuscript gave insufficient detail on these steps, which limits reproducibility and could raise questions about the source of the observed gains. In the revised version we have added a new Methods subsection ('PDFM Embedding Extraction and Covariate Harmonization') that specifies: (i) the PDFM API query parameters and embedding dimensionality, (ii) the exact spatial aggregation (area-weighted mean pooling of embeddings within each subnational polygon), and (iii) the full harmonization pipeline for the geospatial covariates (source datasets, reprojection to a common grid, temporal alignment, and normalization). We also include a direct statement that identical aggregation logic was applied to both feature sets. These additions allow readers to confirm that performance differences arise from the embeddings rather than procedural mismatches.

revision: yes

Referee: [Results] The manuscript should provide more explicit evidence that aggregation procedures were matched exactly between PDFM embeddings and geospatial covariates. Without this, the 20.1% median improvement could be partly attributable to scale-specific summarization choices rather than superior settlement context capture, as suggested by the noted scale-coupling and the abstract's own qualification on transfer flexibility.

Authors: We accept this critique and have strengthened the Results section accordingly. We now include an explicit paragraph and a supplementary table that document the matched aggregation functions (area-weighted means for both embeddings and covariates) and report a sensitivity check in which alternative summarization choices (e.g., median pooling) were tested; the relative advantage of PDFM remains stable. While we retain the abstract's qualification on scale-coupling, the added evidence demonstrates that the 20.1% median reduction in unexplained variance is not an artifact of mismatched summarization. We have also cross-referenced these details in the discussion of uneven spatial performance.

revision: yes
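The aggregation step named in the simulated responses can be sketched as follows. This is a minimal illustration of area-weighted mean pooling and of the median-pooling alternative mentioned as a sensitivity check. It is not the authors' code, and the page does not confirm that the paper uses this procedure. The function names, the cell vectors and the overlap areas are all hypothetical.

```python
from statistics import median

def area_weighted_mean(vectors, areas):
    """Pool per-cell feature vectors into one vector, weighting each cell by overlap area."""
    total_area = sum(areas)
    if total_area <= 0:
        raise ValueError("total overlap area must be positive")
    dim = len(vectors[0])
    return [
        sum(vec[d] * area for vec, area in zip(vectors, areas)) / total_area
        for d in range(dim)
    ]

def median_pool(vectors):
    """Unweighted per-dimension median, an alternative summarization for sensitivity checks."""
    dim = len(vectors[0])
    return [median(vec[d] for vec in vectors) for d in range(dim)]

# Hypothetical inputs: three source cells overlapping one target polygon.
cell_vectors = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
overlap_areas = [1.0, 1.0, 2.0]  # area of each cell that falls inside the polygon

pooled_mean = area_weighted_mean(cell_vectors, overlap_areas)
pooled_median = median_pool(cell_vectors)
```

Applying the same pooling function to both the embedding vectors and the covariate vectors is what "matched aggregation" would mean in practice. Any remaining difference in predictive fit could then not be traced to the summarization step.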

Circularity Check

0 steps flagged

No circularity in empirical benchmark of foundation-model embeddings

The paper reports an empirical benchmark comparing PDFM embeddings to harmonized geospatial covariates for subnational population estimation in Brazil, Nigeria and the United States. Results are obtained via geographically structured validation measuring reductions in unexplained variance and KL divergence on held-out data. No derivations, equations, fitted parameters renamed as predictions, or self-citation chains appear in the load-bearing steps; the claimed improvements are measured directly against external data splits rather than being forced by construction or internal definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Relies on standard assumptions of ML benchmarking and geospatial data harmonization; no free parameters or invented entities explicitly introduced beyond the foundation model itself.

axioms (1)

Domain assumption: Geographically structured validation sufficiently prevents spatial autocorrelation leakage in performance estimates.
Invoked to support claims of improved generalization across countries and scales.
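One common form of geographically structured validation is spatial block cross-validation, in which whole spatial blocks are held out together so that near neighbours never straddle the train/test boundary. The sketch below assumes a regular grid over unit centroids. The page does not state which scheme the paper used, so the block size, the fold assignment and the function names are illustrative assumptions.

```python
from collections import defaultdict

def assign_blocks(coords, block_size_deg):
    """Map each (lon, lat) point to the id of the grid block that contains it."""
    return [
        (int(lon // block_size_deg), int(lat // block_size_deg))
        for lon, lat in coords
    ]

def spatial_block_folds(coords, block_size_deg, n_folds):
    """Yield (train_idx, test_idx) pairs such that no grid block is split across the two."""
    blocks = assign_blocks(coords, block_size_deg)
    members = defaultdict(list)
    for idx, block in enumerate(blocks):
        members[block].append(idx)
    # Deal whole blocks into folds round-robin, in sorted order for reproducibility.
    fold_of_block = {b: i % n_folds for i, b in enumerate(sorted(members))}
    for fold in range(n_folds):
        test_idx = sorted(
            i for b, idxs in members.items() if fold_of_block[b] == fold for i in idxs
        )
        train_idx = sorted(
            i for b, idxs in members.items() if fold_of_block[b] != fold for i in idxs
        )
        yield train_idx, test_idx

# Hypothetical centroids (lon, lat) of six subnational units.
centroids = [(0.2, 0.3), (0.7, 0.1), (1.4, 0.5), (1.9, 0.8), (2.1, 1.6), (2.8, 1.2)]
folds = list(spatial_block_folds(centroids, block_size_deg=1.0, n_folds=3))
```

Whether such a split "sufficiently" prevents leakage depends on the block size relative to the range of spatial autocorrelation, which is why the ledger records it as an assumption.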

pith-pipeline@v0.9.0 · 5569 in / 1177 out tokens · 35608 ms · 2026-05-10T14:44:36.047354+00:00 · methodology

Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages · 1 internal anchor

[1]

WorldPop, School of Geography and Environmental Sciences, University of Southampton, United Kingdom

work page
[2]

Geographic Data Science Lab, Department of Geography and Planning, School of Environmental Sciences, University of Liverpool, United Kingdom # Corresponding authors: Wenbin Zhang (wb.zhang@soton.ac.uk), and Shengjie Lai (Shengjie.Lai@soton.ac.uk) Abstract Reliable subnational population estimates are essential for applications , yet remain difficult where...

work page
[3]

Such representations are appealing for spatial demography because they compress not only otherwise difficult -to-access behavioural signals, such as aggregated search and activity patterns, but also geospatial and contextual data into a single reusable representation of place, thereby reducing the need for downstream applications to assemble and align mul...

work page 2023
[4]

The resulting embeddings are fixed, location- specific representations that can be used as predictors in downstream tasks. The goal of this study was not to retrain or modify PDFM, but to evaluate the predictive utility of these precomputed embeddings for population modelling relative to established geospatial covariates. PDFM embeddings were generated se...

work page
[5]

In each country, embeddings were produced from country-specific models built on the relevant administrative geography and available place -based input data

and the US (N = 39649) using the same methodological pipeline. In each country, embeddings were produced from country-specific models built on the relevant administrative geography and available place -based input data. Input signals were time -matched across countries to October 2023 to improve consistency. Consequently, cross-country differences in mode...

work page 2023
[6]

Within this candidate set, we calculated string similarity between the embedding place name and polygon names using the Jaro-Winkler distance

Candidate polygons were then identified based on spatial proximity to these coordinates, including the nearest polygon and its neighbouring polygons. Within this candidate set, we calculated string similarity between the embedding place name and polygon names using the Jaro-Winkler distance

work page
[7]

We evaluated similarity both with the full name and with higher - level region names removed (e.g., removing state names when embedded in district names). The final match was determined by combining spatial proximity and name similarity: the nearest polygon was selected unless another candidate showed substantially higher name similarity beyond a predefin...

work page 2022
[8]

& Tatem, A

Lloyd, C., Sorichetta, A. & Tatem, A. 2017. High resolution global gridded data for use in population studies. Sci Data 4, 170001. https://doi.org/10.1038/sdata.2017.1

work page doi:10.1038/sdata.2017.1 2017
[9]

and Tatem, A.J., 2021

Nilsen, K., Tejedor-Garavito, N., Leasure, D.R., Utazi, C.E., Ruktanonchai, C.W., Wigley, A.S., Dooley, C.A., Matthews, Z. and Tatem, A.J., 2021. A review of geospatial methods for population estimation and their use in constructing reproductive, maternal, newborn, child and adolescent health service indicators. BMC health services research, 21(Suppl 1), p.370

work page 2021
[10]

and Zhou, Y ., 2025

Li, D., Sun, L., Feng, K., Zhang, N., Yu, Y ., Zhao, D. and Zhou, Y ., 2025. Disproportionate flood exposure for slum populations of the Global South. Nature Cities, 2(7), pp.626-638

work page 2025
[11]

and Lai, S., 2025

Liu, H., Wang, S., Wei, C., Zhang, W., Tatem, A.J. and Lai, S., 2025. Assessing context-dependent effectiveness of heat adaptation through human mobility under different heatwave regimes. Sustainable Cities and Society, p.107066

work page 2025
[12]

and Sorichetta, A., 2025

Zhang, W.B., Woods, D., Olowe, I.D., Schiavina, M., Fang, W., Hornby, G., Bondarenko, M., Maes, J., Dijkstra, L., Tatem, A.J. and Sorichetta, A., 2025. Assessing the impacts of gridded population model choice on degree of urbanisation metrics. Cities, 166, p.106293

work page 2025
[13]

and Beltrán-Sánchez, H., 2023

Bozick, R., Burgette, L.F., Sharygin, E., Shih, R.A., Weidmer, B., Tzen, M., Kofner, A., Brand, J.E. and Beltrán-Sánchez, H., 2023. Evaluating the accuracy of 2020 census block-level estimates in California. Demography, 60(6), pp.1903-1921

work page 2023
[14]

Estimating the civilian noninstitutional population for small areas: a modified cohort component approach using public use data

Forrester, A.C., 2024. Estimating the civilian noninstitutional population for small areas: a modified cohort component approach using public use data. Journal of Population Research, 41(1), p.5

work page 2024
[15]

WorldPop, open data for spatial demography

Tatem, A.J., 2017. WorldPop, open data for spatial demography. Scientific data, 4(1), p.170004

work page 2017
[16]

and Schindler, K., 2024

Metzger, N., Daudt, R.C., Tuia, D. and Schindler, K., 2024. High-resolution population maps derived from sentinel-1 and sentinel-2. Remote Sensing of Environment, 314, p.114383

work page 2024
[17]

The spatial allocation of population: a review of large-scale gridded population data products and their fitness for use

Leyk, S., Gaughan, A.E., Adamo, S.B., et al., 2019. The spatial allocation of population: a review of large-scale gridded population data products and their fitness for use. Earth Syst. Sci. Data 11, 1385-1409. https://doi:10.5194/essd-11-1385-2019

work page doi:10.5194/essd-11-1385-2019 2019
[18]

and Tatem, A.J., 2018

Wardrop, N.A., Jochem, W.C., Bird, T.J., Chamberlain, H.R., Clarke, D., Kerr, D., Bengtsson, L., Juran, S., Seaman, V . and Tatem, A.J., 2018. Spatially disaggregated population estimates in the absence of national population and housing census data. Proceedings of the National Academy of Sciences, 115(14), pp.3529-3537

work page 2018
[19]

Census counts, undercounts and population estimates: The importance of data quality evaluation

Pelletier, François, 2020. Census counts, undercounts and population estimates: The importance of data quality evaluation. United Nations, Department of Economics and Social Affairs, Population Division, Technical Paper No. 2

work page 2020
[20]

and Wang, L., 2005

Wu, S.S., Qiu, X. and Wang, L., 2005. Population estimation methods in GIS and remote sensing: A review. GIScience & Remote Sensing, 42(1), pp.80-96

work page 2005
[21]

and Temple, J., 2022

Wilson, T., Grossman, I., Alexander, M., Rees, P. and Temple, J., 2022. Methods for small area population forecasts: State-of-the-art and research needs. Population Research and Policy Review, 41(3), pp.865-898

work page 2022
[22]

and Bondarenko, M., 2025

Zhang, W.B., Sorichetta, A., Frye, C., Tejedor-Garavito, N., Fang, W., Cihan, D., Woods, D., Yetman, G., Hilton, J., Tatem, A.J. and Bondarenko, M., 2025. A stochastic approach to integerize floating-point estimates in gridded population mapping. International Journal of Geographical Information Science, pp.1-17

work page 2025
[23]

PLOS ONE 10(2), 1–22 (02 2015)

Stevens, F.R., Gaughan, A.E., Linard, C., Tatem, A.J., 2015. Disaggregating census data for population mapping using random forests with remotely-sensed and ancillary data. PLOS ONE 10(2), e0107042. doi:10.1371/journal.pone.0107042

work page doi:10.1371/journal.pone.0107042 2015
[24]

Adams, D.S., Zimmer, A., Tuccillo, J. et al. 2025. LandScan mosaic enables high- resolution gridded population estimates with explicit uncertainty. Sci Rep 15, 44493. https://doi.org/10.1038/s41598-025-28125-z

work page doi:10.1038/s41598-025-28125-z 2025
[25]

and V oPham, T., 2025

Iyer, H.S., Karasaki, S., Yi, L., Hswen, Y ., James, P. and V oPham, T., 2025. Harnessing geospatial artificial intelligence (GeoAI) for environmental epidemiology: a narrative review. Current environmental health reports, 12(1), p.34

work page 2025
[26]

X., Xiong, Z

Zhu, X. X., Xiong, Z. & Shi, Y . On the foundations of Earth foundation models. Commun. Earth Environ. 7, 116 (2026)

work page 2026
[27]

Dynamic population mapping using mobile phone data

Deville P, Linard C, Martin S, et al. Dynamic population mapping using mobile phone data. Proceedings of the National Academy of Sciences. 2014;111(45):15888-15893

work page 2014
[28]

Lai, S., Erbach-Schoenberg, E.z., Pezzulo, C. et al. Exploring the use of mobile phone data for national migration statistics. Palgrave Commun 5, 34 (2019). https://doi.org/10.1057/s41599-019-0242-9

work page doi:10.1057/s41599-019-0242-9 2019
[29]

Duan, Q., Lai, S., Sorichetta, A. et al. COVID-19 and urban exodus: diverging population redistribution patterns across countries from 2020 to 2022. npj Urban Sustain 6, 59 (2026). https://doi.org/10.1038/s42949-026-00351-y

work page doi:10.1038/s42949-026-00351-y 2020
[30]

and Huang, Z., 2020

Zhang, F., Zu, J., Hu, M., Zhu, D., Kang, Y ., Gao, S., Zhang, Y . and Huang, Z., 2020. Uncovering inconspicuous places using social media check-ins and street view images. Computers, Environment and Urban Systems, 81, p.101478

work page 2020
[31]

Peng, D., Gui, Z., Wei, W. et al. Sampling-enabled scalable manifold learning unveils the discriminative cluster structure of high-dimensional data. Nat Mach Intell 7, 1669- 1684 (2025). https://doi.org/10.1038/s42256-025-01112-9

work page doi:10.1038/s42256-025-01112-9 2025
[32]

General Geospatial Inference with a Population Dynamics Foundation Model

Agarwal, M., Sun, M., Kamath, C., Muslim, A., Sarker, P., Paul, J., Yee, H., Sieniek, M., Jablonski, K., Vispute, S. and Kumar, A., 2024. General geospatial inference with a population dynamics foundation model. arXiv preprint arXiv:2411.07207

work page internal anchor Pith review Pith/arXiv arXiv 2024
[33]

Mai, G. et al. On the Opportunities and Challenges of Foundation Models for GeoAI (Vision Paper). ACM Trans. Spat. Algorithms Syst. 10, Article 11, 1-46 (2024)

work page 2024
[34]

Bodnar, C., Bruinsma, W.P., Lucic, A. et al. A foundation model for the Earth system. Nature 641, 1180-1187 (2025). https://doi.org/10.1038/s41586-025-09005-y

work page doi:10.1038/s41586-025-09005-y 2025
[35]

and Bondarenko, M., 2025

Woods, D., McKeen, T., Cunningham, A., Priyatikanto, R., Tatem, A.J., Sorichetta, A. and Bondarenko, M., 2025. Global gridded multi-temporal datasets to support human population distribution modelling. Gates Open Research, 9, p.72

work page 2025
[36]

Earth ai: Unlocking geospatial insights with foundation models and cross-modal reasoning.arXiv preprint arXiv:2510.18318,

Bell, A., Aides, A., Helmy, A., Muslim, A., Barzilai, A., Slobodkin, A., Jaber, B., Schottlander, D., Leifman, G., Paul, J. and Sun, M., 2025. Earth AI: unlocking geospatial insights with foundation models and cross-modal reasoning. arXiv preprint arXiv:2510.18318

work page arXiv 2025
[37]

and Thakur, G., 2023

Fan, J. and Thakur, G., 2023. Towards POI-based large-scale land use modeling: spatial scale, semantic granularity, and geographic context. International Journal of Digital Earth, 16(1), pp.430-445

work page 2023
[38]

Modeling walking accessibility to urban parks using Google Maps crowdsourcing database in the high-density urban environments of Hong Kong

Gong, F.Y ., 2023. Modeling walking accessibility to urban parks using Google Maps crowdsourcing database in the high-density urban environments of Hong Kong. Scientific Reports, 13(1), p.20798

work page 2023
[39]

and Du, S., 2025

Xiong, S., Zhang, X., Wang, H., Lei, Y ., Tan, G. and Du, S., 2025. Mapping the first dataset of global urban land uses with Sentinel-2 imagery and POI prompt. Remote Sensing of Environment, 327, p.114824

work page 2025
[40]

Li, Z., Li, L., Hu, T. et al. Satellite mapping of every building’s function in urban China reveals deep built environment disparities. Nat Commun 17, 2827 (2026). https://doi.org/10.1038/s41467-026-69589-5

work page doi:10.1038/s41467-026-69589-5 2026
[41]

and Tatem, A.J., 2020

Stevens, F.R., Gaughan, A.E., Nieves, J.J., King, A., Sorichetta, A., Linard, C. and Tatem, A.J., 2020. Comparisons of two global built area land cover datasets in methods to disaggregate human population in eleven countries from the global South. International Journal of Digital Earth, 13(1), pp.78-100

work page 2020
[42]

and Sun, Z.Y ., 2024

Sun, Y ., Xie, J., Wang, Y ., Chan, T.O. and Sun, Z.Y ., 2024. Mapping local-scale working population and daytime population densities using points-of-interest and nighttime light satellite imageries. Geo-Spatial Information Science, 27(6), pp.1852- 1867

work page 2024
[43]

and Tatem, A.J., 2022

Thomson, D.R., Leasure, D.R., Bird, T., Tzavidis, N. and Tatem, A.J., 2022. How accurate are WorldPop-Global-Unconstrained gridded population data at the cell- level?: A simulation analysis in urban Namibia. Plos one, 17(7), p.e0271504

work page 2022
[44]

and Prasad, G., 2025

Metz, L., Haggard, R., Moszczynski, M., Asbah, S., Mwase, C., Khomani, P., Smith, T., Cooper, H., Mwale, A., Muslim, A. and Prasad, G., 2025. Application and Validation of Geospatial Foundation Model Data for the Prediction of Health Facility Programmatic Outputs--A Case Study in Malawi. arXiv preprint arXiv:2510.25954

work page arXiv 2025
[45]

and Bram, D., 2007

Dark, S.J. and Bram, D., 2007. The modifiable areal unit problem (MAUP) in physical geography. Progress in physical geography, 31(5), pp.471-479

work page 2007
[46]

and Young, L.J., 2005

Gotway Crawford, C.A. and Young, L.J., 2005. Change of support: an inter- disciplinary challenge. In Geostatistics for Environmental Applications: Proceedings of the Fifth European Conference on Geostatistics for Environmental Applications (pp. 1-13). Berlin, Heidelberg: Springer Berlin Heidelberg

work page 2005
[47]

and Lao, N., 2022

Mai, G., Janowicz, K., Hu, Y ., Gao, S., Yan, B., Zhu, R., Cai, L. and Lao, N., 2022. A review of location encoding for GeoAI: methods and applications. International Journal of Geographical Information Science, 36(4), pp.639-673

work page 2022
[48]

and Wang, W., 2017, October

Wang, Y ., Qin, J. and Wang, W., 2017, October. Efficient approximate entity matching using jaro-winkler distance. In International conference on web information systems engineering (pp. 231-239). Cham: Springer International Publishing

work page 2017
[49]

and Jurman, G., 2021

Chicco, D., Warrens, M.J. and Jurman, G., 2021. The coefficient of determination R- squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. Peerj computer science, 7, p.e623

work page 2021
[50]

and Leibler, R.A., 1951

Kullback, S. and Leibler, R.A., 1951. On information and sufficiency. The annals of mathematical statistics, 22(1), pp.79-86

work page 1951
[51]

Swanwick, R.H., Read, Q.D., Guinn, S.M. et al. Dasymetric population mapping based on US census data and 30-m gridded estimates of impervious surface. Sci Data 9, 523 (2022). https://doi.org/10.1038/s41597-022-01603-z

work page doi:10.1038/s41597-022-01603-z 2022
[52]

Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure

Roberts, D.R., Bahn, V ., Ciuti, S., et al., 2017. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 40(8), 913-

work page 2017
[53]

doi:10.1111/ecog.02881

work page doi:10.1111/ecog.02881
[54]

and Meyer, H., 2023

Ludwig, M., Moreno‐Martinez, A., Hölzel, N., Pebesma, E. and Meyer, H., 2023. Assessing and improving the transferability of current global spatial prediction models. Global Ecology and Biogeography, 32(3), pp.356-368

work page 2023
[55]

and Lengauer, T., 2010

Altmann, A., Toloşi, L., Sander, O. and Lengauer, T., 2010. Permutation importance: a corrected feature importance measure. Bioinformatics, 26(10), pp.1340-1347. Supplementary Information A. Supplementary Tables Supplementary Table 1. Marginal ordinary least squares (OLS) associations between regional characteristics and PDFM performance gains relative to...

work page 2010

[1] [1]

WorldPop, School of Geography and Environmental Sciences, University of Southampton, United Kingdom

work page

[2] [2]

Geographic Data Science Lab, Department of Geography and Planning, School of Environmental Sciences, University of Liverpool, United Kingdom # Corresponding authors: Wenbin Zhang (wb.zhang@soton.ac.uk), and Shengjie Lai (Shengjie.Lai@soton.ac.uk) Abstract Reliable subnational population estimates are essential for applications , yet remain difficult where...

work page

[3] [3]

Such representations are appealing for spatial demography because they compress not only otherwise difficult -to-access behavioural signals, such as aggregated search and activity patterns, but also geospatial and contextual data into a single reusable representation of place, thereby reducing the need for downstream applications to assemble and align mul...

work page 2023

[4] [4]

The resulting embeddings are fixed, location- specific representations that can be used as predictors in downstream tasks. The goal of this study was not to retrain or modify PDFM, but to evaluate the predictive utility of these precomputed embeddings for population modelling relative to established geospatial covariates. PDFM embeddings were generated se...

work page

[5]

and the US (N = 39649) using the same methodological pipeline. In each country, embeddings were produced from country-specific models built on the relevant administrative geography and available place-based input data. Input signals were time-matched across countries to October 2023 to improve consistency. Consequently, cross-country differences in mode...

[6]

Candidate polygons were then identified based on spatial proximity to these coordinates, including the nearest polygon and its neighbouring polygons. Within this candidate set, we calculated string similarity between the embedding place name and polygon names using the Jaro-Winkler distance.

[7]

We evaluated similarity both with the full name and with higher-level region names removed (e.g., removing state names when embedded in district names). The final match was determined by combining spatial proximity and name similarity: the nearest polygon was selected unless another candidate showed substantially higher name similarity beyond a predefin...
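The matching rule described in the two excerpts above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `jaro`, `jaro_winkler` and `pick_match`, the candidate format (polygon name, distance), and the `margin` value of 0.15 are all hypothetical, since the source truncates before giving its threshold. The sketch scores names with Jaro-Winkler similarity (the distance is one minus this value) and omits the second comparison with higher-level region names removed.

```python
from typing import List, Tuple


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching inside this sliding window.
    window = max(max(len1, len2) // 2 - 1, 0)
    hit1 = [False] * len1
    hit2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not hit2[j] and s2[j] == ch:
                hit1[i] = hit2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if hit1[i]:
            while not hit2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3.0


def jaro_winkler(s1: str, s2: str, prefix_weight: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_weight * (1.0 - base)


def pick_match(place_name: str,
               candidates: List[Tuple[str, float]],
               margin: float = 0.15) -> str:
    """Return the nearest polygon's name unless another candidate's name
    similarity beats it by more than `margin` (hypothetical threshold).
    `candidates` is a non-empty list of (polygon_name, distance) pairs."""
    def sim(name: str) -> float:
        return jaro_winkler(place_name.lower(), name.lower())

    nearest = min(candidates, key=lambda c: c[1])
    best = max(candidates, key=lambda c: sim(c[0]))
    if sim(best[0]) - sim(nearest[0]) > margin:
        return best[0]
    return nearest[0]
```

With these assumptions, a call such as `pick_match("Ikeja", [("Agege", 1.0), ("Ikeja", 2.5)])` overrides the nearest polygon because the farther candidate's name is a far better match, while a near-identical spelling on the nearest polygon is left in place.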

[8]

Lloyd, C., Sorichetta, A. & Tatem, A. 2017. High resolution global gridded data for use in population studies. Sci Data 4, 170001. https://doi.org/10.1038/sdata.2017.1

[9]

Nilsen, K., Tejedor-Garavito, N., Leasure, D.R., Utazi, C.E., Ruktanonchai, C.W., Wigley, A.S., Dooley, C.A., Matthews, Z. and Tatem, A.J., 2021. A review of geospatial methods for population estimation and their use in constructing reproductive, maternal, newborn, child and adolescent health service indicators. BMC health services research, 21(Suppl 1), p.370

[10]

Li, D., Sun, L., Feng, K., Zhang, N., Yu, Y., Zhao, D. and Zhou, Y., 2025. Disproportionate flood exposure for slum populations of the Global South. Nature Cities, 2(7), pp.626-638

[11]

Liu, H., Wang, S., Wei, C., Zhang, W., Tatem, A.J. and Lai, S., 2025. Assessing context-dependent effectiveness of heat adaptation through human mobility under different heatwave regimes. Sustainable Cities and Society, p.107066

[12]

Zhang, W.B., Woods, D., Olowe, I.D., Schiavina, M., Fang, W., Hornby, G., Bondarenko, M., Maes, J., Dijkstra, L., Tatem, A.J. and Sorichetta, A., 2025. Assessing the impacts of gridded population model choice on degree of urbanisation metrics. Cities, 166, p.106293

[13]

Bozick, R., Burgette, L.F., Sharygin, E., Shih, R.A., Weidmer, B., Tzen, M., Kofner, A., Brand, J.E. and Beltrán-Sánchez, H., 2023. Evaluating the accuracy of 2020 census block-level estimates in California. Demography, 60(6), pp.1903-1921

[14]

Forrester, A.C., 2024. Estimating the civilian noninstitutional population for small areas: a modified cohort component approach using public use data. Journal of Population Research, 41(1), p.5

[15]

Tatem, A.J., 2017. WorldPop, open data for spatial demography. Scientific data, 4(1), p.170004

[16]

Metzger, N., Daudt, R.C., Tuia, D. and Schindler, K., 2024. High-resolution population maps derived from sentinel-1 and sentinel-2. Remote Sensing of Environment, 314, p.114383

[17]

Leyk, S., Gaughan, A.E., Adamo, S.B., et al., 2019. The spatial allocation of population: a review of large-scale gridded population data products and their fitness for use. Earth Syst. Sci. Data 11, 1385-1409. https://doi.org/10.5194/essd-11-1385-2019

[18]

Wardrop, N.A., Jochem, W.C., Bird, T.J., Chamberlain, H.R., Clarke, D., Kerr, D., Bengtsson, L., Juran, S., Seaman, V. and Tatem, A.J., 2018. Spatially disaggregated population estimates in the absence of national population and housing census data. Proceedings of the National Academy of Sciences, 115(14), pp.3529-3537

[19]

Pelletier, F., 2020. Census counts, undercounts and population estimates: The importance of data quality evaluation. United Nations, Department of Economic and Social Affairs, Population Division, Technical Paper No. 2

[20]

Wu, S.S., Qiu, X. and Wang, L., 2005. Population estimation methods in GIS and remote sensing: A review. GIScience & Remote Sensing, 42(1), pp.80-96

[21]

Wilson, T., Grossman, I., Alexander, M., Rees, P. and Temple, J., 2022. Methods for small area population forecasts: State-of-the-art and research needs. Population Research and Policy Review, 41(3), pp.865-898

[22]

Zhang, W.B., Sorichetta, A., Frye, C., Tejedor-Garavito, N., Fang, W., Cihan, D., Woods, D., Yetman, G., Hilton, J., Tatem, A.J. and Bondarenko, M., 2025. A stochastic approach to integerize floating-point estimates in gridded population mapping. International Journal of Geographical Information Science, pp.1-17

[23]

Stevens, F.R., Gaughan, A.E., Linard, C., Tatem, A.J., 2015. Disaggregating census data for population mapping using random forests with remotely-sensed and ancillary data. PLOS ONE 10(2), e0107042. doi:10.1371/journal.pone.0107042

[24]

Adams, D.S., Zimmer, A., Tuccillo, J. et al. 2025. LandScan mosaic enables high-resolution gridded population estimates with explicit uncertainty. Sci Rep 15, 44493. https://doi.org/10.1038/s41598-025-28125-z

[25]

Iyer, H.S., Karasaki, S., Yi, L., Hswen, Y., James, P. and VoPham, T., 2025. Harnessing geospatial artificial intelligence (GeoAI) for environmental epidemiology: a narrative review. Current environmental health reports, 12(1), p.34

[26]

Zhu, X. X., Xiong, Z. & Shi, Y. On the foundations of Earth foundation models. Commun. Earth Environ. 7, 116 (2026)

[27]

Deville P, Linard C, Martin S, et al. Dynamic population mapping using mobile phone data. Proceedings of the National Academy of Sciences. 2014;111(45):15888-15893

[28]

Lai, S., Erbach-Schoenberg, E.z., Pezzulo, C. et al. Exploring the use of mobile phone data for national migration statistics. Palgrave Commun 5, 34 (2019). https://doi.org/10.1057/s41599-019-0242-9

[29]

Duan, Q., Lai, S., Sorichetta, A. et al. COVID-19 and urban exodus: diverging population redistribution patterns across countries from 2020 to 2022. npj Urban Sustain 6, 59 (2026). https://doi.org/10.1038/s42949-026-00351-y

[30]

Zhang, F., Zu, J., Hu, M., Zhu, D., Kang, Y., Gao, S., Zhang, Y. and Huang, Z., 2020. Uncovering inconspicuous places using social media check-ins and street view images. Computers, Environment and Urban Systems, 81, p.101478

[31]

Peng, D., Gui, Z., Wei, W. et al. Sampling-enabled scalable manifold learning unveils the discriminative cluster structure of high-dimensional data. Nat Mach Intell 7, 1669-1684 (2025). https://doi.org/10.1038/s42256-025-01112-9

[32]

Agarwal, M., Sun, M., Kamath, C., Muslim, A., Sarker, P., Paul, J., Yee, H., Sieniek, M., Jablonski, K., Vispute, S. and Kumar, A., 2024. General geospatial inference with a population dynamics foundation model. arXiv preprint arXiv:2411.07207

[33]

Mai, G. et al. On the Opportunities and Challenges of Foundation Models for GeoAI (Vision Paper). ACM Trans. Spat. Algorithms Syst. 10, Article 11, 1-46 (2024)

[34]

Bodnar, C., Bruinsma, W.P., Lucic, A. et al. A foundation model for the Earth system. Nature 641, 1180-1187 (2025). https://doi.org/10.1038/s41586-025-09005-y

[35]

Woods, D., McKeen, T., Cunningham, A., Priyatikanto, R., Tatem, A.J., Sorichetta, A. and Bondarenko, M., 2025. Global gridded multi-temporal datasets to support human population distribution modelling. Gates Open Research, 9, p.72

[36]

Bell, A., Aides, A., Helmy, A., Muslim, A., Barzilai, A., Slobodkin, A., Jaber, B., Schottlander, D., Leifman, G., Paul, J. and Sun, M., 2025. Earth AI: unlocking geospatial insights with foundation models and cross-modal reasoning. arXiv preprint arXiv:2510.18318

[37]

Fan, J. and Thakur, G., 2023. Towards POI-based large-scale land use modeling: spatial scale, semantic granularity, and geographic context. International Journal of Digital Earth, 16(1), pp.430-445

[38]

Gong, F.Y., 2023. Modeling walking accessibility to urban parks using Google Maps crowdsourcing database in the high-density urban environments of Hong Kong. Scientific Reports, 13(1), p.20798

[39]

Xiong, S., Zhang, X., Wang, H., Lei, Y., Tan, G. and Du, S., 2025. Mapping the first dataset of global urban land uses with Sentinel-2 imagery and POI prompt. Remote Sensing of Environment, 327, p.114824

[40]

Li, Z., Li, L., Hu, T. et al. Satellite mapping of every building’s function in urban China reveals deep built environment disparities. Nat Commun 17, 2827 (2026). https://doi.org/10.1038/s41467-026-69589-5

[41]

Stevens, F.R., Gaughan, A.E., Nieves, J.J., King, A., Sorichetta, A., Linard, C. and Tatem, A.J., 2020. Comparisons of two global built area land cover datasets in methods to disaggregate human population in eleven countries from the global South. International Journal of Digital Earth, 13(1), pp.78-100

[42]

Sun, Y., Xie, J., Wang, Y., Chan, T.O. and Sun, Z.Y., 2024. Mapping local-scale working population and daytime population densities using points-of-interest and nighttime light satellite imageries. Geo-Spatial Information Science, 27(6), pp.1852-1867

[43]

Thomson, D.R., Leasure, D.R., Bird, T., Tzavidis, N. and Tatem, A.J., 2022. How accurate are WorldPop-Global-Unconstrained gridded population data at the cell-level?: A simulation analysis in urban Namibia. PLOS ONE, 17(7), p.e0271504

[44]

Metz, L., Haggard, R., Moszczynski, M., Asbah, S., Mwase, C., Khomani, P., Smith, T., Cooper, H., Mwale, A., Muslim, A. and Prasad, G., 2025. Application and Validation of Geospatial Foundation Model Data for the Prediction of Health Facility Programmatic Outputs--A Case Study in Malawi. arXiv preprint arXiv:2510.25954

[45]

Dark, S.J. and Bram, D., 2007. The modifiable areal unit problem (MAUP) in physical geography. Progress in physical geography, 31(5), pp.471-479

[46]

Gotway Crawford, C.A. and Young, L.J., 2005. Change of support: an inter-disciplinary challenge. In Geostatistics for Environmental Applications: Proceedings of the Fifth European Conference on Geostatistics for Environmental Applications (pp. 1-13). Berlin, Heidelberg: Springer Berlin Heidelberg

[47]

Mai, G., Janowicz, K., Hu, Y., Gao, S., Yan, B., Zhu, R., Cai, L. and Lao, N., 2022. A review of location encoding for GeoAI: methods and applications. International Journal of Geographical Information Science, 36(4), pp.639-673

[48]

Wang, Y., Qin, J. and Wang, W., 2017, October. Efficient approximate entity matching using jaro-winkler distance. In International conference on web information systems engineering (pp. 231-239). Cham: Springer International Publishing

[49]

Chicco, D., Warrens, M.J. and Jurman, G., 2021. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Computer Science, 7, p.e623

[50]

Kullback, S. and Leibler, R.A., 1951. On information and sufficiency. The annals of mathematical statistics, 22(1), pp.79-86

[51]

Swanwick, R.H., Read, Q.D., Guinn, S.M. et al. Dasymetric population mapping based on US census data and 30-m gridded estimates of impervious surface. Sci Data 9, 523 (2022). https://doi.org/10.1038/s41597-022-01603-z

[52]

Roberts, D.R., Bahn, V., Ciuti, S., et al., 2017. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 40(8), 913-. doi:10.1111/ecog.02881

[54]

Ludwig, M., Moreno-Martinez, A., Hölzel, N., Pebesma, E. and Meyer, H., 2023. Assessing and improving the transferability of current global spatial prediction models. Global Ecology and Biogeography, 32(3), pp.356-368

[55]

Altmann, A., Toloşi, L., Sander, O. and Lengauer, T., 2010. Permutation importance: a corrected feature importance measure. Bioinformatics, 26(10), pp.1340-1347.

Supplementary Information

A. Supplementary Tables

Supplementary Table 1. Marginal ordinary least squares (OLS) associations between regional characteristics and PDFM performance gains relative to...