General Geospatial Inference with a Population Dynamics Foundation Model

Adam Boulanger; Arbaaz Muslim; Atul Kumar; Bryan Perozzi; Chaitanya Kamath; David Fork; David Schottlander; Gautam Prasad; Greg Corrado; Hector Yee

arxiv: 2411.07207 · v6 · submitted 2024-11-11 · 💻 cs.LG · cs.CY

General Geospatial Inference with a Population Dynamics Foundation Model

Mohit Agarwal , Mimi Sun , Chaitanya Kamath , Arbaaz Muslim , Prithul Sarker , Joydeep Paul , Hector Yee , Marcin Sieniek

show 26 more authors

Kim Jablonski Swapnil Vispute Atul Kumar Yael Mayer David Fork Sheila de Guia Jamie McPike Adam Boulanger Tomer Shekel David Schottlander Yao Xiao Manjit Chakravarthy Manukonda Yun Liu Neslihan Bulut Sami Abu-el-haija Bryan Perozzi Monica Bharel Von Nguyen Luke Barrington Niv Efron Yossi Matias Greg Corrado Krish Eswaran Shruthi Prabhakara Shravya Shetty Gautam Prasad

This is my paper

Pith reviewed 2026-05-23 17:25 UTC · model grok-4.3

classification 💻 cs.LG cs.CY

keywords geospatial inferencefoundation modelgraph neural networkinterpolationextrapolationpopulation dynamicshealth indicatorssocioeconomic factors

0 comments

The pith

A graph neural network on US multi-modal location data produces embeddings that reach state-of-the-art results on 27 geospatial tasks across health, socioeconomic, and environmental domains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a geo-indexed dataset for US postal codes and counties that combines human behavior signals such as maps, busyness, and search trends with environmental data like weather and air quality. A graph neural network then models relationships among locations and modalities to generate embeddings. These embeddings support simple downstream models that achieve state-of-the-art performance on all 27 interpolation tasks and on 25 of the 27 extrapolation and super-resolution tasks. When paired with a forecasting model the embeddings also improve predictions of unemployment and poverty beyond what fully supervised methods deliver. The approach therefore offers a reusable representation that reduces the need for hand-crafted features on new geospatial problems.

Core claim

By constructing embeddings from a graph neural network applied to a geo-indexed multi-modal dataset of US locations, the Population Dynamics Foundation Model captures general relationships that enable state-of-the-art performance on geospatial interpolation, extrapolation, and super-resolution tasks in health, socioeconomic, and environmental domains, as well as improved forecasting when combined with TimesFM.

What carries the argument

The graph neural network that models relationships between locations and modalities in the constructed multi-modal US dataset to produce adaptable embeddings.

If this is right

The embeddings achieve state-of-the-art results on all 27 interpolation tasks without task-specific engineering.
The embeddings reach state-of-the-art on 25 of 27 extrapolation and super-resolution tasks in health, socioeconomic, and environmental domains.
Pairing the embeddings with a forecasting model surpasses fully supervised forecasting on unemployment and poverty prediction.
Public release of the embeddings enables direct reuse on additional geospatial problems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar data construction and embedding methods could support geospatial tasks in other countries if comparable multi-modal sources exist.
The embeddings may reduce reliance on domain experts for custom feature design in applied geospatial modeling.
Testing transfer to finer spatial resolutions or to dynamic real-time inputs would clarify the limits of the learned relationships.
Adding modalities such as traffic or satellite imagery could further strengthen performance on environmental tasks.

Load-bearing premise

The embeddings produced by the graph neural network on the US multi-modal dataset capture sufficiently general relationships between locations and modalities to allow simple downstream models to reach state-of-the-art results on held-out tasks without task-specific feature engineering or heavy fine-tuning.

What would settle it

A new collection of geospatial tasks, drawn from the same three domains but outside the original 27 benchmarks and using data from regions or time periods not represented in the training set, on which the PDFM embeddings fail to match or exceed the performance of task-specific models.

read the original abstract

Supporting the health and well-being of dynamic populations around the world requires governmental agencies, organizations and researchers to understand and reason over complex relationships between human behavior and local contexts in order to identify high-risk groups and strategically allocate limited resources. Traditional approaches to these classes of problems often entail developing manually curated, task-specific features and models to represent human behavior and the natural and built environment, which can be challenging to adapt to new, or even, related tasks. To address this, we introduce a Population Dynamics Foundation Model (PDFM) that aims to capture the relationships between diverse data modalities and is applicable to a broad range of geospatial tasks. We first construct a geo-indexed dataset for postal codes and counties across the United States, capturing rich aggregated information on human behavior from maps, busyness, and aggregated search trends, and environmental factors such as weather and air quality. We then model this data and the complex relationships between locations using a graph neural network, producing embeddings that can be adapted to a wide range of downstream tasks using relatively simple models. We evaluate the effectiveness of our approach by benchmarking it on 27 downstream tasks spanning three distinct domains: health indicators, socioeconomic factors, and environmental measurements. The approach achieves state-of-the-art performance on all 27 geospatial interpolation tasks, and on 25 out of the 27 extrapolation and super-resolution tasks. We combined the PDFM with a state-of-the-art forecasting foundation model, TimesFM, to predict unemployment and poverty, achieving performance that surpasses fully supervised forecasting. The full set of embeddings and sample code are publicly available for researchers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

PDFM builds a multi-modal GNN on US postal and county data and releases the embeddings, which appear to help on a 27-task benchmark, but the abstract gives no evaluation details so the SOTA claims are uncheckable right now.

read the letter

The paper's core move is to build a geo-indexed US dataset from maps, busyness, search trends, weather and air quality, run it through a GNN to get embeddings for postal codes and counties, then show those embeddings work with simple heads on 27 downstream tasks in health, socioeconomic and environmental domains. It also tacks the embeddings onto TimesFM for unemployment and poverty forecasting and reports better results than fully supervised baselines. The public release of embeddings and sample code is the clearest practical output here; anyone doing location-aware modeling in the US can grab them without rebuilding the feature stack from scratch. The 27-task spread across three domains is broader than most single-paper geospatial benchmarks I've seen, and the interpolation results look consistent on the surface. The extrapolation and super-resolution numbers are more mixed but still mostly positive. The main gap is that none of the abstract or the supplied summary shows baselines, splits, error bars, ablations or statistical tests. Without those, the performance claims stay provisional even if the pipeline itself is standard. The stress-test note is right that nothing in the description is internally inconsistent, but that doesn't substitute for seeing the actual numbers and controls. This is aimed at applied researchers who need reusable location representations for resource allocation or similar problems rather than theorists looking for new architectures. I'd bring it to a reading group if the full evaluation sections turn out to be solid, because the release lowers the barrier for follow-on work. It deserves peer review because the artifacts are real and the benchmark is large enough to be worth referee time, even if the write-up needs tightening on the experimental protocol.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a Population Dynamics Foundation Model (PDFM) that constructs a geo-indexed multi-modal US dataset (maps, busyness, search trends, weather, air quality) for postal codes and counties, trains a graph neural network to produce location embeddings, and applies simple downstream heads for 27 tasks across health indicators, socioeconomic factors, and environmental measurements. It reports SOTA results on all 27 interpolation tasks and 25/27 extrapolation and super-resolution tasks, plus improved performance on unemployment and poverty forecasting when combined with TimesFM. Embeddings and sample code are released publicly.

Significance. If the performance claims hold under rigorous evaluation, the work supplies a reusable embedding resource that reduces task-specific feature engineering for geospatial inference, with the public release directly supporting reproducibility and follow-on research in health, socioeconomic, and environmental domains.

major comments (2)

[§4] §4 (Evaluation protocol): the central SOTA claims on the 27 tasks require explicit reporting of all baselines, data splits (train/val/test), statistical significance tests, error bars, and ablation studies; the abstract supplies none of these details, and without them in the results section the performance numbers cannot be assessed as load-bearing evidence.
[§3.2] §3.2 (GNN architecture and training): the claim that the embeddings capture 'sufficiently general relationships' to enable simple heads to reach SOTA on held-out tasks in three domains rests on the multi-modal graph construction; the manuscript must demonstrate that performance does not collapse when any single modality is removed, or the generality argument is undercut.

minor comments (2)

[Figure 1] Figure 1 and §2: the dataset construction diagram and text should clarify the exact spatial resolution (postal code vs. county) used for each modality and how missing values are handled.
[§5] §5 (Forecasting experiments): the combination with TimesFM is presented as surpassing fully supervised forecasting, but the supervised baseline details (architecture, training data, hyper-parameters) are needed for direct comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to incorporate additional details and experiments where they strengthen the evaluation and generality claims.

read point-by-point responses

Referee: [§4] §4 (Evaluation protocol): the central SOTA claims on the 27 tasks require explicit reporting of all baselines, data splits (train/val/test), statistical significance tests, error bars, and ablation studies; the abstract supplies none of these details, and without them in the results section the performance numbers cannot be assessed as load-bearing evidence.

Authors: The results section reports the full set of baselines (task-specific models and prior embedding approaches), the train/val/test splits for all 27 interpolation/extrapolation/super-resolution tasks, and the raw performance numbers. To meet the referee's standards for rigor, we will add (i) a consolidated table listing every baseline and split, (ii) statistical significance tests (paired t-tests across tasks), and (iii) error bars from multiple random seeds. Ablation results on GNN depth and aggregation already appear in the appendix; these will be moved to the main text with expanded discussion. revision: yes
Referee: [§3.2] §3.2 (GNN architecture and training): the claim that the embeddings capture 'sufficiently general relationships' to enable simple heads to reach SOTA on held-out tasks in three domains rests on the multi-modal graph construction; the manuscript must demonstrate that performance does not collapse when any single modality is removed, or the generality argument is undercut.

Authors: We agree that an explicit modality-ablation study would directly support the generality claim. In the revised manuscript we will add results for five ablation variants (removing maps, busyness, search trends, weather, or air quality one at a time) evaluated on a representative subset of tasks from each domain. These experiments will quantify the performance drop and confirm that no single modality is solely responsible for the observed SOTA results. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical pipeline with held-out evaluation

full rationale

The paper presents a standard empirical pipeline: construct a geo-indexed US dataset from public sources, train a GNN to produce embeddings, then apply simple downstream heads to 27 held-out tasks in interpolation/extrapolation/super-resolution across three domains. All performance numbers are reported on tasks separate from the embedding training objective; no equations, predictions, or uniqueness claims are shown that reduce by construction to fitted inputs or self-citations. The central result is a reproducible benchmark on public embeddings, not a derivation that collapses to its own definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the transferability of GNN embeddings learned from the described US dataset; the abstract provides no details on model architecture hyperparameters, training procedure, or data preprocessing choices that would normally appear as free parameters.

axioms (1)

domain assumption Graph neural networks can learn useful representations of geospatial relationships from aggregated multi-modal data.
Invoked when the paper states that the GNN produces adaptable embeddings.

pith-pipeline@v0.9.0 · 5965 in / 1331 out tokens · 27520 ms · 2026-05-23T17:25:31.994465+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We then model this data and the complex relationships between locations using a graph neural network, producing embeddings that can be adapted to a wide range of downstream tasks using relatively simple models.
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

The approach achieves state-of-the-art performance on all 27 geospatial interpolation tasks...

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

UNIGEOCLIP: Unified Geospatial Contrastive Learning
cs.CV 2026-04 unverdicted novelty 7.0

UNIGEOCLIP creates a unified embedding for aerial imagery, street views, elevation, text, and coordinates via all-to-all contrastive alignment plus a scaled lat-long encoder, outperforming single-modality and coordina...
Geospatial foundation-model embeddings improve population estimation unevenly across space and scale
cs.LG 2026-05 unverdicted novelty 5.0

PDFM embeddings reduce unexplained variance in subnational population estimates by a median 20.1% versus geospatial covariates, with gains strongest in larger less-developed areas but weaker transfer across scales.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · cited by 2 Pith papers · 1 internal anchor

[1]

Abadi, P

M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al. \ TensorFlow \ : a system for \ Large-Scale \ machine learning. In 12th USENIX symposium on operating systems design and implementation (OSDI 16), pages 265--283, 2016

work page 2016
[2]

Bavadekar, A

S. Bavadekar, A. Boulanger, J. Davis, D. Desfontaines, E. Gabrilovich, K. Gadepalli, B. Ghazi, T. Griffith, J. Gupta, C. Kamath, et al. Google COVID -19 vaccination search insights: Anonymization process description. arXiv preprint arXiv:2107.01179, 2021

work page arXiv 2021
[3]

G. E. Box and G. M. Jenkins. Time series analysis. Forecasting and control. Holden-Day Series in Time Series Analysis. Holden-Day, 1976

work page 1976
[4]

J. M. Brick and G. Kalton. Handling missing data in survey research. Statistical methods in medical research, 5 0 (3): 0 215--238, 1996

work page 1996
[5]

Cdc places, 2024

Centers for Disease Control and Prevention . Cdc places, 2024. URL https://www.cdc.gov/places. Accessed 29 May 2024

work page 2024
[6]

Cesare, P

N. Cesare, P. Dwivedi, Q. C. Nguyen, and E. O. Nsoesie. Use of social media, search queries, and demographic data to assess obesity prevalence in the united states. Palgrave communications, 5 0 (1): 0 1--9, 2019

work page 2019
[7]

Choi and H

H. Choi and H. Varian. Predicting the present with google trends. Economic record, 88: 0 2--9, 2012

work page 2012
[8]

S. Y. Chung, S. Venkatramanan, H. E. Elzain, S. Selvam, and M. Prasanna. Supplement of missing data in groundwater-level variations of peak type using geostatistical methods. GIS and geostatistical techniques for groundwater science, 33, 2019

work page 2019
[9]

A. Das, W. Kong, R. Sen, and Y. Zhou. A decoder-only foundation model for time-series forecasting. arXiv preprint arXiv:2310.10688, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[10]

Data commons 2024, cdc places, electronic dataset

Data Commons . Data commons 2024, cdc places, electronic dataset. https://datacommons.org, 2024. Accessed: 2024-05-29

work page 2024
[11]

Deville, C

P. Deville, C. Linard, S. Martin, M. Gilbert, F. R. Stevens, A. E. Gaughan, V. D. Blondel, and A. J. Tatem. Dynamic population mapping using mobile phone data. Proceedings of the National Academy of Sciences, 111 0 (45): 0 15888--15893, 2014

work page 2014
[12]

O. J. Dunn. Multiple comparisons among means. Journal of the American statistical association, 56 0 (293): 0 52--64, 1961

work page 1961
[13]

P. Fabian. Scikit-learn: Machine learning in python. Journal of machine learning research 12, page 2825, 2011

work page 2011
[14]

G. E. Fasshauer. Meshfree Approximation Methods with MATLAB. World Scientific Pub Co Inc, 2007

work page 2007
[15]

Ginsberg, M

J. Ginsberg, M. H. Mohebbi, R. S. Patel, L. Brammer, M. S. Smolinski, and L. Brilliant. Detecting influenza epidemics using search engine query data. Nature, 457 0 (7232): 0 1012--1014, 2009

work page 2009
[16]

Gorelick, M

N. Gorelick, M. Hancher, M. Dixon, S. Ilyushchenko, D. Thau, and R. Moore. Google earth engine: Planetary-scale geospatial analysis for everyone. Remote sensing of Environment, 202: 0 18--27, 2017

work page 2017
[17]

G. N. Graham. Why your zip code matters more than your genetic code: promoting healthy outcomes from mother to child. Breastfeeding Medicine, 11 0 (8): 0 396--397, 2016

work page 2016
[18]

Grinsztajn, E

L. Grinsztajn, E. Oyallon, and G. Varoquaux. Why do tree-based models still outperform deep learning on typical tabular data? Advances in neural information processing systems, 35: 0 507--520, 2022

work page 2022
[19]

Gupta, P

N. Gupta, P. Zurn, K. Diallo, and M. R. Dal Poz. Uses of population census data for monitoring geographical imbalance in the health workforce: snapshots from three developing countries. International Journal for Equity in Health, 2: 0 1--12, 2003

work page 2003
[20]

Hamilton, Z

W. Hamilton, Z. Ying, and J. Leskovec. Inductive representation learning on large graphs. Advances in neural information processing systems, 30, 2017

work page 2017
[21]

Jaidka, J

K. Jaidka, J. Eichstaedt, S. Giorgi, H. A. Schwartz, and L. H. Ungar. Information-seeking vs. sharing: Which explains regional health? an analysis of google search and twitter trends. Telematics and Informatics, 59: 0 101540, 2021

work page 2021
[22]

Kaplan, Y

H. Kaplan, Y. Mansour, Y. Matias, and U. Stemmer. Differentially private learning of geometric concepts. In International Conference on Machine Learning, pages 3233--3241. PMLR, 2019

work page 2019
[23]

Klemmer, E

K. Klemmer, E. Rolf, C. Robinson, L. Mackey, and M. Ru wurm. Satclip: Global, general-purpose location embeddings with satellite imagery. arXiv preprint arXiv:2311.17179, 2023

work page arXiv 2023
[24]

R. Lam, A. Sanchez-Gonzalez, M. Willson, P. Wirnsberger, M. Fortunato, F. Alet, S. Ravuri, T. Ewalds, Z. Eaton-Rosen, W. Hu, et al. Learning skillful medium-range global weather forecasting. Science, 382 0 (6677): 0 1416--1421, 2023

work page 2023
[25]

G. Mai, N. Lao, Y. He, J. Song, and S. Ermon. Csp: Self-supervised contrastive spatial pre-training for geospatial-visual representations. In International Conference on Machine Learning, pages 23498--23515. PMLR, 2023

work page 2023
[26]

Manvi, S

R. Manvi, S. Khanna, G. Mai, M. Burke, D. Lobell, and S. Ermon. Geollm: Extracting geospatial knowledge from large language models. arXiv preprint arXiv:2310.06213, 2023

work page arXiv 2023
[27]

S. M. Monnat, D. J. Peters, M. T. Berg, and A. Hochstetler. Using census data to understand county-level differences in overall drug mortality and opioid-related mortality by opioid type. American Journal of Public Health, 109 0 (8): 0 1084--1091, 2019

work page 2019
[28]

Nearing, D

G. Nearing, D. Cohen, V. Dube, M. Gauch, O. Gilon, S. Harrigan, A. Hassidim, D. Klotz, F. Kratzert, A. Metzger, et al. Global prediction of extreme floods in ungauged watersheds. Nature, 627 0 (8004): 0 559--563, 2024

work page 2024
[29]

E. Rolf, J. Proctor, T. Carleton, I. Bolliger, V. Shankar, M. Ishihara, B. Recht, and S. Hsiang. A generalizable and accessible approach to machine learning with global satellite imagery. Nature communications, 12 0 (1): 0 4392, 2021

work page 2021
[30]

D. Shepard. A two-dimensional interpolation function for irregularly-spaced data. In Proceedings of the 1968 23rd ACM national conference, pages 517--524, 1968

work page 1968
[31]

Shwartz-Ziv and A

R. Shwartz-Ziv and A. Armon. Tabular data: Deep learning is not all you need. Information Fusion, 81: 0 84--90, 2022

work page 2022
[32]

M. Sun, C. Kamath, M. Agarwal, A. Muslim, H. Yee, D. Schottlander, S. Bavadekar, N. Efron, S. Shetty, and G. Prasad. Community search signatures as foundation features for human-centered geospatial modeling. arXiv preprint arXiv:2410.22721, 2024

work page arXiv 2024
[33]

Tkachenko, S

N. Tkachenko, S. Chotvijit, N. Gupta, E. Bradley, C. Gilks, W. Guo, H. Crosby, E. Shore, M. Thiarai, R. Procter, et al. Google trends can improve surveillance of type 2 diabetes. Scientific reports, 7 0 (1): 0 4993, 2017

work page 2017
[34]

Vivanco Cepeda, G

V. Vivanco Cepeda, G. K. Nayak, and M. Shah. Geoclip: Clip-inspired alignment between locations and images for effective worldwide geo-localization. Advances in Neural Information Processing Systems, 36, 2024

work page 2024
[35]

Y. Yin, Z. Liu, Y. Zhang, S. Wang, R. R. Shah, and R. Zimmermann. Gps2vec: Towards generating worldwide gps embeddings. In Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 416--419, 2019

work page 2019

[1] [1]

Abadi, P

M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al. \ TensorFlow \ : a system for \ Large-Scale \ machine learning. In 12th USENIX symposium on operating systems design and implementation (OSDI 16), pages 265--283, 2016

work page 2016

[2] [2]

Bavadekar, A

S. Bavadekar, A. Boulanger, J. Davis, D. Desfontaines, E. Gabrilovich, K. Gadepalli, B. Ghazi, T. Griffith, J. Gupta, C. Kamath, et al. Google COVID -19 vaccination search insights: Anonymization process description. arXiv preprint arXiv:2107.01179, 2021

work page arXiv 2021

[3] [3]

G. E. Box and G. M. Jenkins. Time series analysis. Forecasting and control. Holden-Day Series in Time Series Analysis. Holden-Day, 1976

work page 1976

[4] [4]

J. M. Brick and G. Kalton. Handling missing data in survey research. Statistical methods in medical research, 5 0 (3): 0 215--238, 1996

work page 1996

[5] [5]

Cdc places, 2024

Centers for Disease Control and Prevention . Cdc places, 2024. URL https://www.cdc.gov/places. Accessed 29 May 2024

work page 2024

[6] [6]

Cesare, P

N. Cesare, P. Dwivedi, Q. C. Nguyen, and E. O. Nsoesie. Use of social media, search queries, and demographic data to assess obesity prevalence in the united states. Palgrave communications, 5 0 (1): 0 1--9, 2019

work page 2019

[7] [7]

Choi and H

H. Choi and H. Varian. Predicting the present with google trends. Economic record, 88: 0 2--9, 2012

work page 2012

[8] [8]

S. Y. Chung, S. Venkatramanan, H. E. Elzain, S. Selvam, and M. Prasanna. Supplement of missing data in groundwater-level variations of peak type using geostatistical methods. GIS and geostatistical techniques for groundwater science, 33, 2019

work page 2019

[9] [9]

A. Das, W. Kong, R. Sen, and Y. Zhou. A decoder-only foundation model for time-series forecasting. arXiv preprint arXiv:2310.10688, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[10] [10]

Data commons 2024, cdc places, electronic dataset

Data Commons . Data commons 2024, cdc places, electronic dataset. https://datacommons.org, 2024. Accessed: 2024-05-29

work page 2024

[11] [11]

Deville, C

P. Deville, C. Linard, S. Martin, M. Gilbert, F. R. Stevens, A. E. Gaughan, V. D. Blondel, and A. J. Tatem. Dynamic population mapping using mobile phone data. Proceedings of the National Academy of Sciences, 111 0 (45): 0 15888--15893, 2014

work page 2014

[12] [12]

O. J. Dunn. Multiple comparisons among means. Journal of the American statistical association, 56 0 (293): 0 52--64, 1961

work page 1961

[13] [13]

P. Fabian. Scikit-learn: Machine learning in python. Journal of machine learning research 12, page 2825, 2011

work page 2011

[14] [14]

G. E. Fasshauer. Meshfree Approximation Methods with MATLAB. World Scientific Pub Co Inc, 2007

work page 2007

[15] [15]

Ginsberg, M

J. Ginsberg, M. H. Mohebbi, R. S. Patel, L. Brammer, M. S. Smolinski, and L. Brilliant. Detecting influenza epidemics using search engine query data. Nature, 457 0 (7232): 0 1012--1014, 2009

work page 2009

[16] [16]

Gorelick, M

N. Gorelick, M. Hancher, M. Dixon, S. Ilyushchenko, D. Thau, and R. Moore. Google earth engine: Planetary-scale geospatial analysis for everyone. Remote sensing of Environment, 202: 0 18--27, 2017

work page 2017

[17] [17]

G. N. Graham. Why your zip code matters more than your genetic code: promoting healthy outcomes from mother to child. Breastfeeding Medicine, 11 0 (8): 0 396--397, 2016

work page 2016

[18] [18]

Grinsztajn, E

L. Grinsztajn, E. Oyallon, and G. Varoquaux. Why do tree-based models still outperform deep learning on typical tabular data? Advances in neural information processing systems, 35: 0 507--520, 2022

work page 2022

[19] [19]

Gupta, P

N. Gupta, P. Zurn, K. Diallo, and M. R. Dal Poz. Uses of population census data for monitoring geographical imbalance in the health workforce: snapshots from three developing countries. International Journal for Equity in Health, 2: 0 1--12, 2003

work page 2003

[20] [20]

Hamilton, Z

W. Hamilton, Z. Ying, and J. Leskovec. Inductive representation learning on large graphs. Advances in neural information processing systems, 30, 2017

work page 2017

[21] [21]

Jaidka, J

K. Jaidka, J. Eichstaedt, S. Giorgi, H. A. Schwartz, and L. H. Ungar. Information-seeking vs. sharing: Which explains regional health? an analysis of google search and twitter trends. Telematics and Informatics, 59: 0 101540, 2021

work page 2021

[22] [22]

Kaplan, Y

H. Kaplan, Y. Mansour, Y. Matias, and U. Stemmer. Differentially private learning of geometric concepts. In International Conference on Machine Learning, pages 3233--3241. PMLR, 2019

work page 2019

[23] [23]

Klemmer, E

K. Klemmer, E. Rolf, C. Robinson, L. Mackey, and M. Ru wurm. Satclip: Global, general-purpose location embeddings with satellite imagery. arXiv preprint arXiv:2311.17179, 2023

work page arXiv 2023

[24] [24]

R. Lam, A. Sanchez-Gonzalez, M. Willson, P. Wirnsberger, M. Fortunato, F. Alet, S. Ravuri, T. Ewalds, Z. Eaton-Rosen, W. Hu, et al. Learning skillful medium-range global weather forecasting. Science, 382 0 (6677): 0 1416--1421, 2023

work page 2023

[25] [25]

G. Mai, N. Lao, Y. He, J. Song, and S. Ermon. Csp: Self-supervised contrastive spatial pre-training for geospatial-visual representations. In International Conference on Machine Learning, pages 23498--23515. PMLR, 2023

work page 2023

[26] [26]

Manvi, S

R. Manvi, S. Khanna, G. Mai, M. Burke, D. Lobell, and S. Ermon. Geollm: Extracting geospatial knowledge from large language models. arXiv preprint arXiv:2310.06213, 2023

work page arXiv 2023

[27] [27]

S. M. Monnat, D. J. Peters, M. T. Berg, and A. Hochstetler. Using census data to understand county-level differences in overall drug mortality and opioid-related mortality by opioid type. American Journal of Public Health, 109 0 (8): 0 1084--1091, 2019

work page 2019

[28] [28]

Nearing, D

G. Nearing, D. Cohen, V. Dube, M. Gauch, O. Gilon, S. Harrigan, A. Hassidim, D. Klotz, F. Kratzert, A. Metzger, et al. Global prediction of extreme floods in ungauged watersheds. Nature, 627 0 (8004): 0 559--563, 2024

work page 2024

[29] [29]

E. Rolf, J. Proctor, T. Carleton, I. Bolliger, V. Shankar, M. Ishihara, B. Recht, and S. Hsiang. A generalizable and accessible approach to machine learning with global satellite imagery. Nature communications, 12 0 (1): 0 4392, 2021

work page 2021

[30] [30]

D. Shepard. A two-dimensional interpolation function for irregularly-spaced data. In Proceedings of the 1968 23rd ACM national conference, pages 517--524, 1968

work page 1968

[31] [31]

Shwartz-Ziv and A

R. Shwartz-Ziv and A. Armon. Tabular data: Deep learning is not all you need. Information Fusion, 81: 0 84--90, 2022

work page 2022

[32] [32]

M. Sun, C. Kamath, M. Agarwal, A. Muslim, H. Yee, D. Schottlander, S. Bavadekar, N. Efron, S. Shetty, and G. Prasad. Community search signatures as foundation features for human-centered geospatial modeling. arXiv preprint arXiv:2410.22721, 2024

work page arXiv 2024

[33] [33]

Tkachenko, S

N. Tkachenko, S. Chotvijit, N. Gupta, E. Bradley, C. Gilks, W. Guo, H. Crosby, E. Shore, M. Thiarai, R. Procter, et al. Google trends can improve surveillance of type 2 diabetes. Scientific reports, 7 0 (1): 0 4993, 2017

work page 2017

[34] [34]

Vivanco Cepeda, G

V. Vivanco Cepeda, G. K. Nayak, and M. Shah. Geoclip: Clip-inspired alignment between locations and images for effective worldwide geo-localization. Advances in Neural Information Processing Systems, 36, 2024

work page 2024

[35] [35]

Y. Yin, Z. Liu, Y. Zhang, S. Wang, R. R. Shah, and R. Zimmermann. Gps2vec: Towards generating worldwide gps embeddings. In Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 416--419, 2019

work page 2019