Enhancing the Socioeconomic Understanding of Foundation Models with Urban Mobility
Pith reviewed 2026-06-28 12:08 UTC · model grok-4.3
The pith
Incorporating mobility networks improves foundation models' socioeconomic predictions for urban areas.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Mobility networks can elicit the geospatial capabilities of foundation models by explicitly encoding connectivity among urban entities, which static attributes such as POI text and satellite imagery do not capture. MobFusion, instantiated in three complementary designs on anonymized large-scale mobility datasets from three U.S. metropolitan areas, improves performance on urban prediction tasks, including median household income, population density, and crime prediction.
What carries the argument
MobFusion, a modular mobility-enhanced foundation model fusion paradigm with three designs: mobility networks as contexts for zero-shot LLM prompting, as graph connectors for fusing geospatial visual embeddings with textual embeddings, and as structured tokens for multimodal LLM reasoning.
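The first design, mobility networks as context for zero-shot prompting, can be pictured with a minimal sketch. The page does not specify how the paper serializes mobility into a prompt, so everything here is an assumption made for illustration: the `(origin, destination, trips)` edge format, the tract names, the `top_flows` and `build_prompt` helpers, and the prompt wording are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical edge format: (origin, destination, trip count).
# The names and counts below are illustrative, not data from the paper.
Edge = Tuple[str, str, int]

EXAMPLE_EDGES: List[Edge] = [
    ("tract_A", "tract_B", 30),
    ("tract_A", "tract_C", 50),
    ("tract_A", "tract_D", 5),
    ("tract_B", "tract_A", 12),
]


def top_flows(edges: List[Edge], region: str, k: int = 3) -> List[Tuple[str, int]]:
    """Return the k heaviest outgoing flows from `region`, heaviest first."""
    outgoing = [(dst, w) for src, dst, w in edges if src == region]
    # Sort by descending weight, then by name so ties are deterministic.
    return sorted(outgoing, key=lambda t: (-t[1], t[0]))[:k]


def build_prompt(region: str, description: str, edges: List[Edge], k: int = 3) -> str:
    """Assemble a zero-shot prompt that adds mobility context to a static description."""
    lines = [
        f"Region: {region}",
        f"Description: {description}",
        "Top outgoing mobility flows:",
    ]
    flows = top_flows(edges, region, k)
    if not flows:
        lines.append("- (none observed)")
    for dst, trips in flows:
        lines.append(f"- to {dst}: {trips} trips")
    lines.append(
        "Question: Estimate the median household income of this region in USD. "
        "Answer with a single number."
    )
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("tract_A", "Mixed residential and retail area.", EXAMPLE_EDGES, k=2))
```

Truncating to the top-k flows keeps the prompt short; whether the paper truncates, aggregates, or describes flows in prose is not stated on this page.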
If this is right
- Mobility integration improves accuracy on socioeconomic prediction tasks such as income, density, and crime across multiple cities.
- Three fusion designs offer complementary ways to combine mobility with existing foundation model inputs.
- Foundation models acquire better geospatial understanding when mobility patterns are explicitly included.
- Urban applications that rely on socioeconomic forecasts can use mobility-enhanced models for higher performance.
Where Pith is reading between the lines
- Mobility fusion techniques could extend to other spatial prediction domains such as traffic flow or land-use change.
- Testing on cities outside the U.S. or with different mobility data resolutions would clarify the scope of the gains.
- The results imply that dynamic network data can serve as a general complement to static geospatial features in multimodal models.
Load-bearing premise
Mobility networks provide connectivity information among urban places that static attributes like POI text and satellite imagery cannot capture.
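A toy sketch can make this premise concrete: two places with identical static features become distinguishable once each is blended with the features of the places it sends trips to. This is not the paper's graph connector, whose construction the page does not describe. It is one round of flow-weighted neighbor averaging, and the `propagate` helper, the `alpha` mixing weight, and the example data are all assumptions.

```python
from typing import Dict, List


def propagate(
    features: Dict[str, List[float]],
    flows: Dict[str, Dict[str, float]],
    alpha: float = 0.5,
) -> Dict[str, List[float]]:
    """Blend each place's own vector with the flow-weighted mean of its destinations.

    `alpha` (hypothetical) is the weight kept on the place's own static features.
    Places with no usable outgoing flows keep their original vector.
    """
    out: Dict[str, List[float]] = {}
    for place, own in features.items():
        # Keep only destinations that have features and positive flow.
        nbrs = {d: w for d, w in flows.get(place, {}).items() if d in features and w > 0}
        total = sum(nbrs.values())
        if total == 0:
            out[place] = list(own)
            continue
        agg = [0.0] * len(own)
        for dst, w in nbrs.items():
            for k, v in enumerate(features[dst]):
                agg[k] += (w / total) * v
        out[place] = [alpha * o + (1 - alpha) * a for o, a in zip(own, agg)]
    return out


# Illustrative data: A and C share the same static feature vector,
# but only A sends trips elsewhere.
FEATURES = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 0.0]}
FLOWS = {"A": {"B": 3.0, "C": 1.0}}

if __name__ == "__main__":
    print(propagate(FEATURES, FLOWS))
```

In this toy case A and C start out identical and end up different purely because of where A's trips go, which is the kind of signal the premise says static attributes miss.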
What would settle it
An experiment on the same urban prediction tasks where adding mobility networks produces no accuracy gain or produces lower accuracy than the static-attribute baselines.
Original abstract
Foundation models have recently been applied to urban socioeconomic prediction using POI text, satellite imagery, and geospatial descriptions. However, these models mostly rely on static attributes of individual places, while ignoring the mobility patterns that reveal how places are functionally connected. To address this gap, we explore whether mobility networks can elicit the geospatial capabilities of foundation models by explicitly encoding connectivity among urban entities. We propose MobFusion, a modular mobility-enhanced foundation model fusion paradigm, and instantiate it through three complementary designs: (i) mobility networks as contexts for zero-shot LLM prompting, (ii) as graph connectors for fusing geospatial visual embeddings with textual embeddings, and (iii) as structured tokens for multimodal LLM reasoning. Using anonymized large-scale mobility datasets from three U.S. metropolitan areas, we find that MobFusion improves urban prediction tasks (e.g., median household income, population density, and crime prediction) across three instantiations, demonstrating that incorporating human mobility can effectively improve the socioeconomic understanding of foundation models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces MobFusion, a modular paradigm for fusing urban mobility networks with foundation models to enhance their socioeconomic understanding. It instantiates the approach in three complementary ways—mobility networks as contexts for zero-shot LLM prompting, as graph connectors for fusing visual and textual embeddings, and as structured tokens for multimodal LLM reasoning—and reports that these yield performance gains on urban prediction tasks (median household income, population density, crime) using anonymized large-scale mobility data from three U.S. metropolitan areas.
Significance. If the empirical gains prove robust, the work would indicate that mobility-derived functional connectivity supplies geospatial signals absent from static POI text or satellite imagery, thereby extending foundation-model applications in urban socioeconomic modeling. The modular design, which supports multiple fusion strategies, is a constructive contribution that could facilitate further experimentation.
major comments (2)
- §5 (Experimental Evaluation): The reported improvements lack control experiments that preserve input volume, dimensionality, and architecture while destroying mobility structure (e.g., random rewiring of the mobility graph that retains degree sequence). Without such ablations, it remains unclear whether gains arise from the claimed connectivity signal or from the added fusion mechanisms and data volume themselves; this directly bears on the central claim that mobility networks elicit unique geospatial capabilities.
- §4 (MobFusion Instantiations): The descriptions of the three fusion designs do not specify the precise encoding of mobility networks (e.g., how edges are tokenized or how graph connectors are constructed) or include ablation variants that isolate connectivity from other mobility-derived features, making it difficult to attribute performance differences to the functional connectivity asserted in the abstract.
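The degree-preserving rewiring control requested in the §5 comment can be sketched as a double edge swap. This is a simplified illustration and not the paper's procedure: it assumes an undirected, unweighted, simple graph with no duplicate edges, whereas real mobility networks are typically directed and weighted, and `degree_preserving_rewire` and `degree_counts` are hypothetical helper names.

```python
import random
from collections import Counter
from typing import Hashable, List, Optional, Tuple

Edge = Tuple[Hashable, Hashable]


def degree_counts(edges: List[Edge]) -> Counter:
    """Count how many edges touch each node."""
    counts: Counter = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return counts


def degree_preserving_rewire(
    edges: List[Edge], n_swaps: int, seed: int = 0, max_tries: Optional[int] = None
) -> List[Edge]:
    """Randomize an undirected simple graph via double edge swaps.

    Each accepted swap replaces (a, b), (c, d) with (a, d), (c, b), so every
    node keeps its degree. Swaps that would create a self-loop or a duplicate
    edge are rejected. Fewer than n_swaps swaps may be performed if max_tries
    is exhausted.
    """
    rng = random.Random(seed)
    edges = [tuple(sorted(e)) for e in edges]  # canonical orientation
    edge_set = set(edges)
    if len(edges) < 2:
        return edges
    max_tries = 100 * n_swaps if max_tries is None else max_tries
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both pairings of endpoints
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: swap would create a self-loop or be a no-op
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if new1 in edge_set or new2 in edge_set:
            continue  # would create a duplicate edge
        edge_set.discard(edges[i])
        edge_set.discard(edges[j])
        edge_set.update((new1, new2))
        edges[i], edges[j] = new1, new2
        done += 1
    return edges


if __name__ == "__main__":
    ring = [(i, (i + 1) % 6) for i in range(6)]
    print(degree_preserving_rewire(ring, n_swaps=10, seed=1))
```

Running the same model on the rewired graph keeps the number of edges and each node's degree fixed, so any remaining gain over the static baseline cannot be attributed to which places are actually connected.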
minor comments (2)
- The abstract would be strengthened by the inclusion of concrete quantitative deltas, baseline models, and statistical significance measures to support the performance claims.
- Notation for the three instantiations could be made more consistent across the text and figures to improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to incorporate the requested controls and clarifications.
- Referee: §5 (Experimental Evaluation): The reported improvements lack control experiments that preserve input volume, dimensionality, and architecture while destroying mobility structure (e.g., random rewiring of the mobility graph that retains degree sequence). Without such ablations, it remains unclear whether gains arise from the claimed connectivity signal or from the added fusion mechanisms and data volume themselves; this directly bears on the central claim that mobility networks elicit unique geospatial capabilities.
  Authors: We agree that such controls are necessary to isolate the contribution of mobility structure. In the revised manuscript we will add random-rewiring ablations that preserve degree sequence, input volume, and model architecture for all three fusion designs. Results will be reported alongside the original experiments to directly test whether performance gains depend on functional connectivity rather than data volume or fusion mechanics alone.
  Revision: yes
- Referee: §4 (MobFusion Instantiations): The descriptions of the three fusion designs do not specify the precise encoding of mobility networks (e.g., how edges are tokenized or how graph connectors are constructed) or include ablation variants that isolate connectivity from other mobility-derived features, making it difficult to attribute performance differences to the functional connectivity asserted in the abstract.
  Authors: We acknowledge that greater technical detail is required. The revision will expand §4 with explicit specifications of edge tokenization, graph-connector construction, and embedding fusion steps for each of the three designs. We will also add ablation variants that disrupt connectivity (e.g., random edge permutation or feature ablation) while retaining other mobility-derived statistics, allowing clearer attribution of gains to functional connectivity.
  Revision: yes
Circularity Check
No circularity: empirical fusion experiments with measured outcomes
The paper proposes the MobFusion paradigm and its three instantiations, then reports empirical performance gains on socioeconomic prediction tasks using real mobility datasets from three metropolitan areas. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the abstract or described content. The central claim rests on experimental results rather than any quantity defined in terms of its own inputs, satisfying the default expectation of a non-circular empirical study.
Axiom & Free-Parameter Ledger
invented entities (1)
- MobFusion: no independent evidence