
arxiv: 2606.08046 · v1 · pith:JGZLTIYOnew · submitted 2026-06-06 · 💻 cs.AI · cs.CV · cs.LG

OSMGraphCLIP: Learning Global Location Representations from OpenStreetMap Graphs

Pith reviewed 2026-06-27 19:47 UTC · model grok-4.3

classification 💻 cs.AI · cs.CV · cs.LG
keywords OpenStreetMap · geospatial embeddings · graph neural networks · contrastive learning · location representations · heterogeneous graphs

The pith

Structured OpenStreetMap data alone supports global location embeddings that match or exceed satellite baselines on most tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents OSMGraphCLIP, which turns OpenStreetMap features into heterogeneous graphs and trains a multi-scale graph encoder to align with a spherical-harmonics location encoder through contrastive learning. It tests the resulting embeddings on regression and classification tasks covering climate, ecology, socioeconomic indicators, public health, land cover, biodiversity, and wildfire forecasting. Results show the embeddings perform at or above satellite-based methods on the majority of benchmarks, with clearest gains on socioeconomic and public-health tasks where explicit labels for roads, buildings, and land use capture human activity patterns directly. The approach stays competitive on environmental tasks despite using no imagery input, and the embeddings recover biome boundaries and urban gradients from map topology alone.

Core claim

OSMGraphCLIP shows that representing geographic environments as heterogeneous graphs of typed OSM features, processed by a multi-scale graph encoder and aligned via a contrastive objective to a spherical-harmonics location encoder, produces embeddings that generalize across domains and match or exceed satellite-based baselines, especially where built-environment semantics matter.
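To make "heterogeneous graph of typed OSM features" concrete, here is a minimal sketch of such a container. The node type names follow the feature families named in the abstract (roads, buildings, land use, points of interest); the class name `HeteroGraph`, the node ids, and the relation names `adjacent_to` and `within` are assumptions made for illustration, not the paper's schema.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    # node_id -> node type, e.g. "road" or "building" (illustrative types)
    node_type: dict = field(default_factory=dict)
    # (source_id, relation, target_id) triples; relation names are hypothetical
    edges: list = field(default_factory=list)

    def add_node(self, node_id, ntype):
        self.node_type[node_id] = ntype

    def add_edge(self, src, relation, dst):
        # Refuse edges whose endpoints were never registered as nodes
        if src not in self.node_type or dst not in self.node_type:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append((src, relation, dst))

    def edge_type_counts(self):
        # Count edges by (source type, relation, target type)
        return Counter(
            (self.node_type[s], r, self.node_type[d]) for s, r, d in self.edges
        )


g = HeteroGraph()
g.add_node("r1", "road")
g.add_node("b1", "building")
g.add_node("l1", "landuse")
g.add_node("p1", "poi")
g.add_edge("b1", "adjacent_to", "r1")
g.add_edge("b1", "within", "l1")
g.add_edge("p1", "within", "b1")
counts = g.edge_type_counts()
```

Keying edges by (source type, relation, target type) is what lets a heterogeneous encoder apply different parameters per relation; how the paper actually defines its edge types is not stated in the material above.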

What carries the argument

Heterogeneous graphs of OSM features with multi-scale graph encoding aligned contrastively to a spherical-harmonics location encoder.
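The contrastive alignment can be sketched as a symmetric InfoNCE loss over a batch of paired graph and location embeddings, which is the usual CLIP-style formulation. This is a generic sketch, not the authors' implementation: the function name `clip_loss` and the temperature of 0.07 are assumptions, and the exact loss used in the paper is not given in the material above.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    # Numerically stable log-softmax over one row of logits
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def clip_loss(graph_emb, loc_emb, temperature=0.07):
    """Symmetric InfoNCE: pair i of graph_emb should match pair i of loc_emb."""
    g = [l2_normalize(v) for v in graph_emb]
    l = [l2_normalize(v) for v in loc_emb]
    n = len(g)
    # Cosine similarities scaled by the temperature (an assumed value)
    logits = [
        [sum(a * b for a, b in zip(g[i], l[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Graph -> location: row i should select column i
    g2l = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Location -> graph: column j should select row j
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    l2g = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (g2l + l2g)


# Toy batches: correctly paired versus deliberately mismatched embeddings
aligned = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when each graph embedding is closest to its own location embedding and large when the pairing is scrambled, which is the signal that pulls the two encoders into a shared space.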

If this is right

  • Embeddings recover biome boundaries, urban gradients, and tropical-temperate distinctions from map topology alone.
  • Advantages over satellite methods are largest on socioeconomic and public-health tasks due to explicit semantic annotation of the built environment.
  • Ecological and environmental tasks remain competitive with imagery methods despite using no Earth observation data.
  • The learned embeddings organize geographic space coherently without any satellite input.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Map-derived embeddings could lower data costs for prediction tasks that currently rely on commercial satellite sources.
  • Adding temporal OSM updates might strengthen performance on forecasting tasks such as wildfire prediction.
  • The same graph construction could be tested on regions with sparse OSM coverage to measure how annotation density affects downstream accuracy.

Load-bearing premise

The graph construction from OSM features plus the multi-scale encoder and contrastive objective extract semantic and topological signals that generalize to the reported downstream domains.

What would settle it

Performance on the reported benchmarks drops below satellite baselines when the contrastive alignment step is removed or when key OSM feature types such as buildings and roads are withheld from the graphs.
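The feature-withholding half of that test could be set up roughly as follows: remove every node of the withheld types, and any edge touching one, before the graph reaches the encoder. The helper name `withhold_types`, the node ids, and the relation names are hypothetical, and the sketch does not reproduce the paper's pipeline.

```python
def withhold_types(nodes, edges, withheld):
    """Drop every node whose type is in `withheld`, plus any edge touching it."""
    kept_nodes = {nid: t for nid, t in nodes.items() if t not in withheld}
    kept_edges = [
        (s, r, d) for s, r, d in edges if s in kept_nodes and d in kept_nodes
    ]
    return kept_nodes, kept_edges


# Toy graph; node ids, type names, and relation names are hypothetical
nodes = {"r1": "road", "b1": "building", "l1": "landuse", "p1": "poi"}
edges = [
    ("b1", "adjacent_to", "r1"),
    ("b1", "within", "l1"),
    ("p1", "within", "l1"),
]

ablated_nodes, ablated_edges = withhold_types(nodes, edges, {"building", "road"})
```

Retraining and re-evaluating on graphs ablated this way, and comparing against the satellite baselines, would indicate how much of the reported advantage depends on building and road annotations.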

Figures

Figures reproduced from arXiv: 2606.08046 by Dimitrios Michail, Eleni Saka, Ioannis Giannopoulos, Ioannis Papoutsis.

Figure 1: Geographic distribution of the initial 200k candidate locations used to construct …
Figure 2: OSMGraphCLIP overview. Given a geographic coordinate, a bounding box of …
Figure 3: RGB visualization of the first three principal components of OSMGraphCLIP …
Figure 4: Cosine similarity between two reference locations (marked …
Figure 5: RGB visualization of the first three principal components of OSMGraphCLIP …
Figure 6: Cosine similarity between two reference locations (marked …
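The cosine-similarity figures presumably score other locations against a chosen reference embedding; the captions are truncated here, so that reading is an assumption. A minimal sketch of the computation, with toy three-dimensional vectors standing in for real embeddings and hypothetical location names:

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_map(reference, embeddings):
    # Score every location against the reference embedding
    return {name: cosine_similarity(reference, e) for name, e in embeddings.items()}


# Toy embeddings; the keys are hypothetical location names
embeddings = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 0.0, 1.0],
}
sims = similarity_map(embeddings["a"], embeddings)
```

Plotting `sims` over a grid of coordinates would give a map of the kind the captions describe, with locations resembling the reference scoring close to 1.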
Original abstract

We present OSMGraphCLIP, a CLIP-style geospatial representation model that learns global location embeddings from freely available OpenStreetMap (OSM) data. OSMGraphCLIP represents geographic environments as heterogeneous graphs of typed OSM features, preserving the topological and semantic relationships among roads, buildings, land-use regions, and points of interest. A multi-scale graph encoder captures both fine-grained local structure and broader landscape composition, and supervises a spherical-harmonics location encoder through a contrastive alignment objective. We evaluate OSMGraphCLIP across a diverse suite of downstream geospatial regression and classification tasks spanning climate, ecology, socioeconomic indicators, public health, land cover, biodiversity, and wildfire forecasting, and show that structured OSM data alone supports strong global location representations across domains. OSMGraphCLIP matches or exceeds satellite-based baselines on the majority of benchmarks, with the most pronounced advantage on socioeconomic and public-health tasks, where OSM's explicit semantic annotation of the built environment encodes patterns of human activity that satellite pixels can only capture indirectly. On ecological and environmental tasks, the model remains closely competitive with imagery-based methods despite using no Earth observation data. Qualitative analysis confirms that the learned embeddings organize geographic space coherently, recovering biome boundaries, urban gradients, and tropical-temperate distinctions from map topology alone.
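For readers unfamiliar with spherical-harmonics location encoding, the sketch below computes the real spherical harmonics of degree 0 and 1 for a longitude/latitude pair. It shows only the fixed basis features. A full location encoder would use higher degrees and pass the features through a learned network, and the paper's configuration is not given above. The function names and the ordering of the outputs are assumptions, and sign conventions for real harmonics vary between references.

```python
import math


def lonlat_to_unit_vector(lon_deg, lat_deg):
    # Convert geographic coordinates to a point on the unit sphere
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


def sh_features(lon_deg, lat_deg):
    """Real spherical harmonics up to degree 1, in Cartesian form."""
    x, y, z = lonlat_to_unit_vector(lon_deg, lat_deg)
    c0 = 0.5 * math.sqrt(1.0 / math.pi)    # Y_0^0 (constant)
    c1 = math.sqrt(3.0 / (4.0 * math.pi))  # shared degree-1 normalization
    # Output order: (l=0, m=0), (1, -1), (1, 0), (1, 1)
    return [c0, c1 * y, c1 * z, c1 * x]


north_pole = sh_features(0.0, 90.0)
equator = sh_features(0.0, 0.0)
```

Because the basis is defined on the sphere itself, the features are continuous across the antimeridian and well behaved at the poles, unlike raw longitude and latitude values.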

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces OSMGraphCLIP, a CLIP-style model that constructs heterogeneous graphs from OpenStreetMap features (roads, buildings, land-use, POIs) and trains a multi-scale graph encoder to align with a spherical-harmonics location encoder via contrastive loss. It evaluates the resulting global location embeddings on downstream regression and classification tasks spanning climate, ecology, socioeconomic indicators, public health, land cover, biodiversity, and wildfire forecasting, claiming that OSMGraphCLIP matches or exceeds satellite-based baselines on the majority of benchmarks (with largest gains on socioeconomic and public-health tasks) while remaining competitive on ecological tasks despite using no imagery.

Significance. If the empirical claims are substantiated, the result would be significant: it would establish that freely available, semantically annotated vector map data can produce location representations competitive with or superior to satellite imagery for many geospatial tasks, particularly those involving human activity patterns. This has practical implications for data accessibility and cost in geospatial ML and demonstrates the value of explicit topological and semantic structure over pixel-based inputs.

major comments (2)
  1. [Experimental evaluation (implied §4–5)] The provided abstract and summary supply no information on dataset sizes, number of evaluation samples, baseline implementations, hyper-parameter search, or statistical testing for the reported downstream results. Without these details it is impossible to assess whether the claimed superiority on the majority of benchmarks is robust or could be explained by differences in training scale or evaluation protocol.
  2. [Methods and data preparation (implied §3)] The central generalization claim—that the heterogeneous graph construction plus multi-scale encoder and contrastive objective extract task-relevant signals that transfer to the reported domains—rests on the assumption that no data leakage occurs between OSM feature selection/graph construction and the downstream task labels. The manuscript must explicitly describe the train/test splits and confirm that no OSM attributes used in graph construction overlap with evaluation targets.
minor comments (1)
  1. [Abstract] The abstract refers to 'qualitative analysis' confirming coherent organization of geographic space but provides no description of the visualization or analysis method.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their thoughtful review and valuable suggestions for improving the clarity and rigor of our manuscript. We address the two major comments point-by-point below. Both points can be addressed through revisions that enhance experimental details and methodological transparency without altering the core contributions.

Point-by-point responses
  1. Referee: [Experimental evaluation (implied §4–5)] The provided abstract and summary supply no information on dataset sizes, number of evaluation samples, baseline implementations, hyper-parameter search, or statistical testing for the reported downstream results. Without these details it is impossible to assess whether the claimed superiority on the majority of benchmarks is robust or could be explained by differences in training scale or evaluation protocol.

    Authors: We agree that greater detail on these aspects is necessary for assessing robustness. Although the full manuscript describes the evaluation datasets and tasks in Sections 4–5, we will add a new subsection (or expanded table) in the experimental evaluation section that explicitly reports dataset sizes, number of evaluation samples per task, baseline implementation details (including any re-implementations or public code used), the hyperparameter search procedure, and statistical testing (e.g., standard deviations across runs or significance tests). This will allow direct evaluation of whether performance differences are robust.

    Revision: yes

  2. Referee: [Methods and data preparation (implied §3)] The central generalization claim—that the heterogeneous graph construction plus multi-scale encoder and contrastive objective extract task-relevant signals that transfer to the reported domains—rests on the assumption that no data leakage occurs between OSM feature selection/graph construction and the downstream task labels. The manuscript must explicitly describe the train/test splits and confirm that no OSM attributes used in graph construction overlap with evaluation targets.

    Authors: We agree that explicit confirmation of no data leakage is essential. OSM feature selection relies exclusively on standard, globally available map elements (roads, buildings, land use, POIs) chosen without reference to any downstream task labels. All downstream tasks use independent public datasets whose train/test splits are followed exactly as defined by their original sources. We will revise Section 3 to include (i) explicit descriptions of the train/test splits employed for each downstream task and (ii) a clear statement confirming that no OSM attributes were selected or filtered on the basis of evaluation targets.

    Revision: yes
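The statistical reporting promised in the first response could take a form like the sketch below: mean and standard deviation across runs, plus a paired bootstrap confidence interval on the per-run score difference between two models. The scores are placeholders rather than results from the paper, and the function names and the choice of a bootstrap are assumptions, since the rebuttal only mentions "standard deviations across runs or significance tests".

```python
import random
import statistics


def summarize(scores):
    # Mean and sample standard deviation across runs
    return statistics.mean(scores), statistics.stdev(scores)


def paired_bootstrap_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile CI for the mean of paired differences a[i] - b[i]."""
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    means = []
    for _ in range(n_resamples):
        # Resample the paired differences with replacement
        sample = [rng.choice(diffs) for _ in diffs]
        means.append(statistics.mean(sample))
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Placeholder scores for two models on the same five runs (not from the paper)
model_a = [0.71, 0.73, 0.70, 0.74, 0.72]
model_b = [0.69, 0.70, 0.68, 0.71, 0.70]

mean_a, std_a = summarize(model_a)
ci_lo, ci_hi = paired_bootstrap_ci(model_a, model_b)
```

An interval that excludes zero suggests the difference is not explained by run-to-run variation alone. Pairing by run only makes sense when both models share the same seeds or data splits.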

Circularity Check

0 steps flagged

No significant circularity


The provided abstract and description outline a standard contrastive learning pipeline (heterogeneous OSM graph construction, multi-scale graph encoder, spherical-harmonics location encoder, contrastive alignment) evaluated on downstream tasks. No equations, fitted parameters, or self-citations are shown that would reduce any reported performance metric to a quantity defined by the same inputs or by construction. The central claim rests on empirical generalization across domains rather than any self-referential derivation step, rendering the argument self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract supplies insufficient technical detail to enumerate specific free parameters, axioms, or invented entities; the graph construction, encoder architecture, and contrastive objective are described at a high level only.


