pith.

arxiv: 2605.20921 · v1 · pith:WMFCXPUW · submitted 2026-05-20 · 💻 cs.CE

Distance between Road Networks: A Macroscopic Method for Road Network Datasets Comparison Using Traffic-weighted Geographic Distribution

Pith reviewed 2026-05-21 02:13 UTC · model grok-4.3

classification 💻 cs.CE
keywords road network comparison · traffic assignment · Wasserstein distance · geographic distribution · dataset evaluation · transportation networks

The pith

Road network datasets can be compared quantitatively by assigning hypothetical traffic and measuring Wasserstein distance between flow distributions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a method to compare different road network datasets for the same region by running static traffic assignment with a hypothetical demand on each one. This step produces a traffic-weighted geographic distribution that shows where vehicles would travel across the network. The Wasserstein distance then quantifies the difference between these distributions on a two-dimensional plane. If the approach works, it gives practitioners a way to select the most suitable dataset for transportation studies instead of relying only on topology checks or visual inspection. The case study applies the method to datasets from different sources and to simplified versions of them to show its potential.

Core claim

The central claim is that performing static traffic assignment with a hypothetical demand matrix on each road network dataset, followed by computing the Wasserstein distance between the resulting traffic-weighted geographic distributions, yields a quantitative dissimilarity measure that accounts for transportation flows rather than topology alone.

What carries the argument

Static traffic assignment with hypothetical demand to generate traffic-weighted geographic distributions, then compared via Wasserstein distance.
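The assignment step can be illustrated with a small sketch. This is not the authors' implementation: it assumes the conventional BPR link cost function (α = 0.15, β = 4) and approximates user equilibrium with the method of successive averages (MSA), a simpler stand-in for Frank-Wolfe-type solvers. The link tuple layout and the names `bpr_time`, `all_or_nothing`, and `msa_assignment` are hypothetical, and every destination is assumed reachable from its origin.

```python
import heapq


def bpr_time(t0, cap, flow, alpha=0.15, beta=4.0):
    """BPR travel time (assumed cost function; the paper's may differ)."""
    return t0 * (1.0 + alpha * (flow / cap) ** beta)


def all_or_nothing(links, times, demand):
    """Load each OD flow onto its current shortest path (Dijkstra per origin).

    links: list of (tail, head, free_flow_time, capacity); parallel links allowed.
    demand: dict (origin, destination) -> trips.
    """
    adj = {}
    for idx, (tail, head, _t0, _cap) in enumerate(links):
        adj.setdefault(tail, []).append((head, idx))
    load = [0.0] * len(links)
    for o in {orig for (orig, _dest) in demand}:
        dist, pred = {o: 0.0}, {}
        heap = [(0.0, o)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            for v, idx in adj.get(u, []):
                nd = d + times[idx]
                if nd < dist.get(v, float("inf")):
                    dist[v], pred[v] = nd, idx
                    heapq.heappush(heap, (nd, v))
        for (oo, dd), trips in demand.items():
            if oo != o:
                continue
            node = dd
            while node != o:  # walk the shortest-path tree back to the origin
                idx = pred[node]
                load[idx] += trips
                node = links[idx][0]
    return load


def msa_assignment(links, demand, iterations=200):
    """Approximate user-equilibrium link flows with successive averages."""
    flows = all_or_nothing(links, [link[2] for link in links], demand)
    for k in range(1, iterations):
        times = [bpr_time(t0, cap, x) for (_, _, t0, cap), x in zip(links, flows)]
        target = all_or_nothing(links, times, demand)
        step = 1.0 / (k + 1)
        flows = [x + step * (y - x) for x, y in zip(flows, target)]
    return flows
```

On two identical parallel links, MSA oscillates around and converges to an even split; a production study would use a convergence criterion such as the relative gap rather than a fixed iteration count.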

If this is right

  • Road network datasets from different sources can be ranked by how similar their simulated traffic patterns are.
  • The impact of network simplifications on overall traffic distribution can be measured numerically.
  • Analysts gain a concrete numeric criterion for choosing a dataset when traffic flow behavior matters for the intended analysis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The method could be tested against observed traffic counts from real cities to check whether the hypothetical demand produces useful rankings.
  • If the distance correlates with errors in downstream models, it might serve as a pre-screening tool before running full simulations on large networks.
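The correlation check in the second bullet could be run as a rank correlation between each candidate dataset's TGW distance to a reference and the error of a downstream model run on that dataset. The sketch below computes Spearman's ρ with average ranks for ties; the function names are hypothetical, and the inputs are whatever distances and error measures an analyst has, not values reported in the paper.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

A strongly positive ρ over several candidate datasets would support using the distance as a pre-screening tool; a rank measure is used because only the ordering of datasets matters for selection.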

Load-bearing premise

A single static traffic assignment run with one hypothetical demand matrix produces a representative traffic distribution whose Wasserstein distance meaningfully captures dataset quality differences.

What would settle it

If two road network datasets show a small Wasserstein distance under this method but produce substantially different results when used in an actual traffic simulation or empirical study, the claim that the distance reflects meaningful quality differences would be challenged.

Figures

Figures reproduced from arXiv: 2605.20921 by Hengyi Zhong, Toru Seo.

Figure 1: Common problems in road network datasets
Figure 2: Effect on traffic states caused by low topological correctness
Figure 3: Overview of road network dataset processing in the proposed comparison method
Figure 4: Flowchart of proposed method for comparing datasets
Figure 5: Factors on evaluating similarity of traffic states distribution between two datasets
Figure 6: Link e_ij in geographic coordinates. Accompanying text: …overall traffic across the road network. A traffic assignment method is applied to obtain possible link traffic volumes on road network datasets. Supply and demand parameters are necessary for assigning traffic on a road network, and they can be derived appropriately from the road network dataset alone. Supplies include traffic characteristic values of links, such as capacity …
Figure 8: Discretization of link
Figure 9: Distance term and weight term for link e_ij. Accompanying text: …values arise when heavily used routes are significantly displaced or detoured due to positional or topological inconsistencies. Moreover, the optimal transport formulation not only provides a global scalar distance (TGW distance) but also describes how traffic mass is transported between locations. This transport plan forms the basis of TG-OTM, which enables…
Figure 10: Comparison for analyzing road networks with network extraction
Figure 11: OSM network of study area in southwest Tokyo, Japan
Figure 12: Zones in the study area. Accompanying text (Section 4.3.1, Study area and datasets): The case study uses road network datasets within a 10 km × 10 km study area in southwest Tokyo, Japan …
Figure 13: Networks and TG distributions of different extracted OSM networks
Figure 14: Relationship between link length reduction rates and TGW distances of extracted OSM networks to …
Figure 15: TG-OTM of full-detail and trunk-level OSM networks
Figure 16: TG-OTM of full-detail and primary-level OSM networks
Figure 17: TG-OTM of full-detail and secondary-level OSM networks
Figure 18: TG-OTM of full-detail and tertiary-level OSM networks
Figure 19: TG-OTM of full-detail OSM and DRM networks
Figure 20: TG-OTM of secondary-level OSM and DRM networks
Figure 21: TG-OTM of trunk-level OSM and DRM networks
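Figure 8 indicates that links are discretized before the traffic-weighted distributions are compared. A minimal sketch, assuming straight links in projected coordinates (metres) and giving each segment a mass of link flow × segment length (vehicle-distance); the paper's actual distance and weight terms (Figure 9) may be defined differently. `discretize_links` and its `step` parameter are hypothetical.

```python
import math


def discretize_links(link_geoms, flows, step=100.0):
    """Turn straight, flow-carrying links into a normalised weighted point cloud.

    link_geoms: list of ((x1, y1), (x2, y2)) in a projected CRS.
    flows: assigned flow on each link (same order as link_geoms).
    step: maximum segment length; each segment becomes one point at its midpoint.
    Returns (points, weights) with weights summing to 1.
    """
    points, weights = [], []
    for (p, q), flow in zip(link_geoms, flows):
        length = math.dist(p, q)
        if length == 0 or flow <= 0:
            continue  # zero-length or unused links carry no traffic mass
        n_seg = max(1, math.ceil(length / step))
        seg_len = length / n_seg
        for k in range(n_seg):
            t = (k + 0.5) / n_seg  # relative position of the segment midpoint
            points.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
            weights.append(flow * seg_len)  # vehicle-distance carried by this piece
    total = sum(weights)
    return points, [w / total for w in weights]
```

Normalising the weights turns each network's traffic into a probability distribution, which is what a balanced Wasserstein comparison expects; a smaller `step` gives a finer distribution at the cost of a larger transport problem.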
Original abstract

In transportation network analysis, various types of road network data can be used even when focusing on the same region. Since different road network datasets can perform differently in analyses, it is necessary to compare them and make appropriate selections in a qualitative manner. However, many of the existing methods for comparing road network datasets are limited to specific topological evaluations and do not consider transportation. This study proposes a method for quantitative comparison of different road network datasets with explicit consideration of the traffic flows on them. The method first conducts a static traffic assignment with hypothetical demand for each dataset, and then compares the results using the Wasserstein distance on a two-dimensional plane. A case study on road network datasets from different sources and their simplifications suggests the potential use of the proposed method in evaluating and selecting road network datasets.
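The comparison step can be sketched as an exact discrete optimal transport problem between two weighted point sets on the plane, with Euclidean ground cost. This is a minimal illustration under stated assumptions, not the paper's TGW implementation: both weight vectors are normalised to equal mass (balanced transport), and the problem is solved with a plain successive-shortest-path routine. `wasserstein_2d` is a hypothetical name.

```python
import math


def wasserstein_2d(xs, a, ys, b, tol=1e-12):
    """Exact 1-Wasserstein distance between two weighted point sets on the plane.

    xs, ys: lists of (x, y) points; a, b: non-negative weights (normalised here).
    Solved as a transportation problem by successive shortest paths.
    Returns (distance, plan) where plan[i][j] is the mass moved from xs[i] to ys[j].
    """
    n, m = len(xs), len(ys)
    supply = [w / sum(a) for w in a]
    demand = [w / sum(b) for w in b]
    cost = [[math.dist(p, q) for q in ys] for p in xs]
    plan = [[0.0] * m for _ in range(n)]
    INF = float("inf")
    while max(supply) > tol and max(demand) > tol:
        # Bellman-Ford on the residual graph: nodes 0..n-1 are sources,
        # n..n+m-1 are sinks; reverse arcs exist wherever plan[i][j] > 0.
        dist = [0.0 if s > tol else INF for s in supply] + [INF] * m
        prev = [None] * (n + m)
        for _ in range(n + m):
            changed = False
            for i in range(n):
                if dist[i] < INF:
                    for j in range(m):
                        if dist[i] + cost[i][j] < dist[n + j] - tol:
                            dist[n + j], prev[n + j], changed = dist[i] + cost[i][j], i, True
            for j in range(m):
                if dist[n + j] < INF:
                    for i in range(n):
                        if plan[i][j] > tol and dist[n + j] - cost[i][j] < dist[i] - tol:
                            dist[i], prev[i], changed = dist[n + j] - cost[i][j], n + j, True
            if not changed:
                break
        # Cheapest sink that still needs mass, then trace its path back to a source.
        t = min((j for j in range(m) if demand[j] > tol), key=lambda j: dist[n + j])
        delta, path, node = demand[t], [], n + t
        while prev[node] is not None:
            p = prev[node]
            if node >= n:  # forward arc: source p -> sink node
                path.append((p, node - n, 1.0))
            else:  # reverse arc: undo flow previously sent on (node, p - n)
                path.append((node, p - n, -1.0))
                delta = min(delta, plan[node][p - n])
            node = p
        delta = min(delta, supply[node])  # node is now the originating source
        for i, j, sign in path:
            plan[i][j] += sign * delta
        supply[node] -= delta
        demand[t] -= delta
    distance = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return distance, plan
```

The returned `plan` is the coupling that shows how traffic mass moves between locations, the kind of object a transport-plan map such as TG-OTM would be built from. This routine scales poorly and only suits small point clouds; for realistic networks a dedicated solver such as POT's `ot.emd2` is the practical choice.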

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a macroscopic method to quantitatively compare road network datasets for the same region by first performing static traffic assignment using a single hypothetical demand matrix on each dataset, then computing the Wasserstein distance between the resulting traffic-weighted geographic distributions on the 2D plane. A case study applies the approach to road networks from different sources and to their simplifications, suggesting its utility for dataset evaluation and selection beyond purely topological metrics.

Significance. If the central assumption holds, the method supplies a transportation-aware quantitative metric that incorporates flow patterns rather than topology alone, which could help practitioners select among competing road-network datasets for downstream analyses such as routing or congestion modeling. The use of standard static assignment followed by an optimal-transport distance is a straightforward and reproducible idea that leverages existing tools without introducing new fitted parameters.

major comments (2)
  1. [Case study / Method description] The central claim—that Wasserstein distance on flows from one fixed hypothetical demand reliably distinguishes dataset quality—rests on an untested assumption of representativeness. No sensitivity analysis to demand choice (e.g., uniform vs. population-weighted) or correlation with external metrics such as link-flow RMSE against observed counts is reported in the case-study section.
  2. [Abstract and Method] The manuscript provides no derivation details, error bounds, or validation against ground-truth traffic for the claim that the resulting Wasserstein distance is a valid quality metric. The pipeline description in the abstract and method therefore leaves the transportation relevance of the numerical distances unverified.
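The link-flow RMSE mentioned in the first comment could be computed as follows, scoring only links that carry a counter. A minimal sketch: the dictionary-of-link-IDs input format and the name `link_flow_rmse` are assumptions, and matching physical counters to links in datasets with different geometry is a separate, nontrivial step.

```python
import math


def link_flow_rmse(assigned, observed):
    """Root-mean-square error of assigned link flows against observed counts.

    assigned, observed: dicts keyed by a link identifier shared by both sources.
    Links without an observed count are ignored.
    """
    common = [link for link in observed if link in assigned]
    if not common:
        raise ValueError("no link has both an assigned flow and an observed count")
    squared = sum((assigned[link] - observed[link]) ** 2 for link in common)
    return math.sqrt(squared / len(common))
```

Computing this error per dataset and per demand specification would give the external metric against which the distance rankings could be checked.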
minor comments (2)
  1. [Method] Notation for the traffic-weighted distribution and the precise Wasserstein formulation (including any discretization or normalization steps) should be stated explicitly with equations.
  2. [Case study] The case-study figures would benefit from clearer legends indicating which datasets correspond to which curves and from reporting the actual numerical distance values rather than qualitative statements.
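For reference, the standard balanced discrete form of the 1-Wasserstein distance that such an explicit statement would build on is given below, with x_i and y_j the discretized points of the two networks and a_i, b_j their normalized traffic weights (an assumption about how the weights enter). This is the textbook Kantorovich formulation, not necessarily the paper's TGW definition; the paper's reference list also includes work on unbalanced optimal transport, so its formulation may relax the equal-mass constraint.

```latex
\mu = \sum_{i=1}^{n} a_i \, \delta_{x_i}, \qquad
\nu = \sum_{j=1}^{m} b_j \, \delta_{y_j}, \qquad
\sum_{i=1}^{n} a_i = \sum_{j=1}^{m} b_j = 1,

W_1(\mu, \nu) = \min_{\pi \in \Pi(a, b)} \sum_{i=1}^{n} \sum_{j=1}^{m} \pi_{ij} \, \lVert x_i - y_j \rVert_2,

\Pi(a, b) = \Bigl\{ \pi \in \mathbb{R}_{\ge 0}^{n \times m} :
  \sum_{j=1}^{m} \pi_{ij} = a_i, \ \sum_{i=1}^{n} \pi_{ij} = b_j \Bigr\}.
```

The minimizing coupling π* is the transport plan; the text accompanying Figure 9 states that this plan forms the basis of TG-OTM.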

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the valuable comments. We respond to each major comment below and will make revisions to improve the manuscript accordingly.

Point-by-point responses
  1. Referee: [Case study / Method description] The central claim—that Wasserstein distance on flows from one fixed hypothetical demand reliably distinguishes dataset quality—rests on an untested assumption of representativeness. No sensitivity analysis to demand choice (e.g., uniform vs. population-weighted) or correlation with external metrics such as link-flow RMSE against observed counts is reported in the case-study section.

    Authors: We agree that the representativeness of the demand matrix is a key consideration. The current work uses a hypothetical demand to focus on structural differences between networks. We will revise the case study to include sensitivity analysis with alternative demand specifications, such as population-weighted demands. We will also add a discussion on how the proposed metric could be correlated with external validation metrics like RMSE against observed counts where such data is available. revision: yes

  2. Referee: [Abstract and Method] The manuscript provides no derivation details, error bounds, or validation against ground-truth traffic for the claim that the resulting Wasserstein distance is a valid quality metric. The pipeline description in the abstract and method therefore leaves the transportation relevance of the numerical distances unverified.

    Authors: The pipeline relies on standard components from traffic assignment and optimal transport, which are established in the literature. We will expand the method section with more detailed explanations, including the mathematical formulation and a clearer pipeline description to enhance reproducibility. We note that the Wasserstein distance here is used as a comparative tool rather than a predictive model requiring error bounds or direct ground-truth validation. We will clarify this in the revised abstract and method, and acknowledge the lack of such validation as a limitation. revision: partial

Circularity Check

0 steps flagged

No circularity: method applies external traffic assignment and standard Wasserstein distance without self-referential reduction

Full rationale

The paper proposes a comparison procedure that runs static traffic assignment (using an external routine) on a hypothetical demand matrix for each road network dataset, then computes the Wasserstein distance between the resulting traffic-weighted geographic distributions. No equations, parameters, or steps are shown that define the final distance in terms of itself or reduce it to a fitted quantity derived from the same inputs. The approach relies on standard, independent components (traffic assignment solvers and the Wasserstein metric) whose definitions and implementations lie outside the paper, so the derivation chain remains self-contained and non-circular.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The method depends on the validity of static traffic assignment under hypothetical demand and on the interpretability of Wasserstein distance between the resulting flow distributions; no new entities are postulated.

free parameters (1)
  • hypothetical demand matrix
    A made-up origin-destination demand matrix is required to run the traffic assignment; the abstract does not state how its values are chosen or whether they are derived from data.
axioms (2)
  • Domain assumption: static traffic assignment produces a traffic distribution that is representative of dataset differences.
    Invoked when the method treats the assignment output as the basis for comparison.
  • Domain assumption: Wasserstein distance on 2D geographic traffic distributions is a meaningful scalar summary of network quality.
    Central modeling choice that converts the assignment results into a single comparison number.
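Because the hypothetical demand matrix is the method's one free parameter, it helps to see how such a matrix might be generated from network data alone. One common option is a gravity model; the sketch below assumes zone masses such as road length per zone, an exponential distance-deterrence function with parameter `beta`, and a chosen total trip count. None of these choices come from the paper, and `gravity_demand` is a hypothetical name.

```python
import math


def gravity_demand(zones, total_trips, beta=0.5):
    """Hypothetical OD matrix from an unconstrained gravity model.

    zones: dict zone_id -> ((x, y) centroid, mass), e.g. road length in the zone.
    total_trips: total demand to distribute across all OD pairs.
    beta: deterrence parameter per unit of centroid distance (assumed).
    """
    raw = {}
    for i, (ci, mi) in zones.items():
        for j, (cj, mj) in zones.items():
            if i != j:  # no intrazonal trips in this sketch
                raw[(i, j)] = mi * mj * math.exp(-beta * math.dist(ci, cj))
    scale = total_trips / sum(raw.values())
    return {od: value * scale for od, value in raw.items()}
```

Running the comparison under several values of `beta` or different zone masses is one inexpensive form of the demand sensitivity analysis the referee requests.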

pith-pipeline@v0.9.0 · 5659 in / 1338 out tokens · 30393 ms · 2026-05-21T02:13:27.235190+00:00 · methodology
