arxiv: 2404.02172 · v1 · submitted 2024-03-31 · ⚛️ physics.soc-ph

Unveiling Urban Mobility Patterns: A Data-Driven Analysis of Public Transit

Pith reviewed 2026-05-24 02:41 UTC · model grok-4.3

classification ⚛️ physics.soc-ph
keywords public transit, data analysis, digital twins, urban mobility, predictive modeling, geospatial data, temporal patterns, data preprocessing

The pith

Analysis of historical public transit data with time and location details prepares digital twins for predictive modeling of urban mobility.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes detailed historical public transit data enriched with temporal and geospatial metadata. It establishes this analysis as a foundation for using machine learning and deep learning to create dynamic digital twins that model mobility systems. A preprocessing framework refines the raw data to enable both historical pattern examination and future predictive work. The examination looks at trends involving time, geospatial elements, external influences, and operational aspects to judge data quality and select key inputs for the twins. If the approach holds, the resulting twins would allow planners to anticipate demand, spot service gaps, and grasp mobility dynamics for more efficient urban transport.

Core claim

The study presents analysis of detailed historical public transit data, enriched with relevant temporal and geospatial metadata, as a precursor to injecting dynamism into digital twins of mobility systems via ML/DL-based predictive modeling. A data preprocessing framework was implemented to refine the raw data for effective historical analysis and predictive modeling. This paper examines public transit data for patterns and trends incorporating factors such as time, geospatial elements, external influences, and operational aspects, helping to assess the quality of the available transit data and identify important information for use in digital twins.

What carries the argument

The data preprocessing framework that refines raw public transit data enriched with temporal and geospatial metadata to support pattern analysis and predictive modeling for digital twins.
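The summary does not describe the framework's internals, so the following is only a minimal sketch of what temporal enrichment of a single raw record might look like. The field names (`line`, `stop_id`, `timestamp`, `boardings`), the helper `enrich_record`, and the sample values are all hypothetical and are not taken from the paper.

```python
from datetime import datetime


def enrich_record(record):
    """Return a copy of a raw count record with derived temporal fields.

    Assumes (hypothetically) that each record carries an ISO 8601
    'timestamp' string; the paper's actual schema is not given here.
    """
    ts = datetime.fromisoformat(record["timestamp"])
    enriched = dict(record)                 # leave the raw record untouched
    enriched["hour"] = ts.hour              # hour of day, 0-23
    enriched["weekday"] = ts.weekday()      # Monday = 0 ... Sunday = 6
    enriched["is_weekend"] = ts.weekday() >= 5
    return enriched


# Invented example record, for illustration only.
raw = {
    "line": "3",
    "stop_id": "S001",
    "timestamp": "2023-03-06T08:15:00",
    "boardings": 12,
}
enriched = enrich_record(raw)
print(enriched)
```

Derived fields of this kind are what make the daily and weekly aggregations shown in the figures straightforward to compute.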

If this is right

  • Digital twins can anticipate infrastructure demand based on identified patterns.
  • Service gaps in public transit can be identified through the enriched data analysis.
  • Mobility dynamics become clearer for planning purposes.
  • Educated decisions for efficient and sustainable urban mobility systems are supported.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same preprocessing steps could be applied to data from additional transport modes to build broader city-scale models.
  • Testing the framework on transit data from other cities would check whether the identified patterns generalize.
  • Adding real-time data streams to the historical records might allow the digital twins to update predictions continuously.

Load-bearing premise

The raw transit data contains sufficient quality and coverage, after the described preprocessing, to support effective predictive modeling in digital twins.

What would settle it

Running predictive models on the preprocessed data and finding that they produce inaccurate forecasts of transit demand or patterns would show the data does not support the intended use in digital twins.
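One way to make such a test concrete is to hold out the most recent period and compare forecasts against it. The sketch below uses a seasonal-naive baseline (repeat the value from one week earlier) rather than the ML/DL models the paper envisages, and it runs on invented daily counts; the function names and the data are assumptions for illustration, not results from the paper.

```python
def seasonal_naive_forecast(history, horizon, season=7):
    """Forecast each future step by repeating the value one season earlier."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = history[-season:]
    return [last_season[i % season] for i in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented daily passenger counts for three weeks (Monday to Sunday).
daily_counts = [
    100, 110, 105, 108, 120, 60, 50,
    102, 111, 104, 109, 118, 62, 51,
    98, 112, 106, 107, 121, 61, 49,
]

# Hold out the final week and forecast it from the first two.
train, test = daily_counts[:14], daily_counts[14:]
forecast = seasonal_naive_forecast(train, horizon=len(test))
mae = mean_absolute_error(test, forecast)
print(f"held-out MAE: {mae:.2f} passengers per day")
```

A learned model that cannot beat a baseline like this on held-out data would be evidence against the data's suitability for the intended use.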

Figures

Figures reproduced from arXiv: 2404.02172 by Adil Rasheed, Frank Lindseth, Oluwaleke Yusuf.

Figure 1: Satellite map of the AtB bus lines in the dataset.
Figure 2: Yearly breakdown of passenger volume for the entire dataset from May 2020 to November 2023.
Figure 3: Detailed overview of passenger volume across different bus lines for the entire dataset.
Figure 5: Daily variation of passenger volume across bus lines.
Figure 6: Weekly variation of passenger volume across bus lines.
Figure 8: Effect of Norwegian Covid-19 policies on passenger volume from May 2020 to March 2022.
Figure 9: Variation of passenger volume with major holidays and local events in 2022 and 2023.
Figure 10: Comparison of trip counts and passenger volumes handled by
Original abstract

The expansion of urban centers necessitates enhanced efficiency and sustainability in their transportation infrastructure and mobility systems. The big data obtainable from various transportation modes potentially offers critical insights for urban planning. This study presents analysis of detailed historical public transit data, enriched with relevant temporal and geospatial metadata, as a precursor to injecting dynamism into digital twins of mobility systems via ML/DL-based predictive modeling. A data preprocessing framework was implemented to refine the raw data for effective historical analysis and predictive modeling. This paper examines public transit data for patterns and trends -- incorporating factors such as time, geospatial elements, external influences, and operational aspects. From a technical standpoint, this research helps to assess the quality of the available transit data and identify important information for use in digital twins. Such digital twins foster educated decisions for efficient, sustainable urban mobility systems by anticipating infrastructure demand, identifying service gaps, and understanding mobility dynamics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript describes a data preprocessing framework applied to historical public transit data enriched with temporal and geospatial metadata. It examines patterns and trends (incorporating time, geospatial, external, and operational factors) as a precursor to ML/DL-based predictive modeling in digital twins of mobility systems, with the goal of assessing data quality to support decisions on infrastructure demand, service gaps, and mobility dynamics.

Significance. If the analysis delivered quantitative validation of post-preprocessing data suitability (e.g., completeness and coverage metrics) together with reproducible pattern findings, the work could usefully inform urban mobility planning and digital-twin construction. The descriptive approach on real transit data aligns with needs in physics-of-society and transportation informatics, but the current absence of such metrics limits its contribution.

major comments (2)
  1. [Abstract] The central claim that the preprocessing framework refines the raw data 'for effective historical analysis and predictive modeling' and that the study 'helps to assess the quality of the available transit data' is unsupported by any reported quantitative metrics (missing-value rates, fraction of trips with complete geospatial/temporal fields, spatial coverage before/after cleaning, or validation statistics).
  2. [Abstract / Methods (implied)] The manuscript positions the work as a precursor to ML/DL modeling in digital twins, yet it provides no concrete statistics on data completeness or coverage after preprocessing. This leaves its weakest assumption, that the cleaned data is sufficient for downstream predictive tasks, without support.
minor comments (1)
  1. The abstract and any subsequent sections would benefit from explicit statements of the dataset source, size, and time span to allow readers to gauge scope.
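The completeness metrics requested in the major comments could be computed along the following lines. This is a sketch under stated assumptions: the required field names, the helper functions, and the sample records are hypothetical, since the review does not give the dataset's schema.

```python
# Hypothetical set of fields a trip record needs for temporal and
# geospatial analysis; the paper's real schema is not stated here.
REQUIRED_FIELDS = ("timestamp", "latitude", "longitude", "boardings")


def missing_rates(records, fields=REQUIRED_FIELDS):
    """Fraction of records in which each field is absent or None."""
    if not records:
        raise ValueError("records must be non-empty")
    n = len(records)
    return {f: sum(1 for r in records if r.get(f) is None) / n for f in fields}


def complete_fraction(records, fields=REQUIRED_FIELDS):
    """Fraction of records with every required field present."""
    if not records:
        raise ValueError("records must be non-empty")
    complete = sum(
        1 for r in records if all(r.get(f) is not None for f in fields)
    )
    return complete / len(records)


# Invented records, for illustration only.
records = [
    {"timestamp": "2023-03-06T08:15:00", "latitude": 63.43, "longitude": 10.39, "boardings": 12},
    {"timestamp": "2023-03-06T08:20:00", "latitude": None, "longitude": None, "boardings": 7},
    {"timestamp": None, "latitude": 63.42, "longitude": 10.40, "boardings": 3},
    {"timestamp": "2023-03-06T08:30:00", "latitude": 63.44, "longitude": 10.41, "boardings": None},
]

rates = missing_rates(records)
fraction = complete_fraction(records)
print(rates)
print(fraction)
```

Running the same two functions on the data before and after cleaning would give the before/after comparison the referee asks for.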

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive report. We address the two major comments below, both of which correctly identify the absence of quantitative metrics in the current manuscript.

  1. Referee: [Abstract] The central claim that the preprocessing framework refines the raw data 'for effective historical analysis and predictive modeling' and that the study 'helps to assess the quality of the available transit data' is unsupported by any reported quantitative metrics (missing-value rates, fraction of trips with complete geospatial/temporal fields, spatial coverage before/after cleaning, or validation statistics).

    Authors: We agree that the abstract advances claims about data refinement for predictive modeling and quality assessment that are not supported by any quantitative metrics in the manuscript. The work is descriptive and focuses on observed patterns rather than completeness or coverage statistics. We will revise the abstract to remove these unsupported claims and limit the description to the pattern analysis that is actually performed.

    revision: yes

  2. Referee: [Abstract / Methods (implied)] The manuscript positions the work as a precursor to ML/DL modeling in digital twins, yet it provides no concrete statistics on data completeness or coverage after preprocessing. This leaves its weakest assumption, that the cleaned data is sufficient for downstream predictive tasks, without support.

    Authors: The referee is correct that no completeness or coverage statistics are supplied, so the claim that the data is suitable for downstream predictive tasks cannot be substantiated from the presented material. We will revise the manuscript to describe the study strictly as an exploratory preprocessing and pattern-analysis step, without implying validated readiness for ML/DL modeling in digital twins.

    revision: yes

Circularity Check

0 steps flagged

No circularity: descriptive data analysis with no derivations or fits

The paper implements a preprocessing framework and examines patterns in external public transit data as a precursor to future ML modeling in digital twins. No equations, parameter fitting, predictions, or first-principles derivations are described that could reduce to inputs by construction. No self-citations or uniqueness claims appear in the provided text. The analysis is self-contained against external benchmarks, consistent with the reader's assessment of score 1.0 and absence of any load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical model, fitted parameters, or new entities are introduced in the abstract; the work rests on the domain assumption that historical transit records are suitable for the stated purpose.

pith-pipeline@v0.9.0 · 5683 in / 1019 out tokens · 21983 ms · 2026-05-24T02:41:04.957745+00:00 · methodology
