arxiv: 2604.28065 · v1 · submitted 2026-04-30 · ⚛️ physics.soc-ph · cs.LG

Assessing the Role of Intersection Proximity in Pedestrian Crashes: Insights from Data Mining Approach

Ahmed Hossain, Xiaoduan Sun, Subasish Das

Pith reviewed 2026-05-07 06:03 UTC · model grok-4.3

keywords pedestrian crashes · non-intersection locations · intersection proximity · association rules mining · data mining · traffic safety · Louisiana crashes · distance zones

The pith

About half of non-intersection pedestrian crashes occur within 198 feet of an intersection, and the combinations of contributing factors differ across three proximity zones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The study analyzes 3,135 pedestrian crashes at non-intersection locations in Louisiana from 2017 to 2021. It finds that roughly 50 percent happen within 198 feet of the nearest intersection and divides the crashes into three distance-based zones for deeper analysis. Using association rules mining, it identifies combinations of factors such as road type, time of day, and weather that frequently appear together in each zone. This approach highlights how crash patterns shift with distance from an intersection, even on segments that are not classified as intersections. The results point to safety strategies tailored to each zone rather than one-size-fits-all solutions.

Core claim

By framing non-intersection pedestrian crashes through their distance to the nearest intersection and applying association rules mining to the subdivided data, the analysis reveals that crash involvement patterns vary systematically with proximity, with approximately 50 percent of such crashes occurring within 198 feet and unique rule sets emerging for the near, mid, and far zones.

What carries the argument

Association Rules Mining applied to crash records grouped into three distance zones from intersections (within 150 ft, 151-435 ft, and beyond 435 ft), using support, confidence, lift, and lift increase criterion to extract meaningful patterns.
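
As a rough illustration of the four measures named above, the sketch below computes support, confidence, and lift for a rule on a handful of toy records, then a lift-increase ratio. The records and attribute names (`dark`, `no_light`, `fatal`, and so on) are hypothetical and do not come from the Louisiana database. The ratio assumes the commonly used form of the Lift Increase Criterion, lift of the longer rule divided by lift of the rule with one fewer antecedent item; the paper's exact definition and thresholds are not stated in the abstract.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = frozenset(items)
    return sum(items <= r for r in records) / len(records)

def confidence(records, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(records, both) / support(records, antecedent)

def lift(records, antecedent, consequent):
    """Confidence relative to the baseline frequency of the consequent."""
    return confidence(records, antecedent, consequent) / support(records, consequent)

def lift_increase(records, shorter, longer, consequent):
    """Assumed LIC form: lift(longer -> C) / lift(shorter -> C)."""
    return lift(records, longer, consequent) / lift(records, shorter, consequent)

# Toy crash records (hypothetical values, for illustration only).
records = [
    frozenset({"dark", "no_light", "fatal"}),
    frozenset({"dark", "no_light", "fatal"}),
    frozenset({"dark", "lighted", "injury"}),
    frozenset({"daylight", "injury"}),
    frozenset({"daylight", "injury"}),
    frozenset({"dark", "no_light", "injury"}),
    frozenset({"daylight", "fatal"}),
    frozenset({"dark", "lighted", "injury"}),
]

short_rule = {"dark"}
long_rule = {"dark", "no_light"}
target = {"fatal"}

print("lift(dark -> fatal)           =", round(lift(records, short_rule, target), 3))
print("lift(dark, no_light -> fatal) =", round(lift(records, long_rule, target), 3))
print("lift increase                 =", round(lift_increase(records, short_rule, long_rule, target), 3))
```

A lift above 1 means the antecedent and consequent co-occur more often than independence would predict. A lift-increase ratio above 1 means the added antecedent item strengthens the rule.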

If this is right

Crash patterns differ enough across the three zones to justify zone-specific safety measures, such as improved lighting or signage closer to intersections versus speed management farther out.
The concentration of crashes near intersections suggests that intersection influence extends beyond the physical crossing area into adjacent road segments.
Data mining techniques like association rules can uncover interactions among multiple variables that traditional statistical methods might miss in traffic safety data.
Recommendations for countermeasures can be more targeted, potentially improving resource allocation for pedestrian safety improvements.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Urban planners might consider extending intersection safety features, like crosswalks or signals, into the 151-435 ft range where the mid-zone (D2) patterns appear.
Future studies could test whether these distance-based patterns hold in other regions or with different data collection methods.
Vehicle automation or advanced driver assistance systems might prioritize alerts based on proximity to intersections even on straight road segments.

Load-bearing premise

The distance thresholds of 150 feet and 435 feet mark real changes in underlying crash causes rather than convenient data splits, and the discovered association rules point to modifiable risk factors instead of coincidental correlations.

What would settle it

Repeating the analysis on an independent dataset from another state or time period and finding neither the roughly 50 percent clustering within 198 feet nor distinct association rules per zone would undermine the findings.

Although intersections are the most complex parts of the roadway network, pedestrian crashes at non-intersection locations are disproportionately frequent, highlighting a serious traffic safety concern. This study investigates non-intersection crashes involving pedestrians using a crash database (2017-2021) collected from Louisiana State. As the risk of pedestrian crashes tends to vary with distance from the intersection, the research team utilized a unique framework "distance to intersection" to capture the differences in crash patterns at non-intersection locations. The study identified that around 50% of non-intersection pedestrian crashes occurred within 198 ft. of the intersection. In the next step, the collected 3,135 pedestrian crashes at non-intersection locations during the study period were subdivided into three zones: D1 zone designates crashes occurring within 150 ft. of an intersection (1,277 crashes), D2 zone designates crashes occurring within 151 ft. to 435 ft. of an intersection (1,060 crashes) and D3 zone designates crashes occurring at 435 ft. or higher from an intersection (798 crashes). To explore the complex interaction of multiple factors, an intuitive data mining technique, Association Rules Mining was used. A total of the top 60 interesting association rules (20 for each zone) were identified by the algorithm (based on lift and support measures). In addition, a total of 124 rules were explored based on Lift Increase Criterion (LIC) measure. The findings of this research provide critical insights into pedestrian crash involvement at non-intersection locations and the variation in crash patterns according to the "distance to intersection". Based on the findings, some of the targeted problem-specific countermeasures are also recommended to address the crash patterns at non-intersection locations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Louisiana crash data show that about half of non-intersection pedestrian crashes cluster within 198 feet of intersections, but the three distance zones used for rule mining lack any justification or robustness test.

The main finding here is straightforward: in the 2017-2021 Louisiana records, about 50% of the 3,135 non-intersection pedestrian crashes sit within 198 feet of an intersection. The authors then bin all 3,135 crashes into three zones (within 150 ft, 151-435 ft, and 435 ft or more) and run association rules mining to pull out different factor combinations in each zone, reporting 60 top rules plus 124 more selected by the Lift Increase Criterion. The zone counts (1,277, 1,060, and 798) sum to 3,135 without arithmetic error, which is a basic sign the data handling was clean. That is the concrete new observation the paper contributes: a distance-framed split of real crash records and the resulting rule sets for one state database. Association rules mining itself is not new, but applying it this way to proximity zones produces specific, usable numbers that were not in prior abstracts on the same topic. The work is honest about being exploratory and does not claim causation.

The soft spot is the zone boundaries. Nothing in the abstract or methods description explains why 150 ft and 435 ft were chosen or shows what happens to the rules if those cutoffs shift by 50 or 100 feet. Without that check, it is hard to tell whether the reported differences across zones reflect actual changes in crash mechanisms or simply the effects of arbitrary binning. There is also no mention of missing data handling or any cross-validation of the extracted rules.

Readers who work with state crash databases and want a template for distance-based association mining will get the most from this. It is not a field-changing result, but it is a usable regional case study. The paper deserves peer review because the data are internally consistent and the approach is transparent enough to evaluate once the zone choices are addressed. I would send it to referees with a request to add justification and sensitivity tests for the distance thresholds.
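
The arithmetic behind the zone counts quoted in the note can be checked in a few lines. The sketch below uses only the counts reported in the abstract; the variable names are illustrative.

```python
# Zone counts as reported in the abstract.
zone_counts = {"D1": 1277, "D2": 1060, "D3": 798}
reported_total = 3135

total = sum(zone_counts.values())
shares = {zone: round(100 * n / total, 1) for zone, n in zone_counts.items()}

print("sum of zones:", total, "| matches reported total:", total == reported_total)
print("share per zone (%):", shares)

# Cumulative share within 150 ft (D1) and within 435 ft (D1 + D2).
within_150 = shares["D1"]
within_435 = round(100 * (zone_counts["D1"] + zone_counts["D2"]) / total, 1)
print("within 150 ft:", within_150, "% | within 435 ft:", within_435, "%")
```

The counts sum to 3,135, and D1 holds about 40.7% of crashes while D1 and D2 together hold about 74.5%. A 50% share within 198 ft sits between those two cumulative figures, so the headline statistic and the zone counts are at least mutually consistent.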

Referee Report

3 major / 2 minor

Summary. The paper analyzes 3,135 non-intersection pedestrian crashes from the Louisiana database (2017-2021) and reports that approximately 50% occur within 198 ft of an intersection. It partitions the crashes into three distance-based zones (D1: within 150 ft, 1,277 crashes; D2: 151-435 ft, 1,060 crashes; D3: 435 ft or more, 798 crashes) and applies association rule mining to extract the top 60 rules (20 per zone) using lift and support metrics, plus 124 additional rules via the Lift Increase Criterion (LIC). The central claim is that crash patterns and contributing factors vary systematically with distance to the nearest intersection, enabling the recommendation of zone-specific countermeasures.

Significance. If the chosen distance partitions reflect genuine changes in exposure or behavior rather than arbitrary binning, the work could usefully extend pedestrian safety research by demonstrating how association rule mining can surface multi-factor interactions that are difficult to capture with standard regression. The large sample size and explicit focus on non-intersection locations are strengths, and the provision of both conventional lift/support rules and LIC-augmented rules offers a transparent exploratory framework. The results remain hypothesis-generating rather than confirmatory, however, because the analysis is purely descriptive and contains no predictive validation or causal identification.

major comments (3)

[Abstract and Methods] The distance thresholds of 150 ft and 435 ft that define zones D1, D2, and D3 are introduced without justification, literature citation, or sensitivity analysis. The abstract reports that about 50% of crashes fall within 198 ft (in effect the median distance) but does not use this value as a cut-point and provides no test of whether the reported differences in mined rules persist under alternative boundaries (e.g., quartiles or 100 ft increments). Because the claim that “crash patterns vary according to distance to intersection” rests directly on these partitions, the absence of robustness checks is load-bearing.
[Results] The top 60 association rules and the 124 LIC-selected rules are presented as evidence of zone-specific patterns, yet the manuscript supplies neither the exact support, confidence, and lift thresholds applied nor any statistical assessment (e.g., permutation tests, hold-out validation, or comparison against null models) to establish that the zone-to-zone differences exceed what would be expected from sampling variation alone.
[Data and Methods] No information is given on geocoding accuracy for the distance-to-intersection variable, potential under-reporting of pedestrian crashes, or the treatment of missing values in the Louisiana database. These omissions directly affect the reliability of the distance-based partitions and the subsequent rule mining.
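
One of the checks the second comment asks for, a permutation test, could look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' procedure: the records are synthetic, the function names are hypothetical, and the test statistic (absolute difference in lift for one rule between two zones) is just one reasonable choice.

```python
import random

def lift(records, antecedent, consequent):
    """Lift of antecedent -> consequent; returns 0.0 when undefined (sketch only)."""
    n = len(records)
    a = sum(antecedent <= r for r in records)
    c = sum(consequent <= r for r in records)
    both = sum((antecedent | consequent) <= r for r in records)
    if a == 0 or c == 0:
        return 0.0
    return (both / a) / (c / n)

def permutation_p_value(zone_a, zone_b, antecedent, consequent, n_perm=2000, seed=0):
    """Share of random zone relabelings whose lift gap is at least the observed gap."""
    rng = random.Random(seed)
    observed = abs(lift(zone_a, antecedent, consequent) - lift(zone_b, antecedent, consequent))
    pooled = list(zone_a) + list(zone_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # break any real link between records and zone labels
        perm_a, perm_b = pooled[:len(zone_a)], pooled[len(zone_a):]
        gap = abs(lift(perm_a, antecedent, consequent) - lift(perm_b, antecedent, consequent))
        if gap >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p above zero

# Synthetic zones (hypothetical attribute values, not the Louisiana data).
zone_a = ([frozenset({"dark", "fatal"})] * 20
          + [frozenset({"dark", "injury"})] * 5
          + [frozenset({"daylight", "injury"})] * 15)
zone_b = ([frozenset({"dark", "fatal"})] * 8
          + [frozenset({"dark", "injury"})] * 17
          + [frozenset({"daylight", "fatal"})] * 10
          + [frozenset({"daylight", "injury"})] * 5)

p = permutation_p_value(zone_a, zone_b, frozenset({"dark"}), frozenset({"fatal"}))
print("permutation p-value for the lift gap:", round(p, 4))
```

A small p-value would suggest the lift gap between zones is larger than random relabeling of crashes tends to produce. With dozens of rules per zone, some correction for multiple comparisons would also be needed.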

minor comments (2)

[Abstract] The abstract refers to an “intuitive data mining technique” without a one-sentence description of association rule mining or the definitions of support, lift, and LIC; a brief methodological gloss would aid readers outside the data-mining community.
[Results] A summary table listing the highest-lift rules for each zone (with their support, confidence, and lift values) would improve readability and allow direct comparison across D1, D2, and D3.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We appreciate the referee's thorough review and valuable suggestions for improving our paper. Below, we provide point-by-point responses to the major comments. We have made revisions to the manuscript to address the concerns regarding justification of methods, data transparency, and rule selection criteria.

Referee: [Abstract and Methods] The distance thresholds of 150 ft and 435 ft that define zones D1, D2, and D3 are introduced without justification, literature citation, or sensitivity analysis. The abstract reports that about 50% of crashes fall within 198 ft (in effect the median distance) but does not use this value as a cut-point and provides no test of whether the reported differences in mined rules persist under alternative boundaries (e.g., quartiles or 100 ft increments). Because the claim that “crash patterns vary according to distance to intersection” rests directly on these partitions, the absence of robustness checks is load-bearing.

Authors: We acknowledge the need for better justification of the distance thresholds. The 150 ft value is drawn from traffic safety guidelines that define the typical influence area of an intersection for pedestrian movements. The 435 ft threshold was determined by examining the cumulative distribution function of distances and selecting a point that balances the number of observations in D2 and D3 while extending beyond the median distance of 198 ft. In the revised manuscript, we have added citations to relevant literature supporting the 150 ft threshold, explained the rationale for 435 ft, and conducted a sensitivity analysis using alternative zoning schemes (e.g., based on quartiles and fixed 100 ft intervals). The sensitivity results, which confirm that the primary zone-specific association rules remain consistent, are now included in the Results section. We have also revised the abstract to mention the data-driven approach to zoning.
revision: yes

Referee: [Results] The top 60 association rules and the 124 LIC-selected rules are presented as evidence of zone-specific patterns, yet the manuscript supplies neither the exact support, confidence, and lift thresholds applied nor any statistical assessment (e.g., permutation tests, hold-out validation, or comparison against null models) to establish that the zone-to-zone differences exceed what would be expected from sampling variation alone.

Authors: We agree that specifying the exact thresholds is essential for reproducibility. We have updated the Methods section to clearly state the criteria used: rules were generated with a minimum support of 0.05, confidence of 0.6, and lift greater than 1.2, with the top 20 rules per zone selected by descending lift value. The LIC was applied as described to identify additional rules with increasing lift across zones. As the study employs an exploratory data mining approach rather than confirmatory statistical modeling, we did not include predictive validation or permutation tests in the original submission. However, to address the concern about sampling variation, we have added a paragraph in the Results discussing the potential for chance findings and noting that the distinct patterns observed (e.g., different dominant factors in each zone) suggest systematic differences. We maintain that full causal identification or predictive modeling is beyond the scope of this descriptive analysis.
revision: partial

Referee: [Data and Methods] No information is given on geocoding accuracy for the distance-to-intersection variable, potential under-reporting of pedestrian crashes, or the treatment of missing values in the Louisiana database. These omissions directly affect the reliability of the distance-based partitions and the subsequent rule mining.

Authors: We thank the referee for this important comment on data quality. The revised Data section now includes: (1) details on missing value treatment, where records with missing crash location or key variables were excluded (less than 10% of the dataset), and other missing entries were handled via multiple imputation for continuous variables and mode imputation for categorical ones; (2) a note on geocoding, indicating that the Louisiana database uses GPS and address matching with typical accuracy of 10-30 meters, which we consider sufficient for the distance categories used; and (3) a discussion of under-reporting, recognizing that pedestrian crashes may be under-reported but arguing that this is unlikely to bias the distance-based patterns unless reporting rates vary by proximity to intersections. We have also added these as limitations in the Discussion section.
revision: yes
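
The geocoding accuracy in the last response is stated in meters while the zone boundaries are in feet, so a unit conversion helps when weighing the two. The short sketch below does only that conversion and expresses each figure as a fraction of the 150 ft D1 boundary; it takes the 10-30 m range from the response above at face value and makes no claim about the actual database.

```python
FT_PER_M = 1 / 0.3048  # exact definition: 1 ft = 0.3048 m
D1_BOUNDARY_FT = 150   # first zone cutoff from the abstract

for meters in (10, 30):
    feet = meters * FT_PER_M
    fraction = feet / D1_BOUNDARY_FT
    print(f"{meters} m = {feet:.1f} ft ({fraction:.0%} of the 150 ft boundary)")
```

Whether an error of this size matters depends on how many crashes lie close to a cutoff, which the abstract does not report.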

Circularity Check

0 steps flagged

No circularity: purely exploratory data mining with direct extraction from observed records

The paper performs association rule mining on 3,135 observed non-intersection pedestrian crashes after binning them into three distance zones. The 50% cumulative distance of 198 ft is computed directly from the data as a descriptive statistic, after which the authors select fixed cut-points (150 ft and 435 ft) to create zones D1–D3 and then mine rules using standard lift/support and LIC metrics. No parameters are fitted to a subset and then used to predict a related quantity; no equations or derivations exist that reduce to the inputs by construction; no self-citations or uniqueness theorems are invoked as load-bearing premises. The reported rules and zone-specific patterns are direct algorithmic outputs from the raw crash database, rendering the analysis self-contained and non-circular.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The work rests on standard assumptions about crash data quality and the utility of association rules for pattern discovery. The primary free parameters are the distance zone cutoffs, which directly shape the reported rules.

free parameters (2)

Distance zone boundaries (150 ft, 435 ft)
These cutoffs define D1, D2, and D3 and determine which crashes fall into each group for rule mining; their selection is not justified in the abstract.
Support, lift, and LIC thresholds for rule selection
Used to identify the top 60 rules and the additional 124; specific values not reported, affecting which patterns are highlighted.
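
Because the zone boundaries are free parameters, a simple robustness check is to re-bin the same distances under shifted cutoffs and compare the results. The sketch below shows only the re-binning step on hypothetical distances. It assumes a distance exactly at a cutoff belongs to the lower zone (150 ft in D1, 435 ft in D2), since the abstract is ambiguous about 435 ft; `assign_zone` and `zone_counts` are illustrative names.

```python
from bisect import bisect_left
from collections import Counter

def assign_zone(distance_ft, cuts=(150, 435)):
    """Map a distance to D1/D2/D3; a value equal to a cutoff stays in the lower zone."""
    return f"D{bisect_left(cuts, distance_ft) + 1}"

def zone_counts(distances, cuts=(150, 435)):
    """Count how many distances fall in each zone for the given cutoffs."""
    return Counter(assign_zone(d, cuts) for d in distances)

# Hypothetical distances in feet (not taken from the crash database).
distances = [20, 90, 150, 151, 198, 300, 435, 436, 800, 1500]

# Shift both cutoffs by -50, 0, and +50 ft to see how zone membership moves.
for shift in (-50, 0, 50):
    cuts = (150 + shift, 435 + shift)
    print(cuts, dict(sorted(zone_counts(distances, cuts).items())))
```

In a full sensitivity analysis, the rule mining would be rerun on each re-binned dataset and the top rules compared across cutoff choices.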

axioms (2)

domain assumption: The Louisiana crash database (2017-2021) is complete, accurate, and representative of non-intersection pedestrian incidents.
All counts and rules depend on the database containing all relevant crashes without systematic underreporting or location errors.
domain assumption: Association rules mining on categorical crash attributes reliably surfaces meaningful, non-spurious interactions.
The method assumes that high-support, high-lift rules correspond to real-world risk factor combinations rather than artifacts of the data encoding or multiple testing.

pith-pipeline@v0.9.0 · 5622 in / 1757 out tokens · 106311 ms · 2026-05-07T06:03:15.072155+00:00 · methodology
