Assessing the Role of Intersection Proximity in Pedestrian Crashes: Insights from Data Mining Approach
Pith reviewed 2026-05-07 06:03 UTC · model grok-4.3
The pith
About half of non-intersection pedestrian crashes occur within 198 feet of an intersection, and the combinations of contributing factors differ across three proximity zones.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By framing non-intersection pedestrian crashes through their distance to the nearest intersection and applying association rules mining to the subdivided data, the analysis reveals that crash involvement patterns vary systematically with proximity, with approximately 50 percent of such crashes occurring within 198 feet and unique rule sets emerging for the near, mid, and far zones.
What carries the argument
Association Rules Mining applied to crash records grouped into three distance zones from the nearest intersection (within 150 ft, 151-435 ft, and beyond 435 ft), with support, confidence, lift, and the Lift Increase Criterion (LIC) used to select meaningful patterns.
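The four measures named above can be illustrated with a minimal sketch. The toy crash records and attribute labels below are hypothetical, not taken from the Louisiana data, and the LIC formula follows the definition commonly used in the association-rules literature (the lift of an extended rule divided by the lift of its shorter parent rule); the paper's exact formulation is not given in this summary.

```python
def support(transactions, items):
    """Share of transactions that contain every item in `items`."""
    items = frozenset(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence relative to the consequent's baseline frequency."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

def lift_increase(transactions, base_antecedent, added_item, consequent):
    """Assumed LIC: lift(A + {x} -> C) / lift(A -> C)."""
    extended = set(base_antecedent) | {added_item}
    return lift(transactions, extended, consequent) / lift(transactions, base_antecedent, consequent)

# Hypothetical toy records; each record is a set of categorical attributes.
crashes = [
    frozenset({"dark_unlit", "speed_45plus", "fatal"}),
    frozenset({"dark_unlit", "speed_45plus", "fatal"}),
    frozenset({"dark_unlit", "speed_low", "injury"}),
    frozenset({"daylight", "speed_low", "injury"}),
    frozenset({"daylight", "speed_low", "injury"}),
    frozenset({"daylight", "speed_45plus", "fatal"}),
    frozenset({"dark_unlit", "speed_low", "fatal"}),
    frozenset({"daylight", "speed_low", "injury"}),
]

print(lift(crashes, {"dark_unlit"}, {"fatal"}))  # 1.5 on this toy data
print(lift_increase(crashes, {"dark_unlit"}, "speed_45plus", {"fatal"}))
```

A lift above 1 means the antecedent and consequent co-occur more often than independence would predict; an LIC above 1 means the added item strengthens the shorter rule.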
If this is right
- Crash patterns differ enough across the three zones to justify zone-specific safety measures, such as improved lighting or signage closer to intersections versus speed management farther out.
- The concentration of crashes near intersections suggests that intersection influence extends beyond the physical crossing area into adjacent road segments.
- Data mining techniques like association rules can uncover interactions among multiple variables that traditional statistical methods might miss in traffic safety data.
- Recommendations for countermeasures can be more targeted, potentially improving resource allocation for pedestrian safety improvements.
Where Pith is reading between the lines
- Urban planners might consider extending intersection safety features, like crosswalks or signals, into the 150-435 ft range where mid-zone patterns appear.
- Future studies could test whether these distance-based patterns hold in other regions or with different data collection methods.
- Vehicle automation or advanced driver assistance systems might prioritize alerts based on proximity to intersections even on straight road segments.
Load-bearing premise
The distance thresholds of 150 feet and 435 feet mark real changes in underlying crash causes rather than convenient data splits, and the discovered association rules point to modifiable risk factors instead of coincidental correlations.
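One way to probe this premise is to re-bin the same crashes under alternative cut-points and measure how much the mined rule sets overlap. The sketch below is illustrative only: the distances, the rule labels, and the helper names `bin_counts` and `jaccard` are hypothetical, and the rule mining step itself is left out.

```python
from bisect import bisect_left

def bin_counts(distances_ft, cuts_ft):
    """Count crashes per zone; a distance equal to a cut falls in the lower zone."""
    counts = [0] * (len(cuts_ft) + 1)
    for d in distances_ft:
        counts[bisect_left(cuts_ft, d)] += 1
    return counts

def jaccard(rules_a, rules_b):
    """Overlap of two rule sets: |A & B| / |A | B| (1.0 if both are empty)."""
    a, b = set(rules_a), set(rules_b)
    return len(a & b) / len(a | b) if (a or b) else 1.0

# Hypothetical distances to the nearest intersection, in feet.
distances = [20, 90, 150, 151, 198, 300, 435, 436, 800, 1500]

print(bin_counts(distances, [150, 435]))       # the paper's two thresholds
print(bin_counts(distances, [100, 200, 300]))  # fixed 100 ft increments

# Hypothetical rule labels mined for the near zone under two zoning schemes.
near_rules_paper_cuts = {"dark->fatal", "no_crosswalk->injury", "alcohol->fatal"}
near_rules_alt_cuts = {"no_crosswalk->injury", "alcohol->fatal", "speed->fatal"}
print(jaccard(near_rules_paper_cuts, near_rules_alt_cuts))
```

A high overlap across zoning schemes would suggest the patterns do not depend on the specific 150 ft and 435 ft boundaries; a low overlap would point toward binning artifacts.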
What would settle it
Repeating the analysis on an independent dataset from another state or time period that shows no 50 percent clustering within 198 feet or no distinct association rules per zone would undermine the findings.
Original abstract
Although intersections are the most complex parts of the roadway network, pedestrian crashes at non-intersection locations are disproportionately frequent, highlighting a serious traffic safety concern. This study investigates non-intersection crashes involving pedestrians using a crash database (2017-2021) collected from Louisiana State. As the risk of pedestrian crashes tends to vary with distance from the intersection, the research team utilized a unique framework "distance to intersection" to capture the differences in crash patterns at non-intersection locations. The study identified that around 50% of non-intersection pedestrian crashes occurred within 198 ft. of the intersection. In the next step, the collected 3,135 pedestrian crashes at non-intersection locations during the study period were subdivided into three zones: D1 zone designates crashes occurring within 150 ft. of an intersection (1,277 crashes), D2 zone designates crashes occurring within 151 ft. to 435 ft. of an intersection (1,060 crashes) and D3 zone designates crashes occurring at 435 ft. or higher from an intersection (798 crashes). To explore the complex interaction of multiple factors, an intuitive data mining technique, Association Rules Mining was used. A total of the top 60 interesting association rules (20 for each zone) were identified by the algorithm (based on lift and support measures). In addition, a total of 124 rules were explored based on Lift Increase Criterion (LIC) measure. The findings of this research provide critical insights into pedestrian crash involvement at non-intersection locations and the variation in crash patterns according to the "distance to intersection". Based on the findings, some of the targeted problem-specific countermeasures are also recommended to address the crash patterns at non-intersection locations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper analyzes 3,135 non-intersection pedestrian crashes from the Louisiana database (2017-2021) and reports that approximately 50% occur within 198 ft of an intersection. It partitions the crashes into three distance-based zones (D1: within 150 ft, 1,277 crashes; D2: 151-435 ft, 1,060 crashes; D3: 435 ft or more, 798 crashes) and applies association rule mining to extract the top 60 rules (20 per zone) using lift and support metrics, plus 124 additional rules via the Lift Increase Criterion (LIC). The central claim is that crash patterns and contributing factors vary systematically with distance to the nearest intersection, enabling the recommendation of zone-specific countermeasures.
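The zone arithmetic in the summary can be checked with a short sketch. The counts come from the abstract; the function name `assign_zone` is hypothetical, and placing a crash at exactly 435 ft in D2 is an assumption, since the abstract's wording leaves that boundary ambiguous.

```python
# Zone counts as reported in the abstract.
zone_counts = {"D1": 1277, "D2": 1060, "D3": 798}

total = sum(zone_counts.values())
shares = {zone: round(100 * n / total, 1) for zone, n in zone_counts.items()}

def assign_zone(distance_ft):
    """Map a distance to the nearest intersection onto D1, D2, or D3.

    Assumption: a crash at exactly 435 ft is placed in D2.
    """
    if distance_ft <= 150:
        return "D1"
    if distance_ft <= 435:
        return "D2"
    return "D3"

print(total)   # 3135, matching the reported sample size
print(shares)  # {'D1': 40.7, 'D2': 33.8, 'D3': 25.5}
print(assign_zone(198))  # the reported median distance falls in D2
```

The three counts sum to the reported 3,135 crashes, and the reported median of 198 ft lies inside D2 rather than on a zone boundary.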
Significance. If the chosen distance partitions reflect genuine changes in exposure or behavior rather than arbitrary binning, the work could usefully extend pedestrian safety research by demonstrating how association rule mining can surface multi-factor interactions that are difficult to capture with standard regression. The large sample size and explicit focus on non-intersection locations are strengths, and the provision of both conventional lift/support rules and LIC-augmented rules offers a transparent exploratory framework. The results remain hypothesis-generating rather than confirmatory, however, because the analysis is purely descriptive and contains no predictive validation or causal identification.
major comments (3)
- [Abstract and Methods] The distance thresholds of 150 ft and 435 ft that define zones D1, D2, and D3 are introduced without justification, literature citation, or sensitivity analysis. The abstract notes a 198 ft median for the 50% cumulative distribution but does not use this value as a cut-point and provides no test of whether the reported differences in mined rules persist under alternative boundaries (e.g., quartiles or 100 ft increments). Because the claim that “crash patterns vary according to distance to intersection” rests directly on these partitions, the absence of robustness checks is load-bearing.
- [Results] The top 60 association rules and the 124 LIC-selected rules are presented as evidence of zone-specific patterns, yet the manuscript supplies neither the exact support, confidence, and lift thresholds applied nor any statistical assessment (e.g., permutation tests, hold-out validation, or comparison against null models) to establish that the zone-to-zone differences exceed what would be expected from sampling variation alone.
- [Data and Methods] No information is given on geocoding accuracy for the distance-to-intersection variable, potential under-reporting of pedestrian crashes, or the treatment of missing values in the Louisiana database. These omissions directly affect the reliability of the distance-based partitions and the subsequent rule mining.
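The permutation test suggested in the second comment could look roughly like the sketch below. This is not the authors' procedure: the function names, the choice of absolute lift difference as the test statistic, and the convention of returning a lift of 0.0 when the antecedent or consequent is absent are all assumptions made for illustration.

```python
import random

def lift(records, antecedent, consequent):
    """Lift of antecedent -> consequent; 0.0 if either side never occurs (assumed convention)."""
    n = len(records)
    n_a = sum(1 for r in records if antecedent <= r)
    n_c = sum(1 for r in records if consequent <= r)
    n_ac = sum(1 for r in records if (antecedent | consequent) <= r)
    if n == 0 or n_a == 0 or n_c == 0:
        return 0.0
    return (n_ac / n_a) / (n_c / n)

def permutation_p_value(zone_a, zone_b, antecedent, consequent, n_perm=2000, seed=0):
    """Shuffle zone membership to test whether a rule's lift differs between two zones."""
    rng = random.Random(seed)
    observed = abs(lift(zone_a, antecedent, consequent) - lift(zone_b, antecedent, consequent))
    pooled = list(zone_a) + list(zone_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(zone_a)], pooled[len(zone_a):]
        diff = abs(lift(perm_a, antecedent, consequent) - lift(perm_b, antecedent, consequent))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical records for two zones.
near = [frozenset({"dark", "fatal"})] * 12 + [frozenset({"day", "injury"})] * 8
far = [frozenset({"dark", "injury"})] * 10 + [frozenset({"day", "fatal"})] * 10
p = permutation_p_value(near, far, frozenset({"dark"}), frozenset({"fatal"}), n_perm=500)
print(p)
```

A small p-value would indicate that the zone-to-zone difference in lift is unlikely under random assignment of crashes to zones; applying the test to many rules would also call for a multiple-comparison correction.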
minor comments (2)
- [Abstract] The abstract refers to an “intuitive data mining technique” without a one-sentence description of association rule mining or the definitions of support, lift, and LIC; a brief methodological gloss would aid readers outside the data-mining community.
- [Results] A summary table listing the highest-lift rules for each zone (with their support, confidence, and lift values) would improve readability and allow direct comparison across D1, D2, and D3.
Simulated Author's Rebuttal
We appreciate the referee's thorough review and valuable suggestions for improving our paper. Below, we provide point-by-point responses to the major comments. We have made revisions to the manuscript to address the concerns regarding justification of methods, data transparency, and rule selection criteria.
Point-by-point responses
- Referee: [Abstract and Methods] The distance thresholds of 150 ft and 435 ft that define zones D1, D2, and D3 are introduced without justification, literature citation, or sensitivity analysis. The abstract notes a 198 ft median for the 50% cumulative distribution but does not use this value as a cut-point and provides no test of whether the reported differences in mined rules persist under alternative boundaries (e.g., quartiles or 100 ft increments). Because the claim that “crash patterns vary according to distance to intersection” rests directly on these partitions, the absence of robustness checks is load-bearing.
Authors: We acknowledge the need for better justification of the distance thresholds. The 150 ft value is drawn from traffic safety guidelines that define the typical influence area of an intersection for pedestrian movements. The 435 ft threshold was determined by examining the cumulative distribution function of distances and selecting a point that balances the number of observations in D2 and D3 while extending beyond the median distance of 198 ft. In the revised manuscript, we have added citations to relevant literature supporting the 150 ft threshold, explained the rationale for 435 ft, and conducted a sensitivity analysis using alternative zoning schemes (e.g., based on quartiles and fixed 100 ft intervals). The sensitivity results, which confirm that the primary zone-specific association rules remain consistent, are now included in the Results section. We have also revised the abstract to mention the data-driven approach to zoning.
revision: yes
- Referee: [Results] The top 60 association rules and the 124 LIC-selected rules are presented as evidence of zone-specific patterns, yet the manuscript supplies neither the exact support, confidence, and lift thresholds applied nor any statistical assessment (e.g., permutation tests, hold-out validation, or comparison against null models) to establish that the zone-to-zone differences exceed what would be expected from sampling variation alone.
Authors: We agree that specifying the exact thresholds is essential for reproducibility. We have updated the Methods section to clearly state the criteria used: rules were generated with a minimum support of 0.05, confidence of 0.6, and lift greater than 1.2, with the top 20 rules per zone selected by descending lift value. The LIC was applied as described to identify additional rules with increasing lift across zones. As the study employs an exploratory data mining approach rather than confirmatory statistical modeling, we did not include predictive validation or permutation tests in the original submission. However, to address the concern about sampling variation, we have added a paragraph in the Results discussing the potential for chance findings and noting that the distinct patterns observed (e.g., different dominant factors in each zone) suggest systematic differences. We maintain that full causal identification or predictive modeling is beyond the scope of this descriptive analysis.
revision: partial
- Referee: [Data and Methods] No information is given on geocoding accuracy for the distance-to-intersection variable, potential under-reporting of pedestrian crashes, or the treatment of missing values in the Louisiana database. These omissions directly affect the reliability of the distance-based partitions and the subsequent rule mining.
Authors: We thank the referee for this important comment on data quality. The revised Data section now includes: (1) details on missing value treatment, where records with missing crash location or key variables were excluded (less than 10% of the dataset), and other missing entries were handled via multiple imputation for continuous variables and mode imputation for categorical ones; (2) a note on geocoding, indicating that the Louisiana database uses GPS and address matching with typical accuracy of 10-30 meters, which we consider sufficient for the distance categories used; and (3) a discussion of under-reporting, recognizing that pedestrian crashes may be under-reported but arguing that this is unlikely to bias the distance-based patterns unless reporting rates vary by proximity to intersections. We have also added these as limitations in the Discussion section.
revision: yes
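The selection criteria quoted in the simulated rebuttal (minimum support 0.05, minimum confidence 0.6, lift greater than 1.2, top 20 rules per zone by descending lift) can be expressed as a small filter. The `Rule` class, the function name `select_rules`, and the example rules are hypothetical; the threshold values are those stated in the simulated response, not confirmed values from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    antecedent: str
    consequent: str
    support: float
    confidence: float
    lift: float

def select_rules(rules, min_support=0.05, min_confidence=0.6, min_lift=1.2, top_k=20):
    """Keep rules passing all three thresholds, then take the top_k by descending lift."""
    kept = [
        r for r in rules
        if r.support >= min_support
        and r.confidence >= min_confidence
        and r.lift > min_lift  # strictly greater, per "lift greater than 1.2"
    ]
    return sorted(kept, key=lambda r: r.lift, reverse=True)[:top_k]

# Hypothetical candidate rules for one zone.
candidates = [
    Rule("a", "x", 0.10, 0.70, 1.8),
    Rule("b", "x", 0.04, 0.90, 2.5),  # fails the support threshold
    Rule("c", "x", 0.20, 0.55, 1.6),  # fails the confidence threshold
    Rule("d", "x", 0.08, 0.65, 1.2),  # lift is not strictly above 1.2
    Rule("e", "x", 0.06, 0.80, 2.1),
]

for rule in select_rules(candidates):
    print(rule.antecedent, "->", rule.consequent, rule.lift)
```

Because ranking is by lift alone, rules with very low support can dominate the top of the list, which is one reason the minimum-support threshold matters for reproducibility.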
Circularity Check
No circularity: purely exploratory data mining with direct extraction from observed records
Full rationale
The paper performs association rule mining on 3,135 observed non-intersection pedestrian crashes after binning them into three distance zones. The 50% cumulative distance of 198 ft is computed directly from the data as a descriptive statistic, after which the authors select fixed cut-points (150 ft and 435 ft) to create zones D1–D3 and then mine rules using standard lift/support and LIC metrics. No parameters are fitted to a subset and then used to predict a related quantity; no equations or derivations exist that reduce to the inputs by construction; no self-citations or uniqueness theorems are invoked as load-bearing premises. The reported rules and zone-specific patterns are direct algorithmic outputs from the raw crash database, rendering the analysis self-contained and non-circular.
Axiom & Free-Parameter Ledger
free parameters (2)
- Distance zone boundaries (150 ft, 435 ft)
- Support, lift, and LIC thresholds for rule selection
axioms (2)
- domain assumption: The Louisiana crash database (2017-2021) is complete, accurate, and representative of non-intersection pedestrian incidents.
- domain assumption: Association rules mining on categorical crash attributes reliably surfaces meaningful, non-spurious interactions.