Modeling Vehicle-Type-Specific Pedestrian Crash Avoidance Behavior in Safety-Critical Interactions Using Smooth-Mamba Deep Reinforcement Learning

Di Yang; Hong Yang; Junqing Wang; Kun Xie; Qingwen Pu

arxiv: 2605.28552 · v1 · pith:HQYR6IJYnew · submitted 2026-05-27 · 💻 cs.AI

Pith reviewed 2026-06-29 12:11 UTC · model grok-4.3

classification 💻 cs.AI

keywords pedestrian behavior · autonomous vehicles · reinforcement learning · crash avoidance · mixed traffic · vehicle type differences · Argoverse dataset

The pith

Pedestrians respond more quickly to automated vehicles than to human-driven vehicles and cross at lower speeds when facing AVs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extracts real safety-critical pedestrian-vehicle encounters from the Argoverse 2 dataset and trains a reinforcement learning model to reproduce how pedestrians avoid crashes. It builds separate policies for encounters with automated vehicles and human-driven vehicles. The resulting model reproduces observed kinematics and shows that pedestrians react faster to AVs, adopt lower crossing speeds with AVs, and produce lower conflict rates in AV interactions. These differences matter because traffic simulations and AV safety systems currently often treat all vehicles the same.

Core claim

The Smooth-Mamba Deep Deterministic Policy Gradient framework learns distinct crash avoidance policies for pedestrians facing AVs versus HDVs. When applied to real interaction data, it reproduces observed behaviors and reveals quicker pedestrian reactions, reduced crossing speeds, and safer outcomes in AV encounters compared to HDV encounters.

What carries the argument

SMamba-DDPG framework, which adds smooth action constraints and Mamba-based temporal modeling to Deep Deterministic Policy Gradient to train separate AV-specific and HDV-specific pedestrian policies.
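
The source does not state how the smooth action constraint is defined. As a minimal sketch under that caveat, one common way to keep a continuous-control policy's output smooth is to bound the change between consecutive actions and to penalize large changes during training. The names `limit_action_change`, `smoothness_penalty`, `max_delta`, and `weight` are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def limit_action_change(prev_action: Sequence[float],
                        proposed_action: Sequence[float],
                        max_delta: float) -> List[float]:
    """Clamp each action component to within max_delta of the previous action."""
    if max_delta < 0:
        raise ValueError("max_delta must be non-negative")
    limited = []
    for prev, new in zip(prev_action, proposed_action):
        delta = max(-max_delta, min(max_delta, new - prev))
        limited.append(prev + delta)
    return limited


def smoothness_penalty(actions: Sequence[Sequence[float]],
                       weight: float = 1.0) -> float:
    """Weighted sum of squared differences between consecutive actions."""
    total = 0.0
    for prev, curr in zip(actions, actions[1:]):
        total += sum((c - p) ** 2 for p, c in zip(prev, curr))
    return weight * total


# Hypothetical 2-D acceleration command (m/s^2) for a pedestrian agent.
prev = [0.0, 0.0]
raw = [1.5, -0.2]  # raw actor output with a large jump in the first component
smoothed = limit_action_change(prev, raw, max_delta=0.5)
```

Hard clamping guarantees a bounded step at inference time, while the penalty term only discourages jumps; which of the two (if either) the paper uses is not stated in the material above.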

Load-bearing premise

The safety-critical pedestrian-vehicle interactions pulled from the Argoverse 2 dataset accurately represent real-world differences in crash avoidance behavior between AVs and HDVs.

What would settle it

New field observations or controlled experiments that measure pedestrian reaction times and crossing speeds in matched AV versus HDV encounters and find no consistent difference in speed or timing.
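
One way such matched measurements could be compared is a two-sample permutation test on mean reaction time. The sketch below is illustrative only: the function name `permutation_test` is hypothetical, and the reaction times are placeholder numbers, not values from the paper or any dataset.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_test(a: Sequence[float], b: Sequence[float],
                     n_permutations: int = 2000, seed: int = 0) -> float:
    """Two-sided p-value for the difference in means under label shuffling."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_permutations + 1)


# Placeholder reaction times in seconds (NOT from the paper).
av_rt = [0.8, 0.9, 0.7, 1.0, 0.85, 0.75]
hdv_rt = [1.2, 1.1, 1.3, 1.0, 1.25, 1.15]
p_value = permutation_test(av_rt, hdv_rt)
```

A large p-value on well-matched field data would be the "no consistent difference" outcome described above; a small one would support the paper's claim.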

Original abstract

As automated vehicles (AVs) increasingly share roadways with human-driven vehicles (HDVs), understanding how pedestrians respond to different vehicle types in safety-critical interactions is essential for the safe deployment of automated driving technologies. This study extracts safety-critical pedestrian-vehicle interactions from the Argoverse 2 dataset to capture real-world crash avoidance behaviors in encounters involving AVs and HDVs. To model vehicle-type-specific pedestrian crash avoidance behavior, we develop a Smooth-Mamba Deep Deterministic Policy Gradient framework, termed SMamba-DDPG, which integrates smooth action constraints with efficient temporal representation learning. To quantify pedestrian behavioral differences, the framework trains separate crash avoidance policies for pedestrian interactions with AVs and HDVs. Results show that SMamba-DDPG outperforms baseline reinforcement learning and supervised learning models in reproducing pedestrian crash avoidance behaviors. Reconstructed trajectories demonstrate strong behavioral realism, accurately reproducing crash avoidance kinematics in both AV and HDV scenarios. Reaction time analysis shows that the model captures human-like response delays and reveals that pedestrians respond more quickly to AVs than to HDVs. Counterfactual analysis further indicates that pedestrians adopt lower crossing speeds when interacting with AVs. Large-scale safety analysis of model-generated data revealed that pedestrian-AV interactions consistently yielded lower conflict rates and higher pedestrian yielding rates compared to pedestrian-HDV interactions. The findings highlight the importance of incorporating vehicle-type-specific pedestrian behavioral models for safer automated driving system design and more realistic traffic simulations in mixed-traffic environments.
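
The abstract does not say which surrogate safety measure defines a conflict. As an illustration only, a widely used measure for crossing conflicts is post-encroachment time (PET): the gap between the first road user leaving a shared conflict area and the second one entering it. The function names, the 1.5 s threshold, and the timestamps below are assumptions for this sketch, not values from the paper.

```python
from typing import Iterable, Tuple


def post_encroachment_time(first_exit_s: float, second_entry_s: float) -> float:
    """PET in seconds: time between the first user leaving the conflict
    area and the second user entering it."""
    return second_entry_s - first_exit_s


def conflict_rate(events: Iterable[Tuple[float, float]],
                  threshold_s: float = 1.5) -> float:
    """Share of events whose PET is at or below threshold_s.

    Each event is (first_exit_s, second_entry_s). The default threshold
    is an assumed value chosen for illustration.
    """
    events = list(events)
    if not events:
        raise ValueError("no events supplied")
    conflicts = sum(
        1 for exit_t, entry_t in events
        if post_encroachment_time(exit_t, entry_t) <= threshold_s
    )
    return conflicts / len(events)


# Placeholder timestamps in seconds (NOT from the paper).
events = [(10.0, 10.5), (20.0, 23.0), (30.0, 31.0), (40.0, 44.0)]
rate = conflict_rate(events)
```

Computing the same rate separately for AV and HDV interactions, with an identical threshold, is what makes the two conflict rates comparable.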

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper applies a Mamba-DDPG variant with smoothness constraints to Argoverse 2 pedestrian interactions and reports faster reactions plus lower conflict rates with AVs, but the AV/HDV split is likely confounded by ego-centric collection.

The core contribution is training separate SMamba-DDPG policies on safety-critical pedestrian-vehicle pairs extracted from Argoverse 2, then using the resulting trajectories to claim vehicle-type differences in reaction time, crossing speed, and yielding. The architecture combines Mamba state representation with DDPG and explicit smooth action constraints, which is a straightforward domain extension rather than a new method.

It does a clean job of framing the problem around mixed-traffic AV deployment and pulling real trajectory data instead of synthetic scenarios. The counterfactual and large-scale safety analysis steps are reasonable ways to generate the reported behavioral distinctions.

The main weakness is the data labeling. Argoverse 2 is recorded from an instrumented AV, so interactions labeled AV are ego-pedestrian encounters while HDV labels apply to other vehicles; sensor coverage, completeness, and the safety-critical filter itself can differ systematically between the two groups. Nothing in the abstract indicates controls for that confound, which directly threatens the reaction-time and conflict-rate claims. A secondary issue is that the policies are optimized to reconstruct the same dataset behaviors, so the differences are reconstructions rather than out-of-sample predictions.
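
One way to probe this confound is to compare the two groups only on interactions that are matched on observable covariates. The sketch below greedily pairs each AV-labeled interaction with the closest unused HDV-labeled interaction on a single covariate. The covariate (initial pedestrian-vehicle distance), the `caliper` tolerance, the IDs, and all values are hypothetical; the source does not describe such a procedure.

```python
from typing import Dict, List, Tuple


def greedy_match(av: Dict[str, float], hdv: Dict[str, float],
                 caliper: float) -> List[Tuple[str, str]]:
    """Pair each AV interaction with the nearest unused HDV interaction.

    av and hdv map an interaction ID to a covariate value. Candidate
    pairs whose covariate gap exceeds caliper are dropped.
    """
    unused = dict(hdv)
    pairs = []
    for av_id, av_value in sorted(av.items(), key=lambda kv: kv[1]):
        if not unused:
            break
        best_id = min(unused, key=lambda k: abs(unused[k] - av_value))
        if abs(unused[best_id] - av_value) <= caliper:
            pairs.append((av_id, best_id))
            del unused[best_id]
    return pairs


# Placeholder initial distances in metres (NOT from the paper).
av_dist = {"av1": 12.0, "av2": 25.0, "av3": 40.0}
hdv_dist = {"h1": 11.0, "h2": 26.5, "h3": 80.0, "h4": 13.0}
matched = greedy_match(av_dist, hdv_dist, caliper=2.0)
```

Here `av3` stays unmatched because no HDV interaction lies within the caliper. A real analysis would match on several covariates at once (for example trajectory length and observation quality), which this one-dimensional sketch does not attempt.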

This work is aimed at people building AV control policies or mixed-traffic simulators who need vehicle-specific pedestrian models. A reader already working in that area could extract useful implementation details from the architecture, but the empirical distinctions require stronger validation before they can be treated as reliable.

I would send it to peer review. The topic is relevant and the modeling approach is concrete enough that referees can check the data pipeline and metrics directly.

Referee Report

3 major / 1 minor

Summary. The manuscript extracts safety-critical pedestrian-vehicle interactions from Argoverse 2, develops a Smooth-Mamba DDPG (SMamba-DDPG) framework to train separate crash-avoidance policies for AV and HDV encounters, and reports that the model outperforms baselines while revealing vehicle-type differences: faster pedestrian reaction times to AVs, lower crossing speeds with AVs, and lower conflict/higher yielding rates in AV interactions.

Significance. If the Argoverse 2 subsets cleanly isolate vehicle-type effects and the learned policies generalize beyond the training distribution, the work could support more accurate mixed-traffic simulations and AV safety evaluations. The combination of Mamba-based temporal modeling with smooth action constraints is a modest technical contribution, but the absence of independent validation data limits broader impact.

major comments (3)

[Abstract, first paragraph; data extraction procedure] Abstract and data-extraction description: the central claim that pedestrians exhibit distinct crash-avoidance kinematics toward AVs versus HDVs rests on the assumption that Argoverse 2 safety-critical subsets labeled 'AV' and 'HDV' reflect genuine vehicle-type differences. Because the dataset is ego-centric, AV-labeled interactions are ego-pedestrian encounters while HDV labels apply to non-ego vehicles; this introduces uncontrolled differences in sensor coverage, trajectory completeness, and selection into the safety-critical filter that are not vehicle-type effects. No controls or sensitivity checks for these confounds are described, rendering the counterfactual speed analysis and large-scale safety findings non-interpretable.
[Results (reaction-time, counterfactual, and safety analyses)] Results and behavioral-analysis sections: the reported differences in reaction time, crossing speed, conflict rate, and yielding rate are outputs of policies trained directly on the same Argoverse 2 subsets used to define the target behaviors. This creates a circularity in which the 'findings' are reconstructions of fitted quantities rather than independent predictions or tests against held-out or external data.
[Abstract; evaluation/results] Abstract and evaluation sections: the claim that SMamba-DDPG 'outperforms baseline reinforcement learning and supervised learning models' is asserted without any reported metrics, baseline specifications, validation splits, or error bars. Because the central contribution is the reproduction of vehicle-type-specific behaviors, the absence of quantitative evidence prevents assessment of whether the performance gain is meaningful or merely post-hoc selection.
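
The reporting requested in the last comment can be sketched as follows: collect one error value per random seed and report the mean and sample standard deviation for each model. The model names and error values are placeholders invented for the example, not results from the paper.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def summarize(per_seed: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Return (mean, sample std) of a metric across seeds for each model."""
    summary = {}
    for model, values in per_seed.items():
        if len(values) < 2:
            raise ValueError(f"{model}: need at least two seeds for a std")
        summary[model] = (mean(values), stdev(values))
    return summary


# Placeholder trajectory errors in metres, one per seed (NOT from the paper).
results = {
    "model_a": [0.40, 0.42, 0.38],
    "model_b": [0.55, 0.50, 0.60],
}
summary = summarize(results)
for model, (m, s) in summary.items():
    print(f"{model}: {m:.3f} +/- {s:.3f} m over {len(results[model])} seeds")
```

Reporting the spread alongside the mean lets a reader judge whether a gap between two models exceeds seed-to-seed variation.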

minor comments (1)

[Methods] Notation for the smooth action constraint parameters and Mamba state representation hyperparameters should be defined explicitly in the methods section rather than left as free parameters without ranges or sensitivity analysis.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major concern point by point below, clarifying our approach and outlining revisions where appropriate to strengthen the manuscript.

Referee: [Abstract, first paragraph; data extraction procedure] Abstract and data-extraction description: the central claim that pedestrians exhibit distinct crash-avoidance kinematics toward AVs versus HDVs rests on the assumption that Argoverse 2 safety-critical subsets labeled 'AV' and 'HDV' reflect genuine vehicle-type differences. Because the dataset is ego-centric, AV-labeled interactions are ego-pedestrian encounters while HDV labels apply to non-ego vehicles; this introduces uncontrolled differences in sensor coverage, trajectory completeness, and selection into the safety-critical filter that are not vehicle-type effects. No controls or sensitivity checks for these confounds are described, rendering the counterfactual speed analysis and large-scale safety findings non-interpretable.

Authors: We acknowledge that the ego-centric nature of Argoverse 2 introduces potential confounds between AV (ego) and HDV (non-ego) subsets, including differences in sensor coverage and trajectory completeness. In the revised manuscript, we will add a dedicated subsection on the data extraction and labeling procedure, report comparative statistics on interaction distances, trajectory lengths, and observation quality across subsets, and perform sensitivity analyses by subsampling matched pairs to control for these factors. This will allow readers to better assess the robustness of the vehicle-type-specific findings.
revision: yes

Referee: [Results (reaction-time, counterfactual, and safety analyses)] Results and behavioral-analysis sections: the reported differences in reaction time, crossing speed, conflict rate, and yielding rate are outputs of policies trained directly on the same Argoverse 2 subsets used to define the target behaviors. This creates a circularity in which the 'findings' are reconstructions of fitted quantities rather than independent predictions or tests against held-out or external data.

Authors: The reported behavioral differences arise from separately trained policies that capture type-specific patterns in the respective data subsets; the large-scale safety analysis then uses the generative capabilities of these policies to produce additional interactions beyond the original observations. To address the circularity concern, the revised manuscript will include explicit held-out validation results, demonstrating that the learned policies generalize to unseen interactions while preserving the observed differences in reaction time, speed, and conflict metrics.
revision: partial

Referee: [Abstract; evaluation/results] Abstract and evaluation sections: the claim that SMamba-DDPG 'outperforms baseline reinforcement learning and supervised learning models' is asserted without any reported metrics, baseline specifications, validation splits, or error bars. Because the central contribution is the reproduction of vehicle-type-specific behaviors, the absence of quantitative evidence prevents assessment of whether the performance gain is meaningful or merely post-hoc selection.

Authors: We agree that the abstract and high-level evaluation claims require supporting quantitative details for proper assessment. The revised version will update the abstract to include specific performance metrics (e.g., success rate, mean trajectory deviation), describe the baseline models and their configurations, specify the train/validation/test splits, and report results with standard deviations across multiple random seeds. These details are present in the results section but will be elevated for clarity.
revision: yes

Circularity Check

2 steps flagged

Behavioral differences (reaction times, speeds, conflict rates) are reconstructions from policies trained on the same Argoverse 2 subsets used to define AV vs HDV labels

specific steps

fitted input called prediction [Abstract]
"Reaction time analysis shows that the model captures human-like response delays and reveals that pedestrians respond more quickly to AVs than to HDVs. Counterfactual analysis further indicates that pedestrians adopt lower crossing speeds when interacting with AVs. Large-scale safety analysis of model-generated data revealed that pedestrian-AV interactions consistently yielded lower conflict rates and higher pedestrian yielding rates compared to pedestrian-HDV interactions."

The SMamba-DDPG policies are trained directly on the Argoverse 2 safety-critical subsets labeled AV and HDV; the reported differences in reaction time, crossing speed, and conflict/yielding rates are therefore properties recovered from the training partitions and presented as model-derived insights rather than out-of-sample predictions.

fitted input called prediction [Abstract]
"Results show that SMamba-DDPG outperforms baseline reinforcement learning and supervised learning models in reproducing pedestrian crash avoidance behaviors. Reconstructed trajectories demonstrate strong behavioral realism, accurately reproducing crash avoidance kinematics in both AV and HDV scenarios."

Superior reproduction performance is measured against the same dataset partitions on which the policies were trained; the 'reproducing' claim is therefore a statement about in-sample fit rather than generalization to unseen interactions.

full rationale

The paper extracts safety-critical interactions from Argoverse 2, trains separate SMamba-DDPG policies on the AV-labeled and HDV-labeled subsets to reproduce observed behaviors, then reports the learned differences (faster reaction to AVs, lower crossing speeds with AVs, lower conflict rates) as findings. These quantities are outputs of the fitted policies and therefore reconstruct properties already present in the training data partitions rather than constituting independent predictions. No self-citation chain or uniqueness theorem is invoked; the circularity is of the fitted-input-called-prediction type.
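
A held-out check of the kind this audit points toward needs a split in which no scenario contributes to both training and evaluation. The sketch below assigns whole scenarios to a split by hashing a scenario ID. The ID format, the 80/20 ratio, and the function name `split_by_scenario` are assumptions for illustration, not part of the paper's pipeline.

```python
import hashlib
from typing import Dict, Iterable, List


def split_by_scenario(scenario_ids: Iterable[str],
                      test_fraction: float = 0.2) -> Dict[str, List[str]]:
    """Deterministically assign each scenario ID to 'train' or 'test'.

    Hashing the ID (rather than shuffling individual frames) keeps every
    frame of a scenario in the same split, so test scenarios are never
    seen during training.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be in (0, 1)")
    splits: Dict[str, List[str]] = {"train": [], "test": []}
    for sid in scenario_ids:
        digest = hashlib.sha256(sid.encode("utf-8")).hexdigest()
        bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
        splits["test" if bucket < test_fraction else "train"].append(sid)
    return splits


# Hypothetical scenario IDs.
ids = [f"scenario_{i:04d}" for i in range(1000)]
splits = split_by_scenario(ids)
```

Because the assignment depends only on the ID, the split is reproducible across runs and stays stable when new scenarios are added. Behavioral differences that persist on the `test` scenarios would be out-of-sample evidence rather than in-sample fit.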

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim depends on the dataset extraction step faithfully capturing distinct AV/HDV behaviors and on the RL training process not introducing artifacts that create the reported differences; many standard DDPG hyperparameters are implicitly present but unspecified.

free parameters (2)

smooth action constraint parameters
Added to the DDPG framework but values and tuning procedure not stated in abstract.
Mamba state representation hyperparameters
Control temporal learning but not detailed.

axioms (1)

domain assumption: Argoverse 2 contains representative safety-critical pedestrian-vehicle encounters that can be cleanly labeled as AV or HDV interactions
Extraction step is the data foundation for all subsequent training and analysis.
