Modeling Vehicle-Type-Specific Pedestrian Crash Avoidance Behavior in Safety-Critical Interactions Using Smooth-Mamba Deep Reinforcement Learning
Pith reviewed 2026-06-29 12:11 UTC · model grok-4.3
The pith
Pedestrians respond more quickly to automated vehicles than to human-driven vehicles and cross at lower speeds when facing AVs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Smooth-Mamba Deep Deterministic Policy Gradient framework learns distinct crash avoidance policies for pedestrians facing AVs versus HDVs. When applied to real interaction data, it reproduces observed behaviors and reveals quicker pedestrian reactions, reduced crossing speeds, and safer outcomes in AV encounters compared to HDV encounters.
What carries the argument
SMamba-DDPG framework, which adds smooth action constraints and Mamba-based temporal modeling to Deep Deterministic Policy Gradient to train separate AV-specific and HDV-specific pedestrian policies.
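The review does not give the form of the smooth action constraint. One common way to encourage smooth actions in DDPG-style training is to add a temporal smoothness penalty to the actor loss, in the spirit of CAPS-style regularization: the actor minimizes −Q(s_t, π(s_t)) + λ‖π(s_{t+1}) − π(s_t)‖². The sketch below is an assumption, not the paper's formulation. The function names and the weight `lam` are hypothetical, and it computes only the scalar loss value.

```python
from typing import Sequence


def temporal_smoothness_penalty(actions: Sequence[Sequence[float]]) -> float:
    """Mean squared change between consecutive action vectors (0.0 if fewer than two)."""
    if len(actions) < 2:
        return 0.0
    total = 0.0
    for prev, curr in zip(actions[:-1], actions[1:]):
        total += sum((c - p) ** 2 for p, c in zip(prev, curr))
    return total / (len(actions) - 1)


def smoothed_actor_loss(q_values: Sequence[float],
                        actions: Sequence[Sequence[float]],
                        lam: float = 0.1) -> float:
    """DDPG actor objective (minimize -mean Q) plus a weighted smoothness penalty.

    `lam` is a hypothetical weight; the paper's constraint parameters are not reported here.
    """
    mean_q = sum(q_values) / len(q_values)
    return -mean_q + lam * temporal_smoothness_penalty(actions)
```

Here `actions` would be the actor's outputs π(s_t) for consecutive states, for example pedestrian accelerations (a_x, a_y). In a real training loop both terms would be computed on tensors, so the penalty's gradient also reaches the actor parameters.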
Load-bearing premise
The safety-critical pedestrian-vehicle interactions pulled from the Argoverse 2 dataset accurately represent real-world differences in crash avoidance behavior between AVs and HDVs.
What would settle it
New field observations or controlled experiments that measure pedestrian reaction times and crossing speeds in matched AV versus HDV encounters and find no consistent difference in speed or timing.
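A matched AV-versus-HDV comparison of this kind could be summarized with a two-sample permutation test on mean reaction time or mean crossing speed. This is a generic statistical sketch. The function name is hypothetical, and the example values in any usage would be illustrative, not data from the paper.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_test(av: Sequence[float], hdv: Sequence[float],
                     n_perm: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Two-sided permutation test for a difference in means (HDV minus AV).

    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    observed = mean(hdv) - mean(av)
    pooled = list(av) + list(hdv)
    n_av = len(av)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of AV/HDV under the null hypothesis
        diff = mean(pooled[n_av:]) - mean(pooled[:n_av])
        if abs(diff) >= abs(observed) - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
    # add-one correction keeps the estimated p-value strictly positive
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value
```

A consistently positive `observed` with a small `p_value` across matched samples would support the claim; a large `p_value` would be the "no consistent difference" outcome described above.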
Original abstract
As automated vehicles (AVs) increasingly share roadways with human-driven vehicles (HDVs), understanding how pedestrians respond to different vehicle types in safety-critical interactions is essential for the safe deployment of automated driving technologies. This study extracts safety-critical pedestrian-vehicle interactions from the Argoverse 2 dataset to capture real-world crash avoidance behaviors in encounters involving AVs and HDVs. To model vehicle-type-specific pedestrian crash avoidance behavior, we develop a Smooth-Mamba Deep Deterministic Policy Gradient framework, termed SMamba-DDPG, which integrates smooth action constraints with efficient temporal representation learning. To quantify pedestrian behavioral differences, the framework trains separate crash avoidance policies for pedestrian interactions with AVs and HDVs. Results show that SMamba-DDPG outperforms baseline reinforcement learning and supervised learning models in reproducing pedestrian crash avoidance behaviors. Reconstructed trajectories demonstrate strong behavioral realism, accurately reproducing crash avoidance kinematics in both AV and HDV scenarios. Reaction time analysis shows that the model captures human-like response delays and reveals that pedestrians respond more quickly to AVs than to HDVs. Counterfactual analysis further indicates that pedestrians adopt lower crossing speeds when interacting with AVs. Large-scale safety analysis of model-generated data revealed that pedestrian-AV interactions consistently yielded lower conflict rates and higher pedestrian yielding rates compared to pedestrian-HDV interactions. The findings highlight the importance of incorporating vehicle-type-specific pedestrian behavioral models for safer automated driving system design and more realistic traffic simulations in mixed-traffic environments.
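The abstract reports conflict rates but does not define them here. Conflict indicators in this literature are often based on post-encroachment time (PET), the gap between the first road user leaving a conflict area and the second entering it. One of the listed references concerns PET techniques. The sketch below assumes each interaction is summarized by those two timestamps and uses a hypothetical 1.5 s threshold. The paper's actual indicator and threshold are not stated in this review.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Interaction:
    first_exit_s: float    # time the first road user leaves the conflict area
    second_entry_s: float  # time the second road user enters the conflict area


def post_encroachment_time(it: Interaction) -> float:
    """PET: how long the conflict area stays empty between the two road users."""
    return it.second_entry_s - it.first_exit_s


def conflict_rate(interactions: Iterable[Interaction], threshold_s: float = 1.5) -> float:
    """Share of interactions whose PET falls below a (hypothetical) conflict threshold."""
    pets = [post_encroachment_time(it) for it in interactions]
    if not pets:
        return 0.0
    return sum(pet < threshold_s for pet in pets) / len(pets)
```

Computing this rate separately on model-generated AV and HDV interactions would yield the kind of comparison the abstract describes. The result depends directly on the chosen threshold, which should therefore be reported.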
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript extracts safety-critical pedestrian-vehicle interactions from Argoverse 2, develops a Smooth-Mamba DDPG (SMamba-DDPG) framework to train separate crash-avoidance policies for AV and HDV encounters, and reports that the model outperforms baselines while revealing vehicle-type differences: faster pedestrian reaction times to AVs, lower crossing speeds with AVs, and lower conflict/higher yielding rates in AV interactions.
Significance. If the Argoverse 2 subsets cleanly isolate vehicle-type effects and the learned policies generalize beyond the training distribution, the work could support more accurate mixed-traffic simulations and AV safety evaluations. The combination of Mamba-based temporal modeling with smooth action constraints is a modest technical contribution, but the absence of independent validation data limits broader impact.
major comments (3)
- [Abstract, first paragraph; data extraction procedure] Abstract and data-extraction description: the central claim that pedestrians exhibit distinct crash-avoidance kinematics toward AVs versus HDVs rests on the assumption that Argoverse 2 safety-critical subsets labeled 'AV' and 'HDV' reflect genuine vehicle-type differences. Because the dataset is ego-centric, AV-labeled interactions are ego-pedestrian encounters while HDV labels apply to non-ego vehicles; this introduces uncontrolled differences in sensor coverage, trajectory completeness, and selection into the safety-critical filter that are not vehicle-type effects. No controls or sensitivity checks for these confounds are described, rendering the counterfactual speed analysis and large-scale safety findings uninterpretable.
- [Results (reaction-time, counterfactual, and safety analyses)] Results and behavioral-analysis sections: the reported differences in reaction time, crossing speed, conflict rate, and yielding rate are outputs of policies trained directly on the same Argoverse 2 subsets used to define the target behaviors. This creates a circularity in which the 'findings' are reconstructions of fitted quantities rather than independent predictions or tests against held-out or external data.
- [Abstract; evaluation/results] Abstract and evaluation sections: the claim that SMamba-DDPG 'outperforms baseline reinforcement learning and supervised learning models' is asserted without any reported metrics, baseline specifications, validation splits, or error bars. Because the central contribution is the reproduction of vehicle-type-specific behaviors, the absence of quantitative evidence prevents assessment of whether the performance gain is meaningful or merely post-hoc selection.
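The quantitative evidence requested here could take the form of a trajectory-error metric reported as mean ± standard deviation over independent random seeds. The sketch below assumes average displacement error (ADE) as the "mean trajectory deviation" metric the authors propose in their response. The metric choice and function names are illustrative assumptions.

```python
import math
from statistics import mean, stdev
from typing import Sequence

Point = tuple[float, float]


def average_displacement_error(pred: Sequence[Point], truth: Sequence[Point]) -> float:
    """Mean Euclidean distance between predicted and observed positions at matching time steps."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    return mean(math.dist(p, t) for p, t in zip(pred, truth))


def summarize_over_seeds(per_seed_scores: Sequence[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of a metric across training seeds (needs >= 2 seeds)."""
    return mean(per_seed_scores), stdev(per_seed_scores)
```

Reporting each model, including every baseline, as `mean ± std` on the same held-out test split would let readers judge whether a performance gap exceeds seed-to-seed variation.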
minor comments (1)
- [Methods] Notation for the smooth action constraint parameters and Mamba state representation hyperparameters should be defined explicitly in the methods section rather than left as free parameters without ranges or sensitivity analysis.
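The Mamba hyperparameters at issue can be made concrete. In Mamba (Gu and Dao), a selective state space layer uses an input-dependent step size Δ_t and input-dependent B_t, C_t, with a diagonal A, giving h_t = exp(Δ_t A) h_{t-1} + Δ_t B_t x_t and y_t = C_t h_t. The sketch below runs this recurrence sequentially for a single scalar input channel, so the linear projections reduce to scalar multiples of x_t. It is a simplified illustration of the generic technique, not the paper's architecture or Mamba's hardware-aware parallel scan. All parameter names are hypothetical. The state size (`len(a_diag)`) and the step-size parameterization are the kind of settings that should be reported.

```python
import math
from typing import Sequence


def softplus(z: float) -> float:
    """Numerically safe softplus, used to keep the step size delta positive."""
    return math.log1p(math.exp(z)) if z < 20 else z


def selective_scan(xs: Sequence[float], a_diag: Sequence[float],
                   w_b: Sequence[float], w_c: Sequence[float],
                   w_delta: float, b_delta: float) -> list[float]:
    """Sequential selective SSM scan for one input channel.

    a_diag: diagonal of A, length d_state (negative entries make the state decay).
    w_b, w_c: hypothetical weights giving input-dependent B_t = w_b * x_t, C_t = w_c * x_t.
    w_delta, b_delta: hypothetical step-size parameters, delta_t = softplus(w_delta * x_t + b_delta).
    """
    h = [0.0] * len(a_diag)
    ys = []
    for x in xs:
        delta = softplus(w_delta * x + b_delta)
        b_t = [w * x for w in w_b]
        c_t = [w * x for w in w_c]
        # exponential (zero-order-hold) discretization for A, simplified Euler step for B
        h = [math.exp(delta * a) * h_i + delta * b * x
             for a, h_i, b in zip(a_diag, h, b_t)]
        ys.append(sum(c * h_i for c, h_i in zip(c_t, h)))
    return ys
```

Because Δ_t depends on the input, the layer can choose per time step how much past state to retain. This is the "selective" property that distinguishes Mamba from a fixed linear time-invariant state space model.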
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We address each major concern point by point below, clarifying our approach and outlining revisions where appropriate to strengthen the manuscript.
Point-by-point responses
-
Referee: [Abstract, first paragraph; data extraction procedure] Abstract and data-extraction description: the central claim that pedestrians exhibit distinct crash-avoidance kinematics toward AVs versus HDVs rests on the assumption that Argoverse 2 safety-critical subsets labeled 'AV' and 'HDV' reflect genuine vehicle-type differences. Because the dataset is ego-centric, AV-labeled interactions are ego-pedestrian encounters while HDV labels apply to non-ego vehicles; this introduces uncontrolled differences in sensor coverage, trajectory completeness, and selection into the safety-critical filter that are not vehicle-type effects. No controls or sensitivity checks for these confounds are described, rendering the counterfactual speed analysis and large-scale safety findings uninterpretable.
Authors: We acknowledge that the ego-centric nature of Argoverse 2 introduces potential confounds between AV (ego) and HDV (non-ego) subsets, including differences in sensor coverage and trajectory completeness. In the revised manuscript, we will add a dedicated subsection on the data extraction and labeling procedure, report comparative statistics on interaction distances, trajectory lengths, and observation quality across subsets, and perform sensitivity analyses by subsampling matched pairs to control for these factors. This will allow readers to better assess the robustness of the vehicle-type-specific findings. revision: yes
-
Referee: [Results (reaction-time, counterfactual, and safety analyses)] Results and behavioral-analysis sections: the reported differences in reaction time, crossing speed, conflict rate, and yielding rate are outputs of policies trained directly on the same Argoverse 2 subsets used to define the target behaviors. This creates a circularity in which the 'findings' are reconstructions of fitted quantities rather than independent predictions or tests against held-out or external data.
Authors: The reported behavioral differences arise from separately trained policies that capture type-specific patterns in the respective data subsets; the large-scale safety analysis then uses the generative capabilities of these policies to produce additional interactions beyond the original observations. To address the circularity concern, the revised manuscript will include explicit held-out validation results, demonstrating that the learned policies generalize to unseen interactions while preserving the observed differences in reaction time, speed, and conflict metrics. revision: partial
-
Referee: [Abstract; evaluation/results] Abstract and evaluation sections: the claim that SMamba-DDPG 'outperforms baseline reinforcement learning and supervised learning models' is asserted without any reported metrics, baseline specifications, validation splits, or error bars. Because the central contribution is the reproduction of vehicle-type-specific behaviors, the absence of quantitative evidence prevents assessment of whether the performance gain is meaningful or merely post-hoc selection.
Authors: We agree that the abstract and high-level evaluation claims require supporting quantitative details for proper assessment. The revised version will update the abstract to include specific performance metrics (e.g., success rate, mean trajectory deviation), describe the baseline models and their configurations, specify the train/validation/test splits, and report results with standard deviations across multiple random seeds. These details are present in the results section but will be elevated for clarity. revision: yes
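The matched-pair sensitivity analysis promised in the first response could start from greedy nearest-neighbour matching on covariates such as interaction distance and trajectory length. The behavioral differences would then be re-estimated on the matched subset only. The sketch below is a minimal version. The covariates, their pre-scaling, the caliper value, and the greedy without-replacement strategy are assumptions, not the authors' stated procedure.

```python
import math
from typing import Sequence

Covariates = Sequence[float]  # e.g. (interaction distance, trajectory length), already scaled


def greedy_match(av: Sequence[Covariates], hdv: Sequence[Covariates],
                 caliper: float = 1.0) -> list[tuple[int, int]]:
    """Pair each AV interaction with its nearest unused HDV interaction.

    Pairs farther apart than `caliper` (Euclidean distance in covariate space) are
    discarded, so the matched subsets have similar covariate distributions.
    """
    available = set(range(len(hdv)))
    pairs = []
    for i, x in enumerate(av):
        if not available:
            break
        # sorted() makes tie-breaking deterministic (lowest HDV index wins)
        j = min(sorted(available), key=lambda k: math.dist(x, hdv[k]))
        if math.dist(x, hdv[j]) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs
```

Greedy matching is simple but order-dependent. Optimal matching or propensity-score methods are common alternatives. Whichever method is used, the number of discarded AV interactions should be reported alongside the re-estimated effects.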
Circularity Check
Behavioral differences (reaction times, speeds, conflict rates) are reconstructions from policies trained on the same Argoverse 2 subsets used to define AV vs HDV labels
specific steps
-
fitted input called prediction
[Abstract]
"Reaction time analysis shows that the model captures human-like response delays and reveals that pedestrians respond more quickly to AVs than to HDVs. Counterfactual analysis further indicates that pedestrians adopt lower crossing speeds when interacting with AVs. Large-scale safety analysis of model-generated data revealed that pedestrian-AV interactions consistently yielded lower conflict rates and higher pedestrian yielding rates compared to pedestrian-HDV interactions."
The SMamba-DDPG policies are trained directly on the Argoverse 2 safety-critical subsets labeled AV and HDV; the reported differences in reaction time, crossing speed, and conflict/yielding rates are therefore properties recovered from the training partitions and presented as model-derived insights rather than out-of-sample predictions.
-
fitted input called prediction
[Abstract]
"Results show that SMamba-DDPG outperforms baseline reinforcement learning and supervised learning models in reproducing pedestrian crash avoidance behaviors. Reconstructed trajectories demonstrate strong behavioral realism, accurately reproducing crash avoidance kinematics in both AV and HDV scenarios."
Superior reproduction performance is measured against the same dataset partitions on which the policies were trained; the 'reproducing' claim is therefore a statement about in-sample fit rather than generalization to unseen interactions.
full rationale
The paper extracts safety-critical interactions from Argoverse 2, trains separate SMamba-DDPG policies on the AV-labeled and HDV-labeled subsets to reproduce observed behaviors, then reports the learned differences (faster reaction to AVs, lower crossing speeds with AVs, lower conflict rates) as findings. These quantities are outputs of the fitted policies and therefore reconstruct properties already present in the training data partitions rather than constituting independent predictions. No self-citation chain or uniqueness theorem is invoked; the circularity is of the fitted-input-called-prediction type.
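A held-out evaluation that addresses this circularity needs a split in which no scenario contributes data to both the training and test sets, because time steps from the same interaction are strongly correlated. The sketch below implements a scenario-grouped split and assumes each sample carries a scenario identifier. The names are hypothetical. The AV-versus-HDV differences would then be recomputed only from policy rollouts on the test scenarios.

```python
import random
from typing import Hashable, Sequence


def grouped_split(scenario_ids: Sequence[Hashable], test_fraction: float = 0.2,
                  seed: int = 0) -> tuple[list[int], list[int]]:
    """Return (train_indices, test_indices), assigning whole scenarios to one side only."""
    unique = sorted(set(scenario_ids), key=str)  # deterministic order before shuffling
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = max(1, round(test_fraction * len(unique)))
    test_groups = set(unique[:n_test])
    train = [i for i, s in enumerate(scenario_ids) if s not in test_groups]
    test = [i for i, s in enumerate(scenario_ids) if s in test_groups]
    return train, test
```

A split at the level of individual time steps would allow near-duplicate states to appear on both sides, which would reproduce the in-sample-fit problem this check identifies.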
Axiom & Free-Parameter Ledger
free parameters (2)
- smooth action constraint parameters
- Mamba state representation hyperparameters
axioms (1)
- domain assumption: Argoverse 2 contains representative safety-critical pedestrian-vehicle encounters that can be cleanly labeled as AV or HDV interactions
Reference graph
Works this paper leans on
-
[1]
Experience with traffic conflicts in Canada with emphasis on “post encroachment time” techniques. In: International calibration study of traffic conflict techniques. Springer, pp. 75-96.
-
[2]
Deep reinforcement learning for human-like driving policies in collision avoidance tasks of self-driving cars. arXiv preprint arXiv:2006.04218.
-
[3]
Gu, A., Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752. Also in: Proceedings of the First Conference on Language Modeling.
-
[4]
Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
-
[5]
Me³-BEV: Mamba-enhanced deep reinforcement learning for end-to-end autonomous driving with BEV-perception. arXiv preprint arXiv:2508.06074.
-
[6]
Estimation of driver reaction time from car-following data: Application in evaluation of general motor–type model. Transportation Research Record 1965 (1), 130-141.
-
[7]
Collision-avoidance lane change control method for enhancing safety for connected vehicle platoon in mixed traffic environment. Accident Analysis & Prevention 184, 106999.
-
[8]
Pedestrian behavior in shared spaces with autonomous vehicles: An integrated framework and review. IEEE Transactions on Intelligent Vehicles 8 (1), 438-457.
-
[9]
Interactions and behaviors of pedestrians with autonomous vehicles: A synthesis. Future Transportation 4 (3), 722-745.
-
[10]
Argoverse 2: Next generation datasets for self-driving perception and forecasting. arXiv preprint arXiv:2301.00493.
-
[11]
MambaQuant: Quantizing the Mamba family with variance aligned rotation methods. arXiv preprint arXiv:2501.13484.