Cognitive Accident Prediction in Driving Scenes: A Multimodality Benchmark

Jianru Xue; Jianwu Fang; Kuan Yang; Lei-Lei Li; Tat-Seng Chua; Zhedong Zheng

arxiv: 2212.09381 · v2 · pith:3SCVOJMM · submitted 2022-12-19 · 💻 cs.CV · cs.AI

keywords accident, attention, prediction, description, driver, driving, context, module

Traffic accident prediction in driving videos aims to provide an early warning of an accident's occurrence and to support the decision making of safe driving systems. Previous works usually concentrate on the spatial-temporal correlation of object-level context, but they do not fit the inherent long-tailed data distribution well and are vulnerable to severe environmental changes. In this work, we propose a Cognitive Accident Prediction (CAP) method that explicitly leverages human-inspired cognition, namely text descriptions of the visual observation and driver attention, to facilitate model training. In particular, the text description provides dense semantic guidance for the primary context of the traffic scene, while the driver attention draws the model's focus to the critical regions most closely correlated with safe driving. CAP consists of an attentive text-to-vision shift fusion module, an attentive scene context transfer module, and a driver-attention-guided accident prediction module. We leverage the attention mechanism in these modules to explore the core semantic cues for accident prediction. To train CAP, we extend our existing self-collected DADA-2000 dataset (which has annotated driver attention for each frame) with factual text descriptions of the visual observations before the accidents. In addition, we construct a new large-scale benchmark, named CAP-DATA, consisting of 11,727 in-the-wild accident videos with over 2.19 million frames, together with labeled fact-effect-reason-introspection descriptions and temporal accident frame labels. Extensive experiments validate the superiority of CAP over state-of-the-art approaches. The code, CAP-DATA, and all results will be released at https://github.com/JWFanggit/LOTVS-CAP.
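The abstract describes text-to-vision fusion and driver-attention guidance only at a high level. As a rough illustration, and not the authors' CAP modules (whose actual design is in the paper and the released code), the sketch below shows two generic building blocks under stated assumptions: scaled dot-product cross-attention in which visual tokens query text-description tokens, with a residual connection, and pooling of region features weighted by a non-negative driver-attention map. The function names, the residual connection, and the toy dimensions are hypothetical choices made for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain (rows of a) x (columns of b) matrix product.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def text_to_vision_attention(visual: Matrix, text: Matrix) -> Matrix:
    """Hypothetical generic cross-attention (not the paper's module).

    visual: N_v x d visual tokens used as queries.
    text:   N_t x d text tokens used as both keys and values.
    Returns N_v x d: each visual token plus its attended text summary.
    """
    d = len(visual[0])
    scores = matmul(visual, transpose(text))            # N_v x N_t similarities
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    attended = matmul(weights, text)                     # N_v x d text summaries
    # Residual connection (an assumption): keep the original visual signal.
    return [[v + a for v, a in zip(vrow, arow)]
            for vrow, arow in zip(visual, attended)]


def attention_weighted_pool(features: Matrix, saliency: List[float]) -> List[float]:
    """Pool N region features (N x d) using a driver-attention map of length N."""
    total = sum(saliency)
    if total <= 0:
        raise ValueError("driver-attention map must have positive mass")
    w = [s / total for s in saliency]                    # normalize to sum to 1
    d = len(features[0])
    return [sum(w[i] * features[i][j] for i in range(len(features)))
            for j in range(d)]


if __name__ == "__main__":
    visual = [[1.0, 0.0], [0.0, 1.0]]
    text = [[1.0, 0.0], [0.0, 1.0]]
    print(text_to_vision_attention(visual, text))
    print(attention_weighted_pool([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0]))
```

With one-hot toy tokens, each visual token attends most strongly to the text token it matches, and a driver-attention map of `[3.0, 1.0]` pools the two regions with weights 0.75 and 0.25. In a real model the tokens would come from learned encoders and the attention would use separate learned query, key, and value projections.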

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

VZCrash: A Large-Scale IMU Dataset of Ego-Vehicle Crashes
cs.CV 2026-06 unverdicted novelty 7.0

Introduces VZCrash, the largest public IMU dataset for ego-vehicle crashes, and shows through benchmarks that larger data scale improves crash detection models especially for real-world deployment.
PaSBench-Video: A Streaming Video Benchmark for Proactive Safety Warning
cs.CL 2026-06 unverdicted novelty 7.0

PaSBench-Video benchmark shows that no tested MLLM exceeds 20% on strict proactive safety metrics, with recall correlating at 0.64 with the false-positive rate on safe clips.
VAGNet: Vision-based Accident Anticipation with Global Features
cs.CV 2026-04 unverdicted novelty 4.0

VAGNet anticipates accidents in dashcam videos using global features from VideoMAE-V2 combined with transformers and graphs, reporting higher average precision and mean time-to-accident on four benchmarks while runnin...