A Modular Zero-Shot Pipeline for Accident Detection, Localization, and Classification in Traffic Surveillance Video

Amey Thakur , Sarvesh Talele

classification cs.CV, cs.LG

keywords accident, collision, module, pipeline, video, challenge, detection, embeddings

We describe a zero-shot pipeline developed for the ACCIDENT @ CVPR 2026 challenge. The challenge requires predicting when, where, and what type of traffic accident occurs in surveillance video, without labeled real-world training data. Our method separates the problem into three independent modules. The first module localizes the collision in time by running peak detection on z-score-normalized frame-difference signals. The second module finds the impact location by estimating dense optical flow with the Farneback algorithm, accumulating the flow magnitude into a cumulative map, and taking that map's weighted centroid. The third module classifies the collision type by measuring cosine similarity between CLIP image embeddings of frames near the detected peak and text embeddings built from multiple natural-language prompts describing each collision category. No domain-specific fine-tuning is involved; the pipeline processes each video using only pre-trained model weights. Our implementation is publicly available as a Kaggle notebook.
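
The three modules described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It makes several assumptions: the function names are hypothetical, a plain argmax stands in for the peak detector, the flow-magnitude map and the CLIP embeddings are taken as precomputed arrays (so no Farneback or CLIP call appears here), and the prompt embeddings of a class are averaged into one prototype, which is only one possible way to combine multiple prompts.

```python
import numpy as np


def temporal_peak(frames):
    """Return (frame index, z-scores) for the strongest frame-to-frame change.

    Assumption: argmax replaces a real peak detector.
    """
    frames = np.asarray(frames, dtype=np.float64)
    # Mean absolute difference between consecutive frames: one value per transition.
    diff = np.abs(np.diff(frames, axis=0)).reshape(len(frames) - 1, -1).mean(axis=1)
    # Z-score normalization; the epsilon guards against a constant signal.
    z = (diff - diff.mean()) / (diff.std() + 1e-8)
    # diff[i] compares frames i and i + 1, so report the later frame.
    return int(np.argmax(z)) + 1, z


def weighted_centroid(magnitude):
    """Return the (x, y) centroid of a 2-D non-negative map, weighted by its values."""
    magnitude = np.asarray(magnitude, dtype=np.float64)
    h, w = magnitude.shape
    total = magnitude.sum()
    if total <= 0:
        # No motion at all: fall back to the image center (an assumption).
        return (w - 1) / 2.0, (h - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    return float((xs * magnitude).sum() / total), float((ys * magnitude).sum() / total)


def _unit(v):
    """L2-normalize along the last axis."""
    v = np.asarray(v, dtype=np.float64)
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-12)


def classify(image_emb, prompt_embs):
    """Return (best label, scores) by cosine similarity to per-class prototypes.

    prompt_embs maps each label to an array of prompt embeddings, shape (n_prompts, dim).
    Assumption: prompts are combined by averaging their unit vectors.
    """
    img = _unit(image_emb)
    scores = {}
    for label, embs in prompt_embs.items():
        prototype = _unit(_unit(embs).mean(axis=0))
        scores[label] = float(img @ prototype)
    return max(scores, key=scores.get), scores
```

In a full pipeline, `weighted_centroid` would receive the sum of per-frame flow magnitudes, and `classify` would receive embeddings produced by a pre-trained CLIP model for frames near the index returned by `temporal_peak`.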

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Two-Pass Zero-Shot Temporal-Spatial Grounding of Rare Traffic Events in Surveillance Video
cs.CV 2026-05 unverdicted novelty 6.0

A two-pass pipeline with Qwen3-VL-Plus and Gemini 3.1 Flash-Lite achieves 0.539 accuracy on the ACCIDENT@CVPR 2026 benchmark of 2,027 real CCTV videos for zero-shot temporal-spatial grounding of traffic events.