Learning from the Unseen: Generative Data Augmentation for Geometric-Semantic Accident Anticipation
Pith reviewed 2026-05-09 20:00 UTC · model grok-4.3
The pith
Prompt-guided video synthesis plus a semantic graph network lets models anticipate traffic accidents more accurately and with longer lead times.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is twofold: a prompt-guided synthesis pipeline can produce driving videos whose feature distributions match real data, and a graph neural network augmented with semantic cues can then reason dynamically over spatial and semantic relations among road users. Together, the two paths are claimed to overcome data scarcity and interaction-modeling limits, raising both prediction accuracy and anticipation lead time.
What carries the argument
The dual-path framework: a prompt-guided video synthesis pipeline that derives and reproduces statistical patterns from existing corpora, paired with a semantic-enriched graph neural network that performs dynamic reasoning over spatial positions and semantic attributes of road participants.
If this is right
- Accuracy on accident anticipation tasks rises across multiple existing datasets and the released benchmark.
- Anticipation lead time lengthens, giving autonomous systems more reaction margin.
- A single standardized benchmark now covers varied regions, weather, and traffic densities for future comparisons.
- Reliance on rare real-world crash recordings decreases because synthetic scenes supply the missing volume and variety.
Where Pith is reading between the lines
- If the synthesis pipeline generalizes, the same prompt method could supply training data for other low-frequency events such as near-miss pedestrian crossings or sudden lane changes.
- The semantic-graph reasoning layer might be reusable in other multi-agent settings where both geometry and labels matter, such as warehouse robot coordination.
- Widespread adoption of the released benchmark would allow direct head-to-head tests of future augmentation techniques without dataset mismatch.
Load-bearing premise
The synthetic scenes generated from prompts are close enough in statistical distribution to real driving footage that training on them transfers usefully to real test videos.
What would settle it
Train the same anticipation model once with the synthetic data and once without it, then measure accuracy and lead time on the new benchmark; if performance is identical or lower with the synthetic data, the augmentation claim fails.
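The controlled comparison described above can be sketched as a small evaluation harness. Everything below (the lead-time metric, score sequences, threshold, and frame rate) is illustrative and assumed, not taken from the paper:

```python
# Hedged sketch: compare a model trained with vs. without synthetic
# augmentation on the same held-out videos. Scores and frame numbers
# here are toy values, not the paper's results.

def mean_lead_time(score_seqs, accident_frames, fps=10.0, threshold=0.5):
    """Mean anticipation lead time in seconds: how early the per-frame
    accident score first crosses the threshold before the annotated
    accident frame (0 if it never crosses in time)."""
    leads = []
    for scores, acc_frame in zip(score_seqs, accident_frames):
        trigger = next((t for t, s in enumerate(scores[:acc_frame])
                        if s >= threshold), acc_frame)
        leads.append((acc_frame - trigger) / fps)
    return sum(leads) / len(leads)

# Toy per-frame accident scores from two hypothetical training runs
# on two test videos; the accident occurs at frame 8 in both.
baseline  = [[0.1, 0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.9, 1.0],
             [0.2, 0.2, 0.3, 0.3, 0.5, 0.6, 0.8, 0.9, 1.0]]
augmented = [[0.1, 0.3, 0.6, 0.7, 0.8, 0.9, 0.9, 1.0, 1.0],
             [0.2, 0.4, 0.6, 0.7, 0.8, 0.9, 0.9, 1.0, 1.0]]
accident_frames = [8, 8]

lt_base = mean_lead_time(baseline, accident_frames)
lt_aug  = mean_lead_time(augmented, accident_frames)
# The augmentation claim survives only if the augmented run is better
# on both accuracy (not shown here) and lead time.
```

The same harness run once per training condition, with accuracy measured alongside, is exactly the settling experiment the paragraph above proposes.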
Original abstract
Anticipating traffic accidents is a critical yet unresolved problem for autonomous driving, hindered by the inherent complexity of modeling interactions between road users and the limited availability of diverse, large-scale datasets. To address these issues, we propose a dual-path framework. On the one hand, we employ a video synthesis pipeline that, guided by structured prompts, derives feature distributions from existing corpora and produces high-fidelity synthetic driving scenes consistent with the statistical patterns of real data. On the other hand, we design a graph neural network enriched with semantic cues, enabling dynamic reasoning over both spatial and semantic relations among participants. To validate the effectiveness of our approach, we release a new benchmark dataset containing standardized, finely annotated video sequences that cover a broad spectrum of regions, weather, and traffic conditions. Evaluations across existing datasets and our new benchmark confirm notable gains in both accuracy and anticipation lead time, highlighting the capacity of the proposed framework to mitigate current data bottlenecks and enhance the reliability of autonomous driving systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a dual-path framework for accident anticipation in autonomous driving. One path is a prompt-guided video synthesis pipeline that derives feature distributions from existing corpora to generate high-fidelity synthetic driving scenes. The second path is a semantic-enriched graph neural network that performs dynamic reasoning over spatial and semantic relations among road users. The authors release a new benchmark dataset of annotated video sequences spanning diverse regions, weather, and traffic conditions, and report notable gains in accuracy and anticipation lead time on both existing datasets and the new benchmark.
Significance. If the central claims hold, the work would address a key data bottleneck in accident anticipation by demonstrating that generative augmentation can produce usable synthetic scenes and that semantic GNNs improve dynamic reasoning. The release of a standardized, multi-condition benchmark would be a concrete community contribution, potentially enabling more reproducible progress on this safety-critical task.
Major comments (2)
- Abstract: The central claim of 'notable gains in both accuracy and anticipation lead time' is asserted without any quantitative metrics, baseline comparisons, ablation results, or statistical significance tests. This absence prevents assessment of whether the reported improvements are load-bearing for the dual-path framework or could be explained by other factors.
- Abstract / Methods: The assertion that synthetic scenes are 'consistent with the statistical patterns of real data' is load-bearing for the generative augmentation path, yet the manuscript provides no validation protocol (e.g., distribution divergence metrics, FID scores, or cross-dataset feature alignment results) to substantiate fidelity.
Minor comments (1)
- The description of the semantic-enriched GNN would benefit from explicit notation for how semantic cues are injected into the graph edges or node features.
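One plausible way to make the injection explicit, as the minor comment requests, is to concatenate a semantic class encoding onto each node's geometric features before message passing. The scheme below is a hypothetical illustration, not the paper's actual formulation; class names, features, and the mean-aggregation step are all assumed:

```python
# Hedged sketch: semantic cues injected into GNN node features.
# Each road user's node feature is its 2-D position concatenated with a
# one-hot semantic class vector; one round of mean-aggregation message
# passing then mixes each node with its neighbors.

CLASSES = ["car", "pedestrian", "cyclist"]  # illustrative label set

def node_feature(pos, cls):
    onehot = [1.0 if c == cls else 0.0 for c in CLASSES]
    return list(pos) + onehot  # geometry ++ semantics

def message_pass(features, adjacency):
    """One step: each node averages its own feature with its neighbors'."""
    out = []
    for i, f in enumerate(features):
        neigh = [features[j] for j in adjacency[i]] + [f]
        out.append([sum(col) / len(neigh) for col in zip(*neigh)])
    return out

feats = [node_feature((0.0, 1.0), "car"),
         node_feature((2.0, 1.0), "pedestrian")]
adj = {0: [1], 1: [0]}  # the two agents observe each other
mixed = message_pass(feats, adj)
# After one pass, each node carries a blend of neighboring positions
# AND neighboring class information, which is what lets downstream
# layers reason jointly over geometry and semantics.
```

In a real implementation the one-hot vector would typically be a learned embedding and the aggregation an attention-weighted sum, but the injection point (node features before message passing) is what the notation should pin down.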
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive comments on our manuscript. We address each major comment point by point below, clarifying the content of the full paper and indicating the revisions we will make to improve clarity and substantiation of our claims.
Point-by-point responses
- Referee: Abstract: The central claim of 'notable gains in both accuracy and anticipation lead time' is asserted without any quantitative metrics, baseline comparisons, ablation results, or statistical significance tests. This absence prevents assessment of whether the reported improvements are load-bearing for the dual-path framework or could be explained by other factors.
  Authors: We agree that the abstract would benefit from greater specificity to allow immediate assessment of the claims. The full manuscript (Section 4, Experiments) contains the requested elements: quantitative accuracy and lead-time results with baseline comparisons, ablation studies isolating the contributions of the generative augmentation and semantic GNN paths, and statistical significance testing. To address the concern directly, we will revise the abstract to include representative quantitative metrics, explicit baseline comparisons, and a brief reference to the ablation and significance results, while keeping the abstract concise. Revision: yes.
- Referee: Abstract / Methods: The assertion that synthetic scenes are 'consistent with the statistical patterns of real data' is load-bearing for the generative augmentation path, yet the manuscript provides no validation protocol (e.g., distribution divergence metrics, FID scores, or cross-dataset feature alignment results) to substantiate fidelity.
  Authors: We acknowledge that explicit quantitative validation of synthetic data fidelity strengthens the generative path. The current manuscript describes the prompt-guided synthesis pipeline and its integration but does not report distribution-level metrics such as FID, KL divergence, or cross-dataset alignment. We will add a new validation subsection (likely in Methods or as part of Experiments) that includes these metrics, along with qualitative examples and feature-distribution comparisons, to substantiate the consistency claim. Revision: yes.
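The fidelity check the referee asks for can be illustrated with a one-dimensional Fréchet distance over a single scene-level feature. This is a deliberate simplification of the full FID (which uses Inception features and a matrix square root over covariance matrices); the feature choice and all numbers below are illustrative assumptions:

```python
# Hedged sketch: 1-D Frechet distance between the real and synthetic
# distributions of one scalar scene feature, as a stand-in for full FID.
import math

def frechet_1d(real, synth):
    """(mu_r - mu_s)^2 + var_r + var_s - 2*sqrt(var_r * var_s);
    this is the univariate case of the Frechet/FID formula."""
    def stats(xs):
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / len(xs)
        return mu, var
    mu_r, var_r = stats(real)
    mu_s, var_s = stats(synth)
    return (mu_r - mu_s) ** 2 + var_r + var_s - 2 * math.sqrt(var_r * var_s)

# Hypothetical per-video values of one feature (e.g. mean motion magnitude).
real_feat  = [0.9, 1.1, 1.0, 1.2, 0.8]
synth_feat = [1.0, 1.2, 0.9, 1.1, 0.8]
d = frechet_1d(real_feat, synth_feat)
# A value near 0 means the synthetic distribution tracks the real one on
# this feature; the full protocol would report FID over deep features.
```

Reporting such distances per feature (or full FID/FVD over learned features), real vs. synthetic, is the kind of validation subsection the rebuttal commits to adding.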
Circularity Check
No significant circularity
Full rationale
The paper's central claims rest on a generative video synthesis pipeline and a semantic-enriched GNN, validated through a newly released benchmark dataset plus cross-dataset evaluations on existing corpora. No equations, derivations, or fitted parameters are presented in the provided text that reduce by construction to the inputs; the synthetic data is described as derived from existing feature distributions but evaluated externally rather than asserted as a prediction by definition. Self-citations, if present, are not load-bearing for the core argument, and the release of new annotated data provides independent grounding. The derivation chain is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: Generative models guided by structured prompts can produce synthetic scenes whose feature distributions match real driving data statistics.
- Domain assumption: Enriching graph neural networks with semantic cues enables dynamic reasoning over spatial and semantic relations among road users.