CollideNet: Hierarchical Multi-scale Video Representation Learning with Disentanglement for Time-To-Collision Forecasting
Pith reviewed 2026-05-10 08:32 UTC · model grok-4.3
The pith
CollideNet achieves state-of-the-art time-to-collision forecasting on three public datasets by combining multi-scale spatial aggregation with temporal disentanglement of trend and seasonality in a hierarchical transformer.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Our method achieves state-of-the-art performance in comparison to prior works on three commonly used public datasets, setting a new state-of-the-art by a considerable margin.
Load-bearing premise
The assumption that the proposed hierarchical multi-scale spatial aggregation combined with temporal disentanglement of non-stationarity, trend, and seasonality will generalize beyond the three evaluated datasets and produce reliable TTC forecasts in unseen real-world conditions.
Original abstract
Time-to-Collision (TTC) forecasting is a critical task in collision prevention, requiring precise temporal prediction and comprehending both local and global patterns encapsulated in a video, both spatially and temporally. To address the multi-scale nature of video, we introduce a novel spatiotemporal hierarchical transformer-based architecture called CollideNet, specifically catered for effective TTC forecasting. In the spatial stream, CollideNet aggregates information for each video frame simultaneously at multiple resolutions. In the temporal stream, along with multi-scale feature encoding, CollideNet also disentangles the non-stationarity, trend, and seasonality components. Our method achieves state-of-the-art performance in comparison to prior works on three commonly used public datasets, setting a new state-of-the-art by a considerable margin. We conduct cross-dataset evaluations to analyze the generalization capabilities of our method, and visualize the effects of disentanglement of the trend and seasonality components of the video data. We release our code at https://github.com/DeSinister/CollideNet/.
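To make the phrase "aggregates information for each video frame simultaneously at multiple resolutions" concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how CollideNet aggregates scales, so average pooling over non-overlapping blocks and simple concatenation are stand-ins, and the names `avg_pool` and `multi_scale_features` and the pooling factors are hypothetical.

```python
from typing import List, Sequence

Frame = List[List[float]]


def avg_pool(frame: Frame, factor: int) -> Frame:
    """Downsample a 2-D frame by averaging non-overlapping factor x factor blocks."""
    height, width = len(frame), len(frame[0])
    if factor < 1 or height % factor or width % factor:
        raise ValueError("frame dimensions must be divisible by a positive factor")
    pooled: Frame = []
    for r in range(0, height, factor):
        row = []
        for c in range(0, width, factor):
            block = [frame[r + i][c + j] for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled


def multi_scale_features(frame: Frame, factors: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Concatenate the flattened frame at each resolution into one feature vector.

    Assumption: concatenation stands in for whatever fusion the paper uses.
    """
    features: List[float] = []
    for factor in factors:
        for row in avg_pool(frame, factor):
            features.extend(row)
    return features


# A 4x4 toy "frame" with pixel values 0..15.
toy_frame = [[float(4 * r + c) for c in range(4)] for r in range(4)]
toy_features = multi_scale_features(toy_frame)
```

For the 4x4 toy frame, the vector holds 16 fine-scale values, 4 values at half resolution, and 1 global average. A transformer-based model would more plausibly operate on learned patch embeddings than on raw pixel averages; the sketch only shows the coarse-to-fine structure.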
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes CollideNet, a hierarchical multi-scale spatiotemporal transformer architecture for time-to-collision (TTC) forecasting. The spatial stream aggregates multi-resolution information per video frame, while the temporal stream encodes multi-scale features and disentangles non-stationarity, trend, and seasonality components. The central claim is state-of-the-art performance on three public datasets, supported by cross-dataset evaluations, visualizations of the disentanglement, and public code release.
Significance. If the reported margins hold under the full experimental protocol, the work advances video representation learning for safety-critical applications such as collision avoidance. The combination of hierarchical spatial aggregation and explicit temporal disentanglement offers a structured approach to multi-scale spatiotemporal modeling, and the cross-dataset tests plus code release provide concrete support for generalization and reproducibility.
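The temporal disentanglement the report refers to can be pictured with a classic moving-average decomposition. This is a minimal sketch assuming a 1-D feature sequence and an odd window length. It shows a common technique from the time-series literature, not necessarily the mechanism CollideNet uses, and it does not model the non-stationarity component. The function names `moving_average` and `decompose` are hypothetical.

```python
from typing import List, Tuple


def moving_average(series: List[float], window: int) -> List[float]:
    """Centered moving average; edges are padded by replication to keep the length."""
    if not series:
        raise ValueError("series must be non-empty")
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    # Replicate the first and last values so every position has a full window.
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    return [sum(padded[i:i + window]) / window for i in range(len(series))]


def decompose(series: List[float], window: int = 5) -> Tuple[List[float], List[float]]:
    """Split a sequence into a smooth trend and a residual 'seasonal' part.

    By construction, trend[i] + seasonal[i] equals series[i].
    """
    trend = moving_average(series, window)
    seasonal = [x - t for x, t in zip(series, trend)]
    return trend, seasonal


# Toy per-frame feature values: a slow ramp with an alternating component.
signal = [0.0, 2.0, 2.0, 4.0, 4.0, 6.0, 6.0, 8.0]
trend, seasonal = decompose(signal, window=3)
```

The trend captures the slow ramp, while the residual keeps the frame-to-frame alternation. Plotting the two components separately is roughly what a disentanglement visualization would look like for a single feature channel.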
minor comments (2)
- The abstract states results on three public datasets but does not name them; adding the dataset names would improve immediate readability.
- Figure captions describing the disentanglement visualizations should explicitly reference the non-stationarity, trend, and seasonality components shown.
Circularity Check
No significant circularity
full rationale
The paper presents CollideNet as an empirical neural architecture for TTC forecasting, with claims resting on experimental SOTA results across three public datasets plus cross-dataset generalization tests. No equations, derivations, or first-principles predictions appear in the provided text; the architecture (hierarchical spatial aggregation and temporal disentanglement of non-stationarity/trend/seasonality) is described as a design choice motivated by video properties rather than any self-referential definition or fitted input relabeled as output. The central performance margins are therefore not forced by construction and remain independently verifiable via the released code.