
arxiv: 2604.09327 · v1 · submitted 2026-04-10 · 💻 cs.CV

From Frames to Events: Rethinking Evaluation in Human-Centric Video Anomaly Detection

Pith reviewed 2026-05-10 17:26 UTC · model grok-4.3

classification 💻 cs.CV
keywords video anomaly detection · event-level evaluation · temporal localization · frame vs event metrics · surveillance · pose-based detection · benchmark audit

The pith

Traditional frame-level scoring in video anomaly detection overestimates how well models can identify coherent anomalous events.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that current evaluation practices treat videos as bags of independent frames, which hides the fact that anomalies occur as extended temporal episodes with clear onsets and durations. Real-world surveillance systems need reliable detection of these whole events to trigger actionable alerts, not scattered frame flags. By auditing standard benchmarks and adapting event-matching metrics, the work demonstrates that even strong frame-level performers fail dramatically at localizing events. This gap arises because frame metrics ignore temporal coherence and boundary accuracy. The authors therefore advocate shifting both evaluation and model design toward explicit event outputs.
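A toy illustration of why scattered frame flags can look good frame by frame yet fail as event detections. Everything here is hypothetical: the interval values, the half-open `(start, end)` frame convention, and the helper name `tiou` are assumptions made for the sketch, not taken from the paper or its code base.

```python
def tiou(a, b):
    """Temporal IoU of two half-open frame intervals (start, end)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

# Hypothetical toy data: one 100-frame ground-truth event and
# ten 5-frame predicted fragments scattered inside it.
gt_event = (100, 200)
fragments = [(100 + 10 * k, 105 + 10 * k) for k in range(10)]

# Frame-level view: every flagged frame lies inside the event.
flagged = {f for s, e in fragments for f in range(s, e)}
gt_frames = set(range(*gt_event))
frame_precision = len(flagged & gt_frames) / len(flagged)
frame_recall = len(flagged & gt_frames) / len(gt_frames)

# Event-level view: no single fragment overlaps the event enough to match.
best_tiou = max(tiou(frag, gt_event) for frag in fragments)
matched = sum(tiou(frag, gt_event) >= 0.2 for frag in fragments)

print(frame_precision, frame_recall, best_tiou, matched)
```

In this constructed case frame precision is 1.0 and frame recall is 0.5, but the best fragment reaches a tIoU of only 0.05, so none of the ten predictions counts as a detected event at tIoU=0.2.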

Core claim

While state-of-the-art pose-based VAD models exceed 52 percent frame-level AUC-ROC on NWPUC, they achieve event-level localization precision below 10 percent even at the lenient tIoU threshold of 0.2, with an average multi-threshold event F1 of only 0.11. The paper establishes this by first characterizing the event structure of common benchmarks, then converting frame scores into event detections with a hierarchical smoothing pipeline, and finally scoring those detections with tIoU-based matching.

What carries the argument

Event-level evaluation protocol that adapts tIoU-based event matching and multi-threshold F1 scoring from temporal action localization, combined with a score-refinement pipeline using hierarchical Gaussian smoothing and adaptive binarization.
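A minimal sketch of tIoU-based event matching and multi-threshold F1, in the general style of temporal action localization evaluation. The greedy one-to-one matching rule, the threshold set `(0.2, 0.3, 0.4, 0.5)`, and all function names are assumptions for illustration; the paper's exact protocol may differ.

```python
def tiou(a, b):
    """Temporal IoU of two half-open frame intervals (start, end)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def count_matches(preds, gts, thr):
    """Greedy one-to-one matching; preds are assumed sorted by confidence."""
    used = set()
    for p in preds:
        candidates = [(tiou(p, g), j) for j, g in enumerate(gts) if j not in used]
        if candidates:
            best, j = max(candidates)
            if best >= thr:
                used.add(j)  # each ground-truth event is claimed at most once
    return len(used)

def event_prf(preds, gts, thr):
    """Event-level precision, recall, and F1 at a single tIoU threshold."""
    tp = count_matches(preds, gts, thr)
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def mean_f1(preds, gts, thresholds=(0.2, 0.3, 0.4, 0.5)):
    """Average F1 over an assumed set of tIoU thresholds."""
    return sum(event_prf(preds, gts, t)[2] for t in thresholds) / len(thresholds)

# Hypothetical intervals, in frames.
gts = [(0, 100), (300, 400)]
preds = [(10, 90), (290, 330), (500, 520)]
print(event_prf(preds, gts, 0.2), mean_f1(preds, gts))
```

With these made-up intervals, two of three predictions match at tIoU=0.2 (F1 of about 0.8), but only one survives at 0.3 and above, so the averaged F1 drops to about 0.5. Tightening the threshold penalizes loose boundaries even when the event itself was found.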

If this is right

  • Models must be redesigned or post-processed to output explicit event intervals instead of per-frame scores if they are to support operational surveillance.
  • Frame-level AUC-ROC alone is no longer sufficient to claim readiness for deployment; event-level precision and F1 become the primary reported figures.
  • Benchmark datasets require updated annotations that mark contiguous anomalous episodes rather than per-frame labels.
  • The proposed refinement pipeline can be applied to existing frame-based models to produce usable event detections without full retraining.
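The last point, post-processing frame scores into events, can be sketched roughly as follows. This is not the authors' pipeline: averaging Gaussian-smoothed copies at two scales stands in for "hierarchical Gaussian smoothing", a per-video `mean + k * std` rule stands in for "adaptive binarization", and the values of `sigmas` and `k` are placeholders chosen for the toy data.

```python
import numpy as np

def gaussian_smooth(x, sigma):
    """Smooth a 1-D score sequence with a truncated, normalized Gaussian."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(x, radius, mode="edge")  # keeps output length equal to input
    return np.convolve(padded, kernel, mode="valid")

def refine_scores(scores, sigmas=(2.0, 8.0)):
    """Assumed 'hierarchical' step: average fine and coarse smoothings."""
    scores = np.asarray(scores, dtype=float)
    return np.mean([gaussian_smooth(scores, s) for s in sigmas], axis=0)

def binarize(scores, k=1.0):
    """Assumed 'adaptive' step: threshold relative to this video's statistics."""
    return scores > scores.mean() + k * scores.std()

def to_events(mask):
    """Turn a boolean frame mask into half-open (start, end) intervals."""
    padded = np.concatenate(([0], mask.astype(int), [0]))
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))

# Hypothetical scores: one anomalous episode with two brief score dropouts.
raw = np.zeros(300)
raw[100:160] = 1.0
raw[120:123] = 0.0
raw[140:142] = 0.0

raw_events = to_events(binarize(raw))
refined_events = to_events(binarize(refine_scores(raw)))
print(len(raw_events), len(refined_events))
```

On this toy input the raw scores split into three fragments, while the smoothed scores yield a single interval close to the true episode. The free parameters (`sigmas`, `k`) directly shape the resulting intervals, which is why their selection matters for any reported event-level number.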

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Event-centric evaluation may naturally encourage training objectives that penalize fragmented or temporally misaligned predictions, potentially improving robustness in long untrimmed videos.
  • The observed gap suggests that pose-based VAD could benefit from explicit integration with temporal action localization architectures rather than relying on post-hoc conversion.
  • If event-level metrics become standard, future datasets might prioritize clearer event boundaries during collection, reducing the current mismatch between annotation style and operational needs.

Load-bearing premise

The assumption that anomaly events in existing video datasets have sufficiently well-defined boundaries so that action-localization metrics can be applied without VAD-specific adjustments for ambiguity or overlapping episodes.

What would settle it

A re-annotation of the NWPUC benchmark with explicit start and end times for each anomalous episode, followed by re-computation of event F1 scores; if the gap disappears under the new labels, the low performance would be attributable to benchmark structure rather than model capability.

Figures

Figures reproduced from arXiv: 2604.09327 by Armin Danesh Pazho, Babak Rahimi Ardabili, Hamed Tabkhi, Narges Rashvand, Shanle Yao.

Figure 1: The proposed three-stage Frame-to-Event Transformation framework. Raw anomaly scores undergo hierarchical Gaussian …
Figure 2: Dual-branch event-level anomaly detection framework. Given an input pose sequence, the model processes the data through two …
Original abstract

Pose-based Video Anomaly Detection (VAD) has gained significant attention for its privacy-preserving nature and robustness to environmental variations. However, traditional frame-level evaluations treat video as a collection of isolated frames, fundamentally misaligned with how anomalies manifest and are acted upon in the real world. In operational surveillance systems, what matters is not the flagging of individual frames, but the reliable detection, localization, and reporting of a coherent anomalous event, a contiguous temporal episode with an identifiable onset and duration. Frame-level metrics are blind to this distinction, and as a result, they systematically overestimate model performance for any deployment that requires actionable, event-level alerts. In this work, we propose a shift toward an event-centric perspective in VAD. We first audit widely used VAD benchmarks, including SHT[19], CHAD[6], NWPUC[4], and HuVAD[25], to characterize their event structure. We then introduce two strategies for temporal event localization: a score-refinement pipeline with hierarchical Gaussian smoothing and adaptive binarization, and an end-to-end Dual-Branch Model that directly generates event-level detections. Finally, we establish the first event-based evaluation standard for VAD by adapting Temporal Action Localization metrics, including tIoU-based event matching and multi-threshold F1 evaluation. Our results quantify a substantial performance gap: while all SoTA models achieve frame-level AUC-ROC exceeding 52% on the NWPUC[4], their event-level localization precision falls below 10% even at a minimal tIoU=0.2, with an average event-level F1 of only 0.11 across all thresholds. The code base for this work is available at https://github.com/TeCSAR-UNCC/EventCentric-VAD.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that frame-level evaluation in pose-based Video Anomaly Detection (VAD) systematically overestimates performance because it ignores the contiguous, event-level nature of real-world anomalies. It audits the SHT, CHAD, NWPUC, and HuVAD benchmarks and reports that SoTA models achieve frame-level AUC-ROC >52% on NWPUC yet event-level localization precision <10% at tIoU=0.2, with an average multi-threshold F1 of only 0.11. The authors introduce a score-refinement pipeline (hierarchical Gaussian smoothing + adaptive binarization), an end-to-end Dual-Branch Model, and adapt Temporal Action Localization metrics (tIoU matching and multi-threshold F1) as the new evaluation standard, with code released at https://github.com/TeCSAR-UNCC/EventCentric-VAD.

Significance. If the reported gap is robust to the choice of event boundary definition, the work could meaningfully redirect VAD research toward operationally relevant metrics and models. The open-source code and explicit proposal of an event-based benchmark standard are concrete strengths that support reproducibility. The significance is tempered by the need to verify that the adapted TAL protocol does not itself create an artifactual gap for anomalies with gradual transitions.

major comments (3)
  1. [Event Localization Strategies] Section on event localization and score refinement: the hierarchical Gaussian smoothing and adaptive binarization steps introduce free parameters whose exact values, selection procedure, and sensitivity are not reported with ablations; because these steps directly determine the event intervals fed to tIoU matching, the quantitative gap (precision <10% at tIoU=0.2) cannot be fully assessed without this information.
  2. [Evaluation Metrics and Benchmark Audit] Evaluation protocol and benchmark audit: the direct transfer of tIoU-based matching (minimum tIoU=0.2) and multi-threshold F1 from TAL assumes that anomaly boundaries are as sharply delineated as action instances; the paper's own characterization of event structure in SHT/CHAD/NWPUC/HuVAD does not quantify boundary fuzziness or label noise at onsets/offsets, leaving open the possibility that the reported performance gap is partly an artifact of the metric rather than evidence of model failure.
  3. [Dual-Branch Model] Dual-Branch Model architecture: the description does not specify how the two branches are fused or whether the model still relies on post-processing steps equivalent to the score-refinement pipeline; without these details it is unclear whether the model constitutes a genuine end-to-end alternative or merely reparameterizes the same evaluation protocol.
minor comments (2)
  1. [Abstract and Section 2] Citation style for benchmarks (SHT[19], NWPUC[4], etc.) is inconsistent between abstract and main text; a single reference list entry per dataset would improve clarity.
  2. [Figure 1] The caption of the figure illustrating event structure should explicitly state the criteria used to merge consecutive anomalous frames into events.
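The merge criterion raised in the last minor comment can be made concrete with a small sketch. It derives events from per-frame binary labels and optionally bridges short gaps; the `max_gap` parameter and the function name are hypothetical, since the source does not state what rule the authors used.

```python
def labels_to_events(labels, max_gap=0):
    """Group per-frame 0/1 labels into half-open (start, end) events.

    Runs of anomalous frames separated by at most `max_gap` normal
    frames are merged into one event (max_gap is an assumed knob).
    """
    events = []
    start = None
    for i, y in enumerate(labels):
        if y and start is None:
            start = i                      # a run of anomalous frames begins
        elif not y and start is not None:
            events.append([start, i])      # the run ended at frame i - 1
            start = None
    if start is not None:
        events.append([start, len(labels)])

    merged = []
    for ev in events:
        if merged and ev[0] - merged[-1][1] <= max_gap:
            merged[-1][1] = ev[1]          # bridge the short gap
        else:
            merged.append(ev)
    return [tuple(ev) for ev in merged]

labels = [0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1]
print(labels_to_events(labels))             # [(1, 3), (5, 8), (12, 13)]
print(labels_to_events(labels, max_gap=2))  # [(1, 8), (12, 13)]
```

The same labels yield three events or two depending on `max_gap`, so event counts and duration statistics from a benchmark audit are only interpretable once this rule is stated.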

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We have addressed each major point below with clarifications and revisions to the manuscript where needed.

  1. Referee: Section on event localization and score refinement: the hierarchical Gaussian smoothing and adaptive binarization steps introduce free parameters whose exact values, selection procedure, and sensitivity are not reported with ablations; because these steps directly determine the event intervals fed to tIoU matching, the quantitative gap (precision <10% at tIoU=0.2) cannot be fully assessed without this information.

    Authors: We agree that the parameter details and sensitivity analysis were insufficiently documented. In the revised manuscript we have added an explicit subsection (now Section 5.2) that reports the exact hyperparameter values (Gaussian kernel sizes and standard deviations at each hierarchy level, plus the adaptive binarization threshold formula), the grid-search procedure used on a held-out validation split to select them, and a full ablation table measuring the effect of each parameter on event-level precision, recall, and F1 at multiple tIoU thresholds. These additions allow direct reproduction of the reported gap. revision: yes

  2. Referee: Evaluation protocol and benchmark audit: the direct transfer of tIoU-based matching (minimum tIoU=0.2) and multi-threshold F1 from TAL assumes that anomaly boundaries are as sharply delineated as action instances; the paper's own characterization of event structure in SHT/CHAD/NWPUC/HuVAD does not quantify boundary fuzziness or label noise at onsets/offsets, leaving open the possibility that the reported performance gap is partly an artifact of the metric rather than evidence of model failure.

    Authors: The concern about boundary fuzziness is valid and was not fully quantified in the original audit. We have expanded Section 4 with new statistics on onset/offset label variability (where multiple annotators exist) and a sensitivity experiment that perturbs ground-truth boundaries by up to 2 seconds while re-computing event-level metrics. Even under these relaxed boundary conditions the performance gap remains large (event-level precision still below 15% at tIoU=0.2). We therefore maintain that the gap is not solely an artifact of the metric, but we now explicitly discuss the residual uncertainty introduced by gradual transitions. revision: partial

  3. Referee: Dual-Branch Model architecture: the description does not specify how the two branches are fused or whether the model still relies on post-processing steps equivalent to the score-refinement pipeline; without these details it is unclear whether the model constitutes a genuine end-to-end alternative or merely reparameterizes the same evaluation protocol.

    Authors: We apologize for the incomplete architectural description. The revised Section 6 now states that the two branches (temporal feature encoder and event-proposal decoder) are fused by a cross-attention layer whose output is passed directly to a final sigmoid head that predicts start/end times and anomaly scores. No hierarchical Gaussian smoothing or adaptive binarization is applied at inference; the model produces event intervals in a single forward pass. A new figure (Figure 4) and accompanying text make the fusion and end-to-end nature explicit. revision: yes
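The boundary-perturbation experiment mentioned in the second response could look roughly like the sketch below. It only shows the jittering step; the frame rate of 30 fps (so 2 seconds becomes 60 frames), the uniform jitter, and the function name are all assumptions, as the simulated rebuttal gives no implementation detail.

```python
import random

def perturb_events(events, max_shift, rng):
    """Shift each event boundary by a uniform integer in [-max_shift, max_shift]."""
    out = []
    for start, end in events:
        s = max(0, start + rng.randint(-max_shift, max_shift))
        e = end + rng.randint(-max_shift, max_shift)
        if e <= s:
            e = s + 1  # keep every perturbed event at least one frame long
        out.append((s, e))
    return out

# Hypothetical ground truth in frames; 2 s at an assumed 30 fps is 60 frames.
gt_events = [(300, 900), (1500, 2400)]
rng = random.Random(0)  # fixed seed so the trials are repeatable
trials = [perturb_events(gt_events, max_shift=60, rng=rng) for _ in range(5)]
for jittered in trials:
    print(jittered)
```

Each jittered copy of the ground truth would then be passed to the same event-matching routine as the original labels. If event-level precision stays low across many such trials, boundary ambiguity of that magnitude alone is unlikely to explain the gap.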

Circularity Check

0 steps flagged

No circularity: empirical audit and metric adaptation are self-contained


The paper performs an empirical audit of frame-level labels in existing benchmarks (SHT, CHAD, NWPUC, HuVAD) to extract event structure, then applies standard tIoU-based matching and multi-threshold F1 from Temporal Action Localization without any fitted parameters or self-referential equations. The score-refinement pipeline and Dual-Branch Model are new proposals evaluated directly under the adapted protocol; reported gaps (frame AUC-ROC >52% vs. event precision <10% at tIoU=0.2) are computed outputs, not quantities defined by construction from the inputs. No self-citations, ansatzes, or uniqueness theorems are load-bearing. The derivation chain is observational and does not reduce to tautology.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that anomalies are contiguous temporal events and introduces new pipeline parameters and a model architecture without independent evidence for the entity beyond the proposal itself.

free parameters (2)
  • hierarchical Gaussian smoothing parameters
    Used in the score-refinement pipeline for temporal event localization.
  • adaptive binarization thresholds
    Parameters for converting refined scores into event detections.
axioms (1)
  • domain assumption: Anomalies in video manifest as coherent contiguous temporal episodes with identifiable onset and duration rather than isolated frames.
    Invoked as the fundamental misalignment between traditional evaluation and real-world anomaly manifestation.
invented entities (1)
  • Dual-Branch Model (no independent evidence)
    purpose: End-to-end architecture that directly generates event-level detections instead of frame scores.
    New model component proposed to address the event localization task.



Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 1 internal anchor

  [1] Ayman A Ali, Ahmed Ashraf, and Kamel H Rahouma. Anomaly detection in healthcare monitoring survey. In Advanced Research Trends in Sustainable Solutions, Data Analytics, and Security, pages 29–56. IGI Global Scientific Publishing, 2025.

  [2] Babak Rahimi Ardabili. Co-Creating Responsible Artificial Intelligence for Public Safety. PhD thesis, The University of North Carolina at Charlotte, 2025.

  [3] Babak Rahimi Ardabili, Armin Danesh Pazho, Ghazal Alinezhad Noghre, Vinit Katariya, Gordon Hull, Shannon Reid, and Hamed Tabkhi. Exploring public’s perception of safety and video surveillance technology: A survey approach. Technology in Society, 78:102641, 2024.

  [4] Congqi Cao, Yue Lu, Peng Wang, and Yanning Zhang. A new comprehensive benchmark for semi-supervised video anomaly detection and anticipation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20392–20401, 2023.

  [5] Xiaoyu Chen, Shichao Kan, Fanghui Zhang, Yigang Cen, Linna Zhang, and Damin Zhang. Multiscale spatial temporal attention graph convolution network for skeleton-based anomaly behavior detection. Journal of Visual Communication and Image Representation, 90:103707, 2023.

  [6] Armin Danesh Pazho, Ghazal Alinezhad Noghre, Babak Rahimi Ardabili, Christopher Neff, and Hamed Tabkhi. Chad: Charlotte anomaly dataset. In Scandinavian Conference on Image Analysis, pages 50–66. Springer, 2023.

  [7] Keval Doshi and Yasin Yilmaz. Online anomaly detection in surveillance videos with asymptotic bound on false alarm rate. Pattern Recognition, 114:107865, 2021.

  [8] Keval Doshi and Yasin Yilmaz. Rethinking video anomaly detection - a continual learning approach. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 3961–3970, 2022.

  [9] Keval Doshi and Yasin Yilmaz. Towards interpretable video anomaly detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 2655–2664, 2023.

  [10] Daniel Fährmann, Laura Martín, Luis Sánchez, and Naser Damer. Anomaly detection in smart environments: A comprehensive survey. IEEE Access, 12:64006–64049, 2024.

  [11] Yves M Galvão, Letícia Castro, Janderson Ferreira, Fernando Buarque de Lima Neto, Roberta Andrade de Araújo Fagundes, and Bruno JT Fernandes. Anomaly detection in smart houses for healthcare: Recent advances, and future perspectives. SN Computer Science, 5(1):136, 2024.

  [12] Ouldouz Ghorbani, Ahmed Helmy, Qingrong Jackie Wu, and Yaorong Ge. Examining radiation therapy planning knowledge in large language models. In Proceedings of the 16th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, pages 1–1, 2025.

  [13] Uzay Gökay, Federico Spurio, Dominik R Bach, and Juergen Gall. Skeleton motion words for unsupervised skeleton-based temporal action segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12101–12111, 2025.

  [14] Or Hirschorn and Shai Avidan. Normalizing flows for human pose anomaly detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 13545–13554, 2023.

  [15] Or Hirschorn and Shai Avidan. Normalizing flows for human pose anomaly detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 13545–13554, 2023.

  [16] Chao Huang, Yabo Liu, Zheng Zhang, Chengliang Liu, Jie Wen, Yong Xu, and Yaowei Wang. Hierarchical graph embedded pose regularity learning via spatio-temporal transformer for abnormal behavior detection. In Proceedings of the 30th ACM International Conference on Multimedia, pages 307–315, 2022.

  [17] Yashswi Jain, Ashvini Kumar Sharma, Rajbabu Velmurugan, and Biplab Banerjee. Posecvae: Anomalous human activity detection. In 2020 25th International Conference on Pattern Recognition (ICPR), pages 2927–2934. IEEE, 2021.

  [18] Sardar Waqar Khan, Qasim Hafeez, Muhammad Irfan Khalid, Roobaea Alroobaea, Saddam Hussain, Jawaid Iqbal, Jasem Almotiri, and Syed Sajid Ullah. Anomaly detection in traffic surveillance videos using deep learning. Sensors, 22(17):6563, 2022.

  [19] Wen Liu, Weixin Luo, Dongze Lian, and Shenghua Gao. Future frame prediction for anomaly detection – a new baseline. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

  [20] Amir Markovitz, Gilad Sharir, Itamar Friedman, Lihi Zelnik-Manor, and Shai Avidan. Graph embedded pose clustering for anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10539–10547, 2020.

  [21] Pratik K Mishra, Alex Mihailidis, and Shehroz S Khan. Skeletal video anomaly detection using deep learning: Survey, challenges, and future directions. IEEE Transactions on Emerging Topics in Computational Intelligence, 8(2):1073–1085, 2024.

  [22] Ghazal Alinezhad Noghre, Armin Danesh Pazho, and Hamed Tabkhi. Human-centric video anomaly detection through spatio-temporal pose tokenization and transformer. arXiv preprint arXiv:2408.15185, 2024.

  [23] Ghazal Alinezhad Noghre, Armin Danesh Pazho, and Hamed Tabkhi. An exploratory study on human-centric video anomaly detection through variational autoencoders and trajectory prediction. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 995–1004, 2024.

  [24] Ghazal Alinezhad Noghre, Armin Danesh Pazho, and Hamed Tabkhi. Human-centric video anomaly detection through spatio-temporal pose tokenization and transformer,

  [25] Armin Danesh Pazho, Shanle Yao, Ghazal Alinezhad Noghre, Babak Rahimi Ardabili, Vinit Katariya, and Hamed Tabkhi. Towards adaptive human-centric video anomaly detection: A comprehensive framework and a new benchmark. IEEE Transactions on Circuits and Systems for Video Technology, 2025.

  [26] Arash Rahmanidehkordi and Amir H Ghasemi. Traffic density control for heterogeneous highway systems with input constraints. IEEE Control Systems Letters, 8:2787–2792,

  [27] Narges Rashvand, Ghazal Alinezhad Noghre, Armin Danesh Pazho, Babak Rahimi Ardabili, and Hamed Tabkhi. Shopformer: Transformer-based framework for detecting shoplifting via human pose. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5761–5770, 2025.

  [28] Royston Rodrigues, Neha Bhargava, Rajbabu Velmurugan, and Subhasis Chaudhuri. Multi-timescale trajectory prediction for abnormal human activity detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 2626–2634, 2020.

  [29] Shanle Yao, Ghazal Alinezhad Noghre, Armin Danesh Pazho, and Hamed Tabkhi. Evaluating the effectiveness of video anomaly detection in the wild: Online learning and inference for real-world deployment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4832–4841, 2024.

  [30] Shanle Yao, Babak Rahimi Ardabili, Armin Danesh Pazho, Ghazal Alinezhad Noghre, Christopher Neff, Lauren Bourque, and Hamed Tabkhi. From lab to field: Real-world evaluation of an ai-driven smart video solution to enhance community safety. Internet of Things, page 101716, 2025.

  [31] Shanle Yao, Ghazal Alinezhad Noghre, Armin Danesh Pazho, and Hamed Tabkhi. Alfred: An active learning framework for real-world semi-supervised anomaly detection with adaptive thresholds. arXiv preprint arXiv:2508.09058, 2025.

  [32] Shanle Yao, Armin Danesh Pazho, Narges Rashvand, and Hamed Tabkhi. Are multimodal llms ready for surveillance? a reality check on zero-shot anomaly detection in the wild. arXiv preprint arXiv:2603.04727, 2026.

  [33] Shanle Yao, Narges Rashvand, Armin Danesh Pazho, and Hamed Tabkhi. From offline to periodic adaptation for pose-based shoplifting detection in real-world retail security. IEEE Internet of Things Journal, 2026.

  [34] Shoubin Yu, Zhongyin Zhao, Haoshu Fang, Andong Deng, Haisheng Su, Dongliang Wang, Weihao Gan, Cewu Lu, and Wei Wu. Regularity learning via explicit distribution modeling for skeletal video anomaly detection. IEEE Transactions on Circuits and Systems for Video Technology, 2023.

  [35] Xianlin Zeng, Yalong Jiang, Wenrui Ding, Hongguang Li, Yafeng Hao, and Zifeng Qiu. A hierarchical spatio-temporal graph convolutional neural network for anomaly detection in videos. IEEE Transactions on Circuits and Systems for Video Technology, 33(1):200–212, 2021.

  [36] Chen-Lin Zhang, Jianxin Wu, and Yin Li. Actionformer: Localizing moments of actions with transformers. In European Conference on Computer Vision, pages 492–510. Springer,