Rethinking Weak Supervision in Anomaly Detection: A Comprehensive Benchmark

Chaochuan Hou; Hailiang Huang; Minqi Jiang; Shiping Wang; Shuang Liang; Siyuan Zhou; Songqiao Han; Xu Yao; Zhenbo Wu

arXiv: 2605.26068 · v3 · pith:RWLFDV6Tnew · submitted 2026-05-25 · cs.LG · cs.AI

Pith reviewed 2026-06-29 22:24 UTC · model grok-4.3

keywords: weak supervision · anomaly detection · benchmark · label noise · tabular foundation models · incomplete supervision · inexact supervision · inaccurate supervision

The pith

A single benchmark across weak supervision scenarios in anomaly detection finds strong intrinsic correlations between them and shows specialized methods lose to general models once labels exceed extreme scarcity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds WSADBench to test anomaly detection under incomplete, inexact, and inaccurate supervision within one evaluation framework instead of treating them as separate tracks. It runs more than 700,000 experiments that systematically change how many labels are available, how coarse they are, and how noisy they are, across dozens of algorithms and four data types. Results indicate the three supervision types share fundamental mechanisms rather than presenting distinct challenges. A sympathetic reader would care because this questions whether current research directions are pursuing truly independent problems and shows when investing in specialized anomaly detectors stops being worthwhile.

Core claim

WSADBench shows that the three primary weak supervision scenarios exhibit strong intrinsic correlations, that specialized WSAD algorithms only outperform others in the most extreme label-scarcity regimes and are quickly surpassed by tabular foundation models and general classification methods as supervision increases or in out-of-distribution cases, that unlabeled data yields inconsistent and marginal gains compared with label refinement, and that models display asymmetric sensitivity to different forms of label noise.

What carries the argument

WSADBench, the benchmark that applies standardized protocols for varying label quantity, granularity, and quality to compare 36 algorithms across four modalities in a unified way.
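
As a rough illustration of what varying label quantity, granularity, and quality can look like, the sketch below simulates each of the three weak supervision scenarios on a toy label vector. The function names, the bag size, and the flip rate are hypothetical and are not taken from the WSADBench code; the paper's actual protocols may differ.

```python
import random

def make_incomplete(labels, n_labeled_anomalies, seed=0):
    """Incomplete supervision (quantity): keep a few anomaly labels,
    mark every other instance as unlabeled (None)."""
    rng = random.Random(seed)
    anomaly_idx = [i for i, y in enumerate(labels) if y == 1]
    keep = set(rng.sample(anomaly_idx, min(n_labeled_anomalies, len(anomaly_idx))))
    return [1 if i in keep else None for i in range(len(labels))]

def make_inexact(labels, bag_size):
    """Inexact supervision (granularity): only bag-level labels are visible;
    a bag is positive if it contains at least one anomaly."""
    bags = [labels[i:i + bag_size] for i in range(0, len(labels), bag_size)]
    return [int(any(bag)) for bag in bags]

def make_inaccurate(labels, flip_rate, seed=0):
    """Inaccurate supervision (quality): flip each label independently
    with probability flip_rate."""
    rng = random.Random(seed)
    return [1 - y if rng.random() < flip_rate else y for y in labels]

# Toy ground truth (assumed for illustration): 12 instances, 2 anomalies.
labels = [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]

incomplete = make_incomplete(labels, n_labeled_anomalies=1)
inexact = make_inexact(labels, bag_size=4)
inaccurate = make_inaccurate(labels, flip_rate=0.1)
```

Because all three views are derived from the same ground-truth vector, a single knob per scenario (labeled count, bag size, flip rate) is enough to sweep the supervision level in this toy setting.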

If this is right

Strong correlations between incomplete, inexact, and inaccurate supervision challenge the practice of isolating research on each direction.
Specialized WSAD algorithms are competitive only under extreme label scarcity and lose to foundation models and general classifiers otherwise or in OOD settings.
Unlabeled data provides inconsistent and smaller benefits than improving the quality or granularity of existing labels.
Model performance reacts differently to different kinds of label noise, with some noise types hurting more than others.
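
One simple reason the two flip directions need not be symmetric is class imbalance: the same flip rate touches very different numbers of instances depending on which class it is applied to. The sketch below makes that arithmetic concrete. It is not the paper's explanation or protocol; the parameter names `normal_to_anomaly` and `anomaly_to_normal`, the 5% anomaly ratio, and the 10% flip rate are assumptions chosen for illustration.

```python
import random

def flip_labels(labels, normal_to_anomaly, anomaly_to_normal, seed=0):
    """Flip normals (0) and anomalies (1) with separate probabilities."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        rate = anomaly_to_normal if y == 1 else normal_to_anomaly
        noisy.append(1 - y if rng.random() < rate else y)
    return noisy

def anomaly_label_precision(clean, noisy):
    """Share of instances labeled anomalous in `noisy` that are truly anomalous."""
    flagged = [c for c, n in zip(clean, noisy) if n == 1]
    return sum(flagged) / len(flagged) if flagged else float("nan")

def anomaly_label_recall(clean, noisy):
    """Share of true anomalies that are still labeled anomalous in `noisy`."""
    kept = [n for c, n in zip(clean, noisy) if c == 1]
    return sum(kept) / len(kept)

# Assumed toy data: 1000 instances with 5% anomalies.
clean = [0] * 950 + [1] * 50

# Same 10% rate, applied to one class at a time.
noisy_normals_flipped = flip_labels(clean, normal_to_anomaly=0.10, anomaly_to_normal=0.0)
noisy_anomalies_flipped = flip_labels(clean, normal_to_anomaly=0.0, anomaly_to_normal=0.10)

precision_a = anomaly_label_precision(clean, noisy_normals_flipped)
recall_a = anomaly_label_recall(clean, noisy_normals_flipped)
precision_b = anomaly_label_precision(clean, noisy_anomalies_flipped)
recall_b = anomaly_label_recall(clean, noisy_anomalies_flipped)
```

In expectation, flipping 10% of 950 normals adds about 95 false anomaly labels next to 50 true ones, while flipping 10% of the anomalies removes about 5 labels and leaves the remaining anomaly labels clean.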

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Research effort might shift toward methods designed to exploit the shared structure across supervision types rather than building separate tools for each.
Similar unified benchmarks could be useful in other weakly supervised domains to check whether apparent distinctions are real or artifacts of isolated evaluation.
In practice, teams may gain more by investing in label cleaning than by collecting additional unlabeled examples.
The observed asymmetry in noise sensitivity suggests targeted noise-robust training techniques could be developed for the most damaging noise types.

Load-bearing premise

The chosen collection of 36 algorithms, four modalities, and the particular ways of changing label quantity, granularity, and quality are representative enough to draw general conclusions about performance boundaries and correlations in real-world settings.

What would settle it

An independent replication that applies the same variation protocols to a substantially different set of algorithms or modalities and finds either no correlations between the supervision scenarios or continued dominance of specialized methods outside extreme scarcity would falsify the central claims.

Figures

Figures reproduced from arXiv: 2605.26068 by Chaochuan Hou, Hailiang Huang, Minqi Jiang, Shiping Wang, Shuang Liang, Siyuan Zhou, Songqiao Han, Xu Yao, Zhenbo Wu.

**Figure 1.** Overview of the WSADBench. It integrates datasets spanning diverse modalities and varied supervision scenarios into …

**Figure 2.** Tabular foundation model performance (AUCPR) …

**Figure 3.** AUCPR results on tabular datasets under varying …

**Figure 4.** Model sensitivity to label noise: AUCPR degradation …

**Figure 5.** Analysis of incomplete OOD across 3 Settings.

**Figure 6.** Anomaly score decision boundaries on Metal_nut …

**Figure 8.** Model ranking radar chart.

… outperform specialized inexact approaches on video MIL tasks: DeepSAD achieves mean AUCPR 0.453, surpassing the specialized MIL model AR-Net’s AUCPR 0.441. Conversely, specialized inexact methods show mixed transferability to tabular MIL. Sultani transfers relatively well and ranks second, but it is still outperformed by TabPFN, while GCN-Anomaly remains less effective in both s…

**Figure 9.** Critical Difference (CD) diagrams under extremely limited supervision for each modality. The top row (a-c) compares …

**Figure 10.** Visualization of incomplete OOD results. Row …

**Figure 11.** Visualization of AUCROC and AUCPR distributions under three different incomplete OOD settings.

**Figure 12.** Heatmap of Spearman correlations between dataset meta-features (denoted by symbols) and model performance …

**Figure 13.** AUCPR (a) and AUCROC (b) results on tabular datasets under varying labeled ( …

**Figure 14.** Performance comparison under inaccurate conditions (label noise) on tabular datasets.

**Figure 16.** 3D surfaces visualizing AUCPR degradation of different models under varying flip normal ratios (FNR) and flip …

Original abstract

Weakly supervised anomaly detection (WSAD) has developed in three primary directions: incomplete, inexact, and inaccurate supervision. However, these directions remain isolated, lacking a unified framework to assess whether they address unique challenges or share fundamental mechanisms. This paper introduces WSADBench, the first benchmark that unifies evaluation across distinct weakly supervised scenarios, benchmarking diverse approaches from specialized WSAD methods to advanced tabular foundation models. WSADBench establishes standardized protocols to evaluate 36 algorithms across 4 modalities by systematically varying label quantity, granularity, and quality, revealing the performance boundaries of various methods. Based on over 700K experiments, WSADBench reveals four critical insights: (i) Strong intrinsic correlations exist between these weak supervision scenarios, challenging the isolation of current research directions. (ii) Specialized WSAD algorithms excel only in extreme label-scarcity regimes but are quickly dominated by tabular foundation models and general classification methods as supervision increases or in OOD scenarios. (iii) Unlabeled data shows inconsistent utility across settings, with marginal gains compared to label refinement. (iv) Models exhibit asymmetric sensitivity to different types of label noise. We release WSADBench as an open-source benchmark with code and datasets to facilitate future WSAD research: https://github.com/SUFE-AILAB/WSADBench.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

WSADBench gives the first large-scale unified comparison across the three WSAD supervision types and releases the code, but its claims on intrinsic correlations and method dominance rest on one fixed set of 36 algorithms and simulation protocols.

The main point is that this paper runs a single benchmark across incomplete, inexact, and inaccurate supervision in anomaly detection, using over 700k experiments to show correlations between the scenarios and that specialized WSAD methods only win in extreme low-label cases before general classifiers and foundation models take over.

What is new is the unified framework itself plus the systematic sweeps over label quantity, granularity, and quality on four modalities. The release of code and datasets is concrete and lets others extend the work. The four listed insights follow directly from those runs, and the scale is larger than the isolated studies that came before.

The soft spot is exactly the one the stress-test flags: everything rests on the particular 36 algorithms and the specific ways the weak labels were generated. If those choices under-represent recent self-supervised or graph methods, or if the shared base datasets and noise models create the observed correlations, then the "intrinsic" part and the dominance boundaries become tied to the experimental design rather than general properties. The abstract gives no numbers on statistical testing or selection criteria, so that needs checking in the full text.

This is for people already working on anomaly detection who want a consolidated reference point instead of hunting through separate papers. A reader who cares about empirical boundaries in weak supervision will get usable information from it. The artifact release and experiment volume are enough to justify sending it to a serious referee rather than desk-rejecting.

Referee Report

2 major / 1 minor

Summary. The paper introduces WSADBench, the first unified benchmark for weakly supervised anomaly detection across incomplete, inexact, and inaccurate supervision. It evaluates 36 algorithms across 4 modalities using standardized protocols that systematically vary label quantity, granularity, and quality, conducting over 700K experiments to derive four insights: strong intrinsic correlations between the supervision scenarios, specialized WSAD methods being dominated by tabular foundation models and general classifiers outside extreme scarcity or in OOD settings, inconsistent utility of unlabeled data relative to label refinement, and asymmetric model sensitivity to label noise types. Code and datasets are released.

Significance. If the experimental choices prove representative, the work supplies large-scale empirical evidence that isolated WSAD research directions may share mechanisms and that method superiority is regime-specific, which could guide future algorithm development and evaluation standards. The scale of the benchmark and open release are clear strengths for reproducibility.

major comments (2)

[Experimental setup] Experimental setup (protocols for label quantity/granularity/quality and algorithm selection): the four insights, particularly (i) on intrinsic correlations and (ii) on method dominance, rest on the assumption that the fixed set of 36 algorithms and simulation protocols are sufficiently representative. Reuse of the same base datasets and noise models across scenarios risks inducing the observed correlations as artifacts rather than intrinsic properties; a sensitivity analysis to alternative dataset families or method classes (e.g., recent graph-based WSAD) is needed to support generalizability.
[Results and analysis] Results and analysis sections: insight (iii) states unlabeled data shows 'marginal gains' and 'inconsistent utility,' yet no quantitative definition of marginal gain, statistical significance tests, or comparison baselines against label refinement are provided, weakening the claim that refinement is preferable.

minor comments (1)

[Abstract] Abstract and introduction: the claim of 'strong intrinsic correlations' would benefit from a brief parenthetical on the correlation metric (e.g., Spearman rank or Pearson on performance surfaces) used to establish them.
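
To make the referee's suggestion concrete, here is a minimal sketch of a Spearman rank correlation between per-algorithm scores in two supervision scenarios. The AUCPR values are made up for illustration, and nothing here is confirmed to be the metric the paper uses; the implementation is a plain standard-library version with average ranks for ties.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical mean AUCPR of five algorithms in two scenarios.
aucpr_incomplete = [0.62, 0.48, 0.71, 0.55, 0.39]
aucpr_inaccurate = [0.58, 0.50, 0.66, 0.45, 0.41]

rho = spearman(aucpr_incomplete, aucpr_inaccurate)  # approximately 0.9
```

A rank-based metric only asks whether algorithms are ordered similarly across scenarios, so it is insensitive to the absolute AUCPR scale, which typically differs between supervision types.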

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and recommendation for minor revision. We address the major comments point by point below.

Referee: [Experimental setup] Experimental setup (protocols for label quantity/granularity/quality and algorithm selection): the four insights, particularly (i) on intrinsic correlations and (ii) on method dominance, rest on the assumption that the fixed set of 36 algorithms and simulation protocols are sufficiently representative. Reuse of the same base datasets and noise models across scenarios risks inducing the observed correlations as artifacts rather than intrinsic properties; a sensitivity analysis to alternative dataset families or method classes (e.g., recent graph-based WSAD) is needed to support generalizability.

Authors: Our selection of 36 algorithms spans specialized WSAD methods, general classifiers, and tabular foundation models across four modalities to represent core paradigms. Consistent base datasets and noise models are required to isolate the effects of supervision type and enable the benchmark's unification goal. The correlations and dominance patterns hold consistently across 700K experiments and multiple modalities, indicating intrinsic properties. We will add a dedicated paragraph in the revised manuscript discussing the scope of our algorithm and dataset choices and outlining directions for future sensitivity analyses.

Revision: partial

Referee: [Results and analysis] Results and analysis sections: insight (iii) states unlabeled data shows 'marginal gains' and 'inconsistent utility,' yet no quantitative definition of marginal gain, statistical significance tests, or comparison baselines against label refinement are provided, weakening the claim that refinement is preferable.

Authors: We agree that insight (iii) requires more rigorous quantification to be fully convincing. In the revision we will introduce an explicit definition of marginal gains (relative improvement below 5%), report statistical significance via paired tests across runs, and add direct side-by-side comparisons of unlabeled-data utility versus label-refinement baselines.

Revision: yes
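
The quantification proposed in this response could be sketched as follows. The 5% threshold comes from the response itself; the choice of an exact sign-flip permutation test is an assumption, since the response says only "paired tests", and the function names and numbers are hypothetical.

```python
from itertools import product

def relative_gain(baseline, treated):
    """Relative improvement of `treated` over `baseline` (e.g. mean AUCPR)."""
    return (treated - baseline) / baseline

def is_marginal(baseline, treated, threshold=0.05):
    """True when the relative improvement is below the threshold.
    Note that zero or negative gains also count as marginal here."""
    return relative_gain(baseline, treated) < threshold

def paired_sign_flip_pvalue(a, b):
    """Exact two-sided paired permutation test on the summed differences.
    Enumerates all 2**n sign assignments, so keep n small (roughly n <= 20)."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            hits += 1
    return hits / total

# Hypothetical mean AUCPR without and with unlabeled data.
marginal = is_marginal(baseline=0.50, treated=0.52)      # 4% gain
substantial = is_marginal(baseline=0.50, treated=0.60)   # 20% gain

# Hypothetical paired scores from four runs (integers keep the arithmetic exact).
p_value = paired_sign_flip_pvalue([2, 3, 4, 5], [1, 2, 3, 4])
```

With only four paired runs the smallest achievable two-sided p-value is 2/16, which shows why the number of runs matters when reporting significance for small gains.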

Circularity Check

0 steps flagged

Empirical benchmark with no derivation chain or self-referential reductions

The paper conducts a large-scale empirical evaluation of 36 algorithms on 4 modalities using standardized protocols for label quantity, granularity, and quality, generating results from external algorithm implementations and public datasets. No equations, fitted parameters, or predictions are defined in terms of the paper's own outputs. The four insights are direct summaries of experimental performance surfaces rather than reductions to self-definitions, self-citations, or ansatzes. Self-citations, if present, are not load-bearing for any claimed derivation. This is a standard non-circular benchmark paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

This is an empirical benchmarking study with no mathematical derivations, fitted constants, or new postulated entities. It relies on the domain assumption that the selected algorithms and modalities adequately sample the space of WSAD methods.

axioms (1)

domain assumption: The 36 algorithms and 4 modalities are representative of current WSAD practice.
Invoked when generalizing the four insights beyond the specific experimental runs.

Reference graph

Works this paper leans on

79 extracted references · 7 canonical work pages · 1 internal anchor

[1]

Moshira Abdalla, Sajid Javed, Muaz Al Radi, Anwaar Ulhaq, and Naoufel Werghi
[2]

Video anomaly detection in 10 years: A survey and outlook.Neural Computing and Applications37, 32 (2025), 26321–26364

2025
[3]

Andra Acsintoae, Andrei Florescu, Mariana-Iuliana Georgescu, Tudor Mare, Paul Sumedrea, Radu Tudor Ionescu, Fahad Shahbaz Khan, and Mubarak Shah
[4]

InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 20125–20135
[5]

Aggarwal

Charu C. Aggarwal. 2013.Outlier Analysis. Springer. doi:10.1007/978-1-4614- 6396-2

work page doi:10.1007/978-1-4614- 2013
[6]

Samet Akcay, Amir Atapour-Abarghouei, and Toby P Breckon. 2018. Ganomaly: Semi-supervised anomaly detection via adversarial training. InAsian conference on computer vision. Springer, 622–637

2018
[7]

Fabrizio Angiulli and Clara Pizzuti. 2002. Fast outlier detection in high dimen- sional spaces. InEuropean conference on principles of data mining and knowledge discovery. Springer, 15–27

2002
[8]

Jinan Bao, Hanshi Sun, Hanqiu Deng, Yinsheng Brennan He, Zhaoxiang Zhang, and Xingyu Li. 2024. BMAD: Benchmarks for Medical Anomaly Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4042–4053

2024
[9]

Mohamed Bouadi, Pratinav Seth, Aditya Tanna, and Vinay Kumar Sankarapu
[10]

arXiv preprint arXiv:2511.02818(2025)

Orion-MSP: Multi-Scale Sparse Attention for Tabular In-Context Learning. arXiv preprint arXiv:2511.02818(2025)

work page arXiv 2025
[11]

Markus M Breunig, Hans-Peter Kriegel, Raymond T Ng, and Jörg Sander. 2000. LOF: identifying density-based local outliers. InProceedings of the 2000 ACM SIGMOD international conference on Management of data. 93–104

2000
[12]

Joao Carreira and Andrew Zisserman. 2017. Quo vadis, action recognition? a new model and the kinetics dataset. Inproceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6299–6308

2017
[13]

Tianqi Chen and Carlos Guestrin. 2016. Xgboost: A scalable tree boosting system. InProceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. 785–794

2016
[14]

Yingxian Chen, Zhengzhe Liu, Baoheng Zhang, Wilton Fok, Xiaojuan Qi, and Yik-Chung Wu. 2023. Mgfn: Magnitude-contrastive glance-and-focus network for weakly-supervised video anomaly detection. InProceedings of the AAAI conference on artificial intelligence, Vol. 37. 387–395

2023
[15]

Choubo Ding, Guansong Pang, and Chunhua Shen. 2022. Catching both gray and black swans: Open-set supervised anomaly detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 7388–7398

2022
[16]

Yutao Dong, Qing Li, Richard O Sinnott, Yong Jiang, and Shutao Xia. 2021. ISP self-operated BGP anomaly detection based on weakly supervised learning. In 2021 IEEE 29th International Conference on Network Protocols (ICNP). IEEE, 1–11

2021
[17]

Marius Dragoi, Elena Burceanu, Emanuela Haller, Andrei Manolache, and Florin Brad. 2022. Anoshift: A distribution shift benchmark for unsupervised anomaly detection.Advances in Neural Information Processing Systems35 (2022), 32854– 32867

2022
[18]

Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, and Kaiming He. 2019. Slow- fast networks for video recognition. InProceedings of the IEEE/CVF international conference on computer vision. 6202–6211

2019
[19]

Adam Goodge, Bryan Hooi, See-Kiong Ng, and Wee Siong Ng. 2022. Lunar: Uni- fying local outlier detection methods via graph neural networks. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 6737–6745

2022
[20]

Yury Gorishniy, Akim Kotelnikov, and Artem Babenko. 2025. Tabm: Advanc- ing tabular deep learning with parameter-efficient ensembling. InInternational Conference on Learning Representations, Vol. 2025. 77899–77935

2025
[21]

Yury Gorishniy, Ivan Rubachev, Nikolay Kartashev, Daniil Shlenskii, Akim Kotel- nikov, and Artem Babenko. 2024. Tabr: Tabular deep learning meets nearest neighbors. InInternational Conference on Learning Representations, Vol. 2024. 18209–18249

2024
[22]

Yury Gorishniy, Ivan Rubachev, Valentin Khrulkov, and Artem Babenko. 2021. Revisiting deep learning models for tabular data.Advances in neural information processing systems34 (2021), 18932–18943

2021
[23]

Sachin Goyal, Aditi Raghunathan, Moksh Jain, Harsha Vardhan Simhadri, and Prateek Jain. 2020. DROCC: Deep robust one-class classification. InInternational conference on machine learning. PMLR, 3711–3721

2020
[24]

Léo Grinsztajn, Klemens Flöge, Oscar Key, Felix Birkel, Philipp Jund, Brendan Roof, Benjamin Jäger, Dominik Safaric, Simone Alessi, Adrian Hayler, et al. 2025. Tabpfn-2.5: Advancing the state of the art in tabular foundation models.arXiv preprint arXiv:2511.08667(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[25]

Weinberger

Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. 2017. On Calibration of Modern Neural Networks. InProceedings of the 34th International Conference on Machine Learning. PMLR, 1321–1330

2017
[26]

Songqiao Han, Xiyang Hu, Hailiang Huang, Minqi Jiang, and Yue Zhao. 2022. ADBench: Anomaly Detection Benchmark. InNeurIPS

2022
[27]

Zengyou He, Xiaofei Xu, and Shengchun Deng. 2003. Discovering cluster-based local outliers.Pattern recognition letters24, 9-10 (2003), 1641–1650

2003
[28]

Tin Kam Ho and Mitra Basu. 2002. Complexity measures of supervised classifica- tion problems.IEEE transactions on pattern analysis and machine intelligence24, 3 (2002), 289–300

2002
[29]

Xi Jiang, Jian Li, Hanqiu Deng, Yong Liu, Bin-Bin Gao, Yifeng Zhou, Jialin Li, Chengjie Wang, and Feng Zheng. 2025. Mmad: A comprehensive benchmark for multimodal large language models in industrial anomaly detection. InInterna- tional conference on learning representations. 87273–87295

2025
[30]

Kingma and Max Welling

Diederik P. Kingma and Max Welling. 2014. Auto-Encoding Variational Bayes. In 2nd International Conference on Learning Representations

2014
[31]

Elizaveta Levina and Peter Bickel. 2004. Maximum likelihood estimation of intrinsic dimension.Advances in neural information processing systems17 (2004)

2004
[32]

Yuangang Li, Jiaqi Li, Zhuo Xiao, Tiankai Yang, Yi Nian, Xiyang Hu, and Yue Zhao. 2025. NLP-ADBench: NLP Anomaly Detection Benchmark. InFindings of the Association for Computational Linguistics: EMNLP 2025. 2464–2474. doi:10. 18653/v1/2025.findings-emnlp.133

2025
[33]

Zhe Li, Chunhua Sun, et al. 2022. Dual-MGAN: An Efficient Approach for Semi- supervised Outlier Detection with Few Identified Anomalies.TKDD(2022)

2022
[34]

Zheng Li, Yue Zhao, Xiyang Hu, Nicola Botta, Cezar Ionescu, and George H Chen. 2022. Ecod: Unsupervised outlier detection using empirical cumulative distribution functions.IEEE Transactions on Knowledge and Data Engineering35, 12 (2022), 12181–12193

2022
[35]

Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. 2008. Isolation forest. In2008 eighth ieee international conference on data mining. IEEE, 413–422

2008
[36]

Wen Liu, Weixin Luo, Dongze Lian, and Shenghua Gao. 2018. Future frame prediction for anomaly detection–a new baseline. InProceedings of the IEEE conference on computer vision and pattern recognition. 6536–6545

2018
[37]

Cresswell, Keyvan Golestan, Guangwei Yu, Anthony L

Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh, Alex Labach, Jesse C. Cresswell, Keyvan Golestan, Guangwei Yu, Anthony L. Caterini, and Maksims Volkovs. 2025. TabDPT: Scaling Tabular Foundation Models on Real Data. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems

2025
[38]

Mahdi Pakdaman Naeini, Gregory Cooper, and Milos Hauskrecht. 2015. Obtaining Well Calibrated Probabilities Using Bayesian Binning. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 29. KDD ’26, August 09–13, 2026, Jeju Island, Republic of Korea Xu Yao et al

2015
[39]

Curtis Northcutt, Lu Jiang, and Isaac Chuang. 2021. Confident learning: Esti- mating uncertainty in dataset labels.Journal of Artificial Intelligence Research70 (2021), 1373–1411

2021
[40]

Guansong Pang, Longbing Cao, Ling Chen, and Huan Liu. 2018. Learning rep- resentations of ultrahigh-dimensional data for random distance-based outlier detection. InProceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 2041–2050

2018
[41]

Guansong Pang, Chunhua Shen, Longbing Cao, and Anton Van Den Hengel. 2021. Deep learning for anomaly detection: A review.ACM computing surveys (CSUR) 54, 2 (2021), 1–38

2021
[42]

Guansong Pang, Chunhua Shen, Huidong Jin, and Anton Van Den Hengel. 2023. Deep weakly-supervised anomaly detection. InProceedings of the 29th ACM SIGKDD conference on knowledge discovery and data mining. 1795–1807

2023
[43]

Guansong Pang, Chunhua Shen, and Anton Van Den Hengel. 2019. Deep anomaly detection with deviation networks. InProceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. 353–362

2019
[44]

Lorenzo Perini, Vincent Vercruyssen, and Jesse Davis. 2023. Learning from positive and unlabeled multi-instance bags in anomaly detection. InProceedings of the 29th ACM SIGKDD conference on knowledge discovery and data mining. 1897–1906

2023
[45]

Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Doro- gush, and Andrey Gulin. 2018. CatBoost: unbiased boosting with categorical features.Advances in neural information processing systems31 (2018)

2018
[46]

Jingang Qu, David Holzmüller, Gaël Varoquaux, and Marine Le Morvan. 2026. TabICLv2: A better, faster, scalable, and open tabular foundation model. (2026)

2026
[47]

Olivier Roy and Martin Vetterli. 2007. The effective rank: A measure of effective dimensionality. In2007 15th European signal processing conference. IEEE, 606–610

2007
[48]

Lukas Ruff, Robert Vandermeulen, Nico Goernitz, Lucas Deecke, Shoaib Ahmed Siddiqui, Alexander Binder, Emmanuel Müller, and Marius Kloft. 2018. Deep one-class classification. InInternational conference on machine learning. PMLR, 4393–4402

2018
[49]

Vandermeulen, Nico Görnitz, Alexander Binder, Emmanuel Müller, Klaus-Robert Müller, and Marius Kloft

Lukas Ruff, Robert A. Vandermeulen, Nico Görnitz, Alexander Binder, Emmanuel Müller, Klaus-Robert Müller, and Marius Kloft. 2020. Deep Semi-Supervised Anomaly Detection. InInternational Conference on Learning Representations

2020
[50]

Timur Sattarov, Marco Schreyer, and Damian Borth. 2025. Diffusion-Scheduled Denoising Autoencoders for Anomaly Detection in Tabular Data. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V

2025
[51]

Mei-Ling Shyu, Shu-Ching Chen, Kanoksri Sarinnapakorn, and LiWu Chang
[52]

Technical report, Miami Univ Coral Gables Fl Dept of Electrical and Computer Engineering

A novel anomaly detection scheme based on principal component classifier. Technical report, Miami Univ Coral Gables Fl Dept of Electrical and Computer Engineering
[53]

Hwanjun Song, Minseok Kim, Dongmin Park, Yooju Shin, and Jae-Gil Lee. 2022. Learning from noisy labels with deep neural networks: A survey.IEEE transactions on neural networks and learning systems34, 11 (2022), 8135–8153

2022
[54]

Waqas Sultani, Chen Chen, and Mubarak Shah. 2018. Real-world anomaly de- tection in surveillance videos. InProceedings of the IEEE conference on computer vision and pattern recognition. 6479–6488

2018
[55]

Bowen Tian, Qinliang Su, and Jian Yin. 2022. Anomaly Detection by Lever- aging Incomplete Anomalous Knowledge with Anomaly-Aware Bidirectional GANs. InProceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22. 2255–2261

2022
[56]

Yu Tian, Guansong Pang, Yuanhong Chen, Rajvinder Singh, Johan W Verjans, and Gustavo Carneiro. 2021. Weakly-supervised video anomaly detection with robust temporal feature magnitude learning. InProceedings of the IEEE/CVF international conference on computer vision. 4975–4986

2021
[57]

Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri
[58]

In Proceedings of the IEEE international conference on computer vision

Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE international conference on computer vision. 4489–4497
[59]

Boyang Wan, Yuming Fang, Xue Xia, and Jiajie Mei. 2020. Weakly Supervised Video Anomaly Detection via Center-Guided Discriminative Learning. In2020 IEEE International Conference on Multimedia and Expo (ICME). 1–6. doi:10.1109/ ICME46284.2020.9102722

work page arXiv 2020
[60]

Peng Wu, Jing Liu, Yujia Shi, Yujia Sun, Fangtao Shao, Zhaoyang Wu, and Zhiwei Yang. 2020. Not only look, but also listen: Learning multimodal violence detection under weak supervision. InEuropean conference on computer vision. Springer, 322–339

2020
[61]

Peng Wu, Xuerong Zhou, Guansong Pang, Lingru Zhou, Qingsen Yan, Peng Wang, and Yanning Zhang. 2024. Vadclip: Adapting vision-language models for weakly supervised video anomaly detection. InProceedings of the AAAI conference on artificial intelligence, Vol. 38. 6074–6082

2024
[62]

Feng Xiao and Jicong Fan. 2025. Text-ADBench: Text Anomaly Detection Bench- mark based on LLMs Embedding.arXiv preprint arXiv:2507.12295(2025)

work page arXiv 2025
[63]

Hongzuo Xu, Yijie Wang, Guansong Pang, Songlei Jian, Ning Liu, and Yongjun Wang. 2023. RoSAS: Deep semi-supervised anomaly detection with contamination-resilient continuous supervision.Information Processing & Man- agement60, 5 (2023), 103459

2023
[64]

Yajun Xu, Huan Hu, Chuwen Huang, Yibing Nan, Yuyao Liu, Kai Wang, Zhaox- iang Liu, and Shiguo Lian. 2025. TAD: A Large-Scale Benchmark for Traffic Accidents Detection From Video Surveillance.IEEE Access13 (2025), 2018–2033

2025
[65]

Jingkang Yang, Pengyun Wang, Dejian Zou, Zitang Zhou, Kunyuan Ding, Wenx- uan Peng, Haoqi Wang, Dilip Chen, Bo Li, Yiyou Sun, et al . 2022. OpenOOD: Benchmarking generalized out-of-distribution detection. InAdvances in Neural Information Processing Systems, Vol. 35. 32598–32611

2022
[66]

Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals
[67]

Understanding deep learning (still) requires rethinking generalization. Commun. ACM64, 3 (2021), 107–115

2021
[68]

Jieyu Zhang, Yue Yu, Yinghao Li, Yujing Wang, Yaming Yang, Mao Yang, and Alexander Ratner. 2021. WRENCH: A Comprehensive Benchmark for Weak Supervision. InProceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks

2021
[69]

Xiyuan Zhang et al. 2025. Mitra: Mixed Synthetic Priors for Enhancing Tabular Foundation Models. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems

2025
[70]

Xingxuan Zhang, Gang Ren, Han Yu, Hao Yuan, Hui Wang, Jiansheng Li, Jiayun Wu, Lang Mo, Li Mao, Mingchao Hao, et al. 2025. Limix: Unleashing structured-data modeling capability for generalist intelligence. arXiv preprint arXiv:2509.03505 (2025)

2025
[71]

Yue Zhao and Maciej K Hryniewicki. 2018. Xgbod: improving supervised outlier detection with unsupervised representation learning. In 2018 International Joint Conference on Neural Networks (IJCNN). IEEE, 1–8

2018
[72]

Yue Zhao, Zain Nasrullah, and Zheng Li. 2019. Pyod: A python toolbox for scalable outlier detection. Journal of machine learning research 20, 96 (2019), 1–7

2019
[73]

Yue Zhao, Guoqing Zheng, Subhabrata Mukherjee, Robert McCann, and Ahmed Awadallah. 2023. Admoe: Anomaly detection with mixture-of-experts from noisy labels. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. 4937–4945

2023
[74]

Jia-Xing Zhong, Nannan Li, Weijie Kong, Shan Liu, Thomas H Li, and Ge Li. 2019. Graph convolutional label noise cleaner: Train a plug-and-play action classifier for anomaly detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 1237–1246

2019
[75]

Hang Zhou, Junqing Yu, and Wei Yang. 2023. Dual memory units with uncertainty regulation for weakly supervised video anomaly detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. 3769–3777

2023
[76]

Yingjie Zhou, Xucheng Song, Yanru Zhang, Fanxing Liu, Ce Zhu, and Lingqiao Liu. 2021. Feature encoding with autoencoders for weakly supervised anomaly detection. IEEE Transactions on Neural Networks and Learning Systems 33, 6 (2021), 2454–2465

2021
[77]

Zhi-Hua Zhou. 2018. A brief introduction to weakly supervised learning. National science review 5, 1 (2018), 44–53

2018
[78]

Bo Zong, Qi Song, Martin Renqiang Min, Wei Cheng, Cristian Lumezanu, Daeki Cho, and Haifeng Chen. 2018. Deep autoencoding gaussian mixture model for unsupervised anomaly detection. In International conference on learning representations

A Benchmark Details

A.1 Dataset Summaries and Processing Specs

We detail the diverse collection of datasets evaluated ...

2018
[79]

Table 24: The average ± standard deviation and ranking of AUCPR under different 𝑁𝑙𝑎 (=1, 3, 5, 10, 15, 20, 50) settings on tabular datasets

[Figure panels: (p) TabR-S, (q) Ta...; axes: FNR, FAR, AUCPR]

2026
