arxiv: 2604.08301 · v1 · submitted 2026-04-09 · 💻 cs.CV

Recognition: 2 theorem links

· Lean Theorem

GroundingAnomaly: Spatially-Grounded Diffusion for Few-Shot Anomaly Synthesis

Yishen Liu , Hongcang Chen , Pengcheng Zhao , Yunfan Bao , Yuxi Tian , Jieming Zhang , Hao Chen , Zheng Zhi

show 3 more authors

Yongchun Liu Ying Li Dongpu Cao

Authors on Pith no claims yet

Pith reviewed 2026-05-10 18:23 UTC · model grok-4.3

classification 💻 cs.CV

keywords anomaly synthesisdiffusion modelsfew-shot generationspatial conditioninganomaly detectionindustrial inspectionsemantic mapsfrozen U-Net

0 comments

The pith

A new diffusion framework generates high-quality anomalies from few examples by conditioning on per-pixel semantic maps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Scarcity of real anomalous samples limits the effectiveness of visual anomaly inspection in industrial quality control. Existing synthesis approaches often fail to blend anomalies naturally or supply accurate location masks. GroundingAnomaly introduces a Spatial Conditioning Module that supplies per-pixel semantic maps for exact placement of anomalies and pairs it with a Gated Self-Attention Module that adapts a frozen U-Net for few-shot learning while retaining its original priors. Evaluations on MVTec AD and VisA show the generated anomalies improve results on anomaly detection, segmentation, and instance-level detection tasks.

Core claim

GroundingAnomaly is a few-shot anomaly image generation framework that employs a Spatial Conditioning Module leveraging per-pixel semantic maps to enable precise spatial control over synthesized anomalies and a Gated Self-Attention Module to inject conditioning tokens into a frozen U-Net, thereby preserving pretrained priors while ensuring stable adaptation and producing high-quality anomalies that yield state-of-the-art performance on downstream inspection tasks.

What carries the argument

The Spatial Conditioning Module that uses per-pixel semantic maps for location-specific anomaly placement, combined with the Gated Self-Attention Module that injects conditioning tokens into a frozen diffusion U-Net.

If this is right

The generated anomalies integrate naturally into normal images and come with accurate masks.
Few-shot adaptation succeeds while the underlying U-Net priors remain intact.
State-of-the-art results appear across anomaly detection, segmentation, and instance-level detection on MVTec AD and VisA.
The approach enlarges training sets for industrial inspection without requiring large numbers of real anomalies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same spatial-grounding pattern could be applied to other data-scarce image-editing tasks such as defect repair or object insertion.
If the gated adaptation proves robust across domains, it offers a general recipe for conditioning large pretrained diffusion models without full fine-tuning.
Accurate synthetic masks produced alongside the images may reduce the annotation burden for training segmentation models in manufacturing.

Load-bearing premise

The Spatial Conditioning Module and Gated Self-Attention Module together deliver precise spatial control and stable few-shot adaptation without introducing artifacts or mode collapse that would harm downstream task performance.

What would settle it

Downstream anomaly detectors trained on GroundingAnomaly outputs perform no better than those trained on outputs from prior synthesis methods when tested on the MVTec AD dataset.

Figures

Figures reproduced from arXiv: 2604.08301 by Dongpu Cao, Hao Chen, Hongcang Chen, Jieming Zhang, Pengcheng Zhao, Ying Li, Yishen Liu, Yongchun Liu, Yunfan Bao, Yuxi Tian, Zheng Zhi.

**Figure 1.** Figure 1: Anomaly Generation methods inpaint anomalies onto normal images; Anomaly Image Generation methods jointly generate anomalies with products and predict masks after generation; our GroundingAnomaly grounds anomalies with semantic maps and generate the whole anomalous images. These limitations motivate us to develop a diffusion framework that provides precise spatial grounding for anomaly image generation w… view at source ↗

**Figure 2.** Figure 2: Proposed framework of GroundingAnomaly: (i) Spatial Conditioning Module that encodes a pixel-wise semantic map fuses with disentangled product and anomaly tokens; (ii) Gated Self-Attention Module, which injects the spatial conditioning into a frozen U-Net. (iii) The framework is trained on mixed batches of normal and anomalous images to leverage cross-domain appearance priors, and it generates diverse, hi… view at source ↗

**Figure 3.** Figure 3: (i) Spatial Conditioning Module; (ii) Gated Self-Attention Module. 3.3 Gated Self-Attention Module Few-shot anomaly synthesis poses a trade-off: prior methods either freeze the U-Net and train only textual embeddings [16], which limits adaptation capacity, or fine-tune the U-Net [18,40], which risks overwriting pretrained knowledge. To [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: Left: Comparison on the generation results on MVTec AD and VisA. Cable, grid and leather to MVTec AD and candle and macaroni2 belong to VisA. Our method generates high-quality anomaly images that are accurately grounded with masks. Right: Visualization of U-Net segmentation results on MVTec AD and VisA. The detection model trained on GroundingAnomaly-generated data exhibits more robust performance. 4.3 A… view at source ↗

**Figure 5.** Figure 5: Left: Qualitative analysis of SFF. The red outline denotes the ground-truth mask. While SeaS [40] suffers from spatial misalignment, our model without SFF (w/o SFF) achieves accurate bounding. Our full model (Ours) leverages SFF to achieve both precise spatial grounding and high-fidelity anomaly synthesis. Right: Unseen anomaly generation results on MVTec AD and VisA. GroundingAnomaly is trained on anomalo… view at source ↗

**Figure 6.** Figure 6: Multi-anomaly generation result of GroundingAnomaly. Multi-Class anomaly generation. A key advantage of GroundingAnomaly is its ability to synthesize multiple, diverse defects of different classes within a single image via grounding anomalies with a semantic map, rather than treating defect combinations as a single rigid combined category. The semantic map is constructed by S=S_1+S_2 , where S_1 and S_2… view at source ↗

**Figure 7.** Figure 7: Illustration of ablations on GSM. (i) Gated Cross-Attention Module; (ii) Discarding Visual Tokens; (iii) Ungated Self-Attention Module; (iv) Ours. Ablation on spatial conditioning. To validate the spatial grounding paradigm of GroundingAnomaly, we conduct a comprehensive ablation study comparing our architecture with controllable diffusion baselines ControlNet [39] and GLIGEN [23] using two conditioning … view at source ↗

**Figure 8.** Figure 8: Qualitative comparison of spatial conditioning. GroundingAnomaly achieves the highest fidelity and conditioning accuracy. Futhermore, models conditioned on binary masks, by contrast, often produce anomalies that are less semantically similar to the expected anomaly class [PITH_FULL_IMAGE:figures/full_fig_p022_8.png] view at source ↗

**Figure 9.** Figure 9: Examples of 1-shot and 2-shot generated images [PITH_FULL_IMAGE:figures/full_fig_p023_9.png] view at source ↗

**Figure 10.** Figure 10: P-AUROC of U-Nets trained by synthesized data. Analysis on data scaling. Previous experiments used 1,000 image–mask pairs per anomaly type. To evaluate the effect of scaling the synthesized training set, we generate 500, 1000, 2000, 3000, 4000 images per anomaly type for each synthesis method and train U-Nets on MVTec AD. We report the pixel-level AUROC (P-AUROC) in [PITH_FULL_IMAGE:figures/full_fig_p… view at source ↗

**Figure 11.** Figure 11: Examples of anomaly generation on MVTec AD [PITH_FULL_IMAGE:figures/full_fig_p025_11.png] view at source ↗

**Figure 12.** Figure 12: Examples of anomaly generation on MVTec AD [PITH_FULL_IMAGE:figures/full_fig_p026_12.png] view at source ↗

**Figure 13.** Figure 13: Examples of anomaly generation on MVTec AD [PITH_FULL_IMAGE:figures/full_fig_p027_13.png] view at source ↗

**Figure 14.** Figure 14: Examples of anomaly generation on VisA [PITH_FULL_IMAGE:figures/full_fig_p028_14.png] view at source ↗

**Figure 15.** Figure 15: Examples of anomaly generation on VisA [PITH_FULL_IMAGE:figures/full_fig_p029_15.png] view at source ↗

read the original abstract

The performance of visual anomaly inspection in industrial quality control is often constrained by the scarcity of real anomalous samples. Consequently, anomaly synthesis techniques have been developed to enlarge training sets and enhance downstream inspection. However, existing methods either suffer from poor integration caused by inpainting or fail to provide accurate masks. To address these limitations, we propose GroundingAnomaly, a novel few-shot anomaly image generation framework. Our framework introduces a Spatial Conditioning Module that leverages per-pixel semantic maps to enable precise spatial control over the synthesized anomalies. Furthermore, a Gated Self-Attention Module is designed to inject conditioning tokens into a frozen U-Net via gated attention layers. This carefully preserves pretrained priors while ensuring stable few-shot adaptation. Extensive evaluations on the MVTec AD and VisA datasets demonstrate that GroundingAnomaly generates high-quality anomalies and achieves state-of-the-art performance across multiple downstream tasks, including anomaly detection, segmentation, and instance-level detection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

GroundingAnomaly adds per-pixel semantic conditioning and gated attention to diffusion for few-shot anomaly synthesis, with internally consistent results on MVTec and VisA.

read the letter

This paper gives a concrete way to synthesize anomalies with spatial precision from just a few examples, using a frozen diffusion U-Net. The core additions are the Spatial Conditioning Module that feeds per-pixel semantic maps and the Gated Self-Attention Module that injects tokens without destroying the pretrained priors. Both the architecture description and the reported ablations line up with the claim that this produces usable anomalies and lifts downstream detection, segmentation, and instance detection scores to SOTA on the two standard benchmarks. That is the main new piece: controlled placement of defects while staying few-shot and stable. The work does a solid job on the practical side. Industrial anomaly detection really does suffer from scarce real defects, and generating both images and masks that integrate cleanly is a direct response to that bottleneck. The method avoids the usual inpainting artifacts mentioned in the abstract and keeps the base model frozen, which is a sensible engineering choice. The stress-test note confirms no hidden data leakage or contradictory assumptions in the training protocol. Soft spots are modest. All the numbers come from MVTec AD and VisA, so the gains could shrink on noisier or less curated factory data. The semantic maps themselves require some form of prior labeling or generation, which adds a step that pure image-only methods skip. No evidence is given on compute cost or how sensitive the gated layers are to the choice of conditioning strength. This is for researchers who build or improve inspection pipelines and for CV groups that work on conditional diffusion. It is not a broad theoretical advance, but the empirical support and clean design make it worth a referee's time. I would send it to peer review.

Referee Report

0 major / 3 minor

Summary. The paper proposes GroundingAnomaly, a few-shot anomaly synthesis framework based on diffusion models. It introduces a Spatial Conditioning Module that uses per-pixel semantic maps to achieve precise spatial control over generated anomalies, along with a Gated Self-Attention Module that injects conditioning tokens into a frozen U-Net to enable stable adaptation while preserving pretrained priors. Extensive experiments on MVTec AD and VisA datasets are reported to show high-quality anomaly generation and state-of-the-art results on downstream tasks including anomaly detection, segmentation, and instance-level detection.

Significance. If the results hold, this approach could meaningfully advance few-shot anomaly synthesis for industrial inspection by addressing limitations of inpainting-based methods and providing accurate masks through spatial grounding. The design choice of freezing the U-Net and using gated attention for adaptation is a strength, as it supports stable few-shot learning without mode collapse or loss of generative quality, potentially benefiting other conditional diffusion applications.

minor comments (3)

§3.1: The description of how per-pixel semantic maps are derived from the few-shot normal and anomalous examples should be expanded with a concrete example or pseudocode to clarify the input preparation pipeline.
Figure 3: The visualization of synthesized anomalies would be clearer if the corresponding ground-truth masks were shown side-by-side for direct comparison of spatial accuracy.
§4.1: The training protocol mentions 'few-shot' settings but does not specify the exact number of shots used in the main experiments; this detail should be stated explicitly in the experimental setup.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the thorough and positive review of our manuscript on GroundingAnomaly. We are encouraged by the recognition of our framework's contributions to few-shot anomaly synthesis via spatial grounding and gated adaptation in diffusion models, as well as the potential impact on industrial inspection tasks. The recommendation for minor revision is noted.

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper describes an architectural framework (Spatial Conditioning Module + Gated Self-Attention Module) inserted into a frozen U-Net for few-shot anomaly synthesis. No equations, predictions, or first-principles derivations are presented that reduce to fitted parameters or self-definitions by construction. Performance claims rest on external benchmarks (MVTec AD, VisA) rather than internal fits renamed as predictions. No self-citation chains, uniqueness theorems, or ansatz smuggling are load-bearing in the provided abstract and method summary. The derivation is self-contained as a set of engineering choices evaluated downstream, consistent with a score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no equations, hyperparameters, or explicit assumptions; free parameters, axioms, and invented entities cannot be enumerated.

pith-pipeline@v0.9.0 · 5493 in / 1097 out tokens · 74185 ms · 2026-05-10T18:23:34.909711+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Our framework introduces a Spatial Conditioning Module that leverages per-pixel semantic maps... Gated Self-Attention Module... frozen U-Net via gated attention layers.
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Extensive evaluations on the MVTec AD and VisA datasets demonstrate... state-of-the-art performance

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 10 canonical work pages · 3 internal anchors

[1]

Mvtec AD - A comprehensive real-world dataset for unsupervised anomaly detection

Bergmann, P., Fauser, M., Sattlegger, D., Steger, C.: Mvtec ad — a comprehen- sive real-world dataset for unsupervised anomaly detection. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 9584–9592 (2019).https://doi.org/10.1109/CVPR.2019.00982

work page doi:10.1109/cvpr.2019.00982 2019
[2]

arXiv preprint arXiv:2401.16402 (2024)

Cao, Y., Xu, X., Zhang, J., Cheng, Y., Huang, X., Pang, G., Shen, W.: A survey on visual anomaly detection: Challenge, approach, and prospect. arXiv preprint arXiv:2401.16402 (2024)

work page arXiv 2024
[3]

In: European conference on computer vision

Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End- to-end object detection with transformers. In: European conference on computer vision. pp. 213–229. Springer (2020)

2020
[4]

In: European Conference on Computer Vision

Chen, Q., Luo, H., Lv, C., Zhang, Z.: A unified anomaly synthesis strategy with gradient ascent for industrial anomaly detection and localization. In: European Conference on Computer Vision. pp. 37–54. Springer (2024)

2024
[5]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Deng, H., Li, X.: Anomaly detection via reverse distillation from one-class embed- ding. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 9737–9746 (2022)

2022
[6]

In: AAAI (2023)

Duan, Y., Hong, Y., Niu, L., Zhang, L.: Few-shot defect image generation via defect-aware feature manipulation. In: AAAI (2023)

2023
[7]

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

Gal, R., Alaluf, Y., Atzmon, Y., Patashnik, O., Bermano, A.H., Chechik, G., Cohen-Or, D.: An image is worth one word: Personalizing text-to-image gener- ation using textual inversion. arXiv preprint arXiv:2208.01618 (2022)

work page internal anchor Pith review arXiv 2022
[8]

Advances in neural in- formation processing systems27(2014)

Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. Advances in neural in- formation processing systems27(2014)

2014
[9]

Gui, G., Gao, B.B., Liu, J., Wang, C., Wu, Y.: Few-shot anomaly-driven generation foranomalyclassificationandsegmentation.In:EuropeanConferenceonComputer Vision. pp. 210–226. Springer (2024)

2024
[10]

arXiv preprint arXiv:2510.17611 (2025)

Guo,J.,Lu,S.,Fan,L.,Li,Z.,Di,D.,Song,Y.,Zhang,W.,Zhu,W.,Yan,H.,Chen, F., et al.: One dinomaly2 detect them all: A unified framework for full-spectrum unsupervised anomaly detection. arXiv preprint arXiv:2510.17611 (2025)

work page arXiv 2025
[11]

Advances in Neural Information Processing Systems37, 71162– 71187 (2024)

He, H., Bai, Y., Zhang, J., He, Q., Chen, H., Gan, Z., Wang, C., Li, X., Tian, G., Xie, L.: Mambaad: Exploring state space models for multi-class unsupervised anomaly detection. Advances in Neural Information Processing Systems37, 71162– 71187 (2024)

2024
[12]

He,K.,Zhang,X.,Ren,S.,Sun,J.:Deepresiduallearningforimagerecognition.In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)

2016
[13]

The mvtec ad 2 dataset: Advanced scenarios for unsupervised anomaly detection.arXiv preprint arXiv:2503.21622,

Heckler-Kram, L., Neudeck, J.H., Scheler, U., König, R., Steger, C.: The mvtec ad 2 dataset: Advanced scenarios for unsupervised anomaly detection. arXiv preprint arXiv:2503.21622 (2025)

work page arXiv 2025
[14]

Advances in neural information processing systems33, 6840–6851 (2020)

Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. Advances in neural information processing systems33, 6840–6851 (2020)

2020
[15]

In: International Con- ference on Learning Representations (2022),https://openreview.net/forum?id= nZeVKeeFYf9

Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W.: LoRA: Low-rank adaptation of large language models. In: International Con- ference on Learning Representations (2022),https://openreview.net/forum?id= nZeVKeeFYf9

2022
[16]

Liu et al

Hu, T., Zhang, J., Yi, R., Du, Y., Chen, X., Liu, L., Wang, Y., Wang, C.: Anomaly- diffusion:Few-shotanomalyimagegenerationwithdiffusionmodel.In:Proceedings of the AAAI Conference on Artificial Intelligence (2024) 16 Y. Liu et al

2024
[17]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Jeong, J., Zou, Y., Kim, T., Zhang, D., Ravichandran, A., Dabeer, O.: Winclip: Zero-/few-shot anomaly classification and segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 19606–19616 (June 2023)

2023
[18]

In: Proceedings of the Computer Vision and Pattern Recognition Conference

Jin,Y.,Peng,J.,He,Q.,Hu,T.,Wu,J.,Chen,H.,Wang,H.,Zhu,W.,Chi,M.,Liu, J., et al.: Dual-interrelated diffusion model for few-shot anomaly image generation. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 30420–30429 (2025)

2025
[19]

https://github.com/ultralytics/yolov5(Oct 2020).https://doi.org/10

Jocher, G.: ultralytics/yolov5: v3.1 - bug fixes and performance improvements. https://github.com/ultralytics/yolov5(Oct 2020).https://doi.org/10. 5281/zenodo.4154370,https://doi.org/10.5281/zenodo.4154370

work page doi:10.5281/zenodo.4154370 2020
[20]

In: Proc

Karras, T., Aittala, M., Hellsten, J., Laine, S., Lehtinen, J., Aila, T.: Training generative adversarial networks with limited data. In: Proc. NeurIPS (2020)

2020
[21]

Auto-Encoding Variational Bayes

Kingma, D.P., Welling, M.: Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013)

work page internal anchor Pith review Pith/arXiv arXiv 2013
[22]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Li, C.L., Sohn, K., Yoon, J., Pfister, T.: Cutpaste: Self-supervised learning for anomaly detection and localization. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 9664–9674 (2021)

2021
[23]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Li,Y.,Liu,H.,Wu,Q.,Mu,F.,Yang,J.,Gao,J.,Li,C.,Lee,Y.J.:Gligen:Open-set grounded text-to-image generation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 22511–22521 (2023)

2023
[24]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., Xie, S.: A convnet for the 2020s. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

2022
[25]

In: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S

Lu, R., Wu, Y., Tian, L., Wang, D., Chen, B., Liu, X., Hu, R.: Hierarchical vector quantized transformer for multi-class unsupervised anomaly detection. In: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. (eds.) Advances in Neural Information Processing Systems. vol. 36, pp. 8487–8500. Curran Associates, Inc. (2023),https://proceedin...

2023
[26]

In: Proceedings of the Computer Vision and Pattern Recognition Con- ference (CVPR)

Luo, W., Cao, Y., Yao, H., Zhang, X., Lou, J., Cheng, Y., Shen, W., Yu, W.: Exploring intrinsic normal prototypes within a single image for universal anomaly detection. In: Proceedings of the Computer Vision and Pattern Recognition Con- ference (CVPR). pp. 9974–9983 (June 2025)

2025
[27]

Advances in neural information processing systems28(2015)

Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: Towards real-time object de- tection with region proposal networks. Advances in neural information processing systems28(2015)

2015
[28]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 10684–10695 (2022)

2022
[29]

In: International Conference on Medical image computing and computer-assisted intervention

Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedi- cal image segmentation. In: International Conference on Medical image computing and computer-assisted intervention. pp. 234–241. Springer (2015)

2015
[30]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Roth, K., Pemula, L., Zepeda, J., Schölkopf, B., Brox, T., Gehler, P.: Towards total recall in industrial anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 14318– 14328 (June 2022)

2022
[31]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Roth, K., Pemula, L., Zepeda, J., Schölkopf, B., Brox, T., Gehler, P.: Towards total recall in industrial anomaly detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 14318–14328 (2022) GroundingAnomaly 17

2022
[32]

Denoising Diffusion Implicit Models

Song, J., Meng, C., Ermon, S.: Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2010
[33]

In: Proceedings of the European conference on computer vision (ECCV)

Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Proceedings of the European conference on computer vision (ECCV). pp. 418–434 (2018)

2018
[34]

Advances in neural information processing systems34, 12077–12090 (2021)

Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J.M., Luo, P.: Segformer: Simple and efficient design for semantic segmentation with transformers. Advances in neural information processing systems34, 12077–12090 (2021)

2021
[35]

In: Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A

You, Z., Cui, L., Shen, Y., Yang, K., Lu, X., Zheng, Y., Le, X.: A uni- fied model for multi-class anomaly detection. In: Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A. (eds.) Advances in Neural In- formation Processing Systems. vol. 35, pp. 4571–4584. Curran Associates, Inc. (2022),https : / / proceedings . neurips . cc / paper _ fil...

2022
[36]

International journal of computer vision129(11), 3051–3068 (2021)

Yu, C., Gao, C., Wang, J., Yu, G., Shen, C., Sang, N.: Bisenet v2: Bilateral network with guided aggregation for real-time semantic segmentation. International journal of computer vision129(11), 3051–3068 (2021)

2021
[37]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)

Zavrtanik, V., Kristan, M., Skočaj, D.: Draem - a discriminatively trained re- construction embedding for surface anomaly detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 8330–8339 (October 2021)

2021
[38]

Exploring plain vit reconstruction for multi-class unsupervised anomaly detection.arXiv preprint arXiv:2312.07495, 2023

Zhang, J., Chen, X., Wang, Y., Wang, C., Liu, Y., Li, X., Yang, M.H., Tao, D.: Exploring plain vit reconstruction for multi-class unsupervised anomaly detection. arXiv preprint arXiv:2312.07495 (2023)

work page arXiv 2023
[39]

In: Proceedings of the IEEE/CVF international conference on computer vision

Zhang, L., Rao, A., Agrawala, M.: Adding conditional control to text-to-image diffusion models. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 3836–3847 (2023)

2023
[40]

In: Pro- ceedings of the IEEE/CVF International Conference on Computer Vision (2025)

Zhewei,D.,Shilei,Z.,Haotian,L.,Xurui,L.,Feng,X.,Yu,Z.:Seas:Few-shotindus- trial anomaly image generation with separation and sharing fine-tuning. In: Pro- ceedings of the IEEE/CVF International Conference on Computer Vision (2025)

2025
[41]

In: The Twelfth International Conference on Learning Representations (2023)

Zhou, Q., Pang, G., Tian, Y., He, S., Chen, J.: Anomalyclip: Object-agnostic prompt learning for zero-shot anomaly detection. In: The Twelfth International Conference on Learning Representations (2023)

2023
[42]

In: European conference on computer vision

Zou, Y., Jeong, J., Pemula, L., Zhang, D., Dabeer, O.: Spot-the-difference self- supervised pre-training for anomaly detection and segmentation. In: European conference on computer vision. pp. 392–408. Springer (2022) GroundingAnomaly 1 A Appendix This supplementary material consists of: –More Implementation Details (Sec. A.1). –More Ablation Studies & An...

work page arXiv 2022