arxiv: 2604.02905 · v1 · submitted 2026-04-03 · 💻 cs.CV

Recognition: 2 theorem links

· Lean Theorem

UniSpector: Towards Universal Open-set Defect Recognition via Spectral-Contrastive Visual Prompting

Geonuk Kim , Minhoi Kim , Kangil Lee , Minsu Kim , Hyeonseong Jeon , Jeonghoon Han , Hyoungjoon Lim , Junho Yim

Authors on Pith no claims yet

Pith reviewed 2026-05-13 20:17 UTC · model grok-4.3

classification 💻 cs.CV

keywords open-set defect recognitionvisual promptingcontrastive learningindustrial inspectionanomaly detectionangular manifoldspatial-spectral encodingretraining-free learning

0 comments

The pith

UniSpector structures visual prompts into a semantically organized angular manifold to detect novel industrial defects without retraining.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that visual prompting can scale to open-set defect recognition if the prompt space is deliberately organized rather than matched naively to image regions. Existing methods collapse under high intra-class variance and subtle inter-class differences in defect images. UniSpector counters this with a Spatial-Spectral Prompt Encoder that produces orientation-invariant fine-grained features and a Contrastive Prompt Encoder that regularizes those features into an angular manifold. Prompt-guided Query Selection then aligns object queries to the structured prompts. On the new Inspect Anything benchmark the approach raises AP50b and AP50m by at least 19.7 and 15.8 points over baselines while remaining retraining-free.

Core claim

UniSpector shifts visual prompting from direct region matching to the design of a transferable prompt topology. The Spatial-Spectral Prompt Encoder extracts orientation-invariant representations; the Contrastive Prompt Encoder explicitly arranges these representations on a semantically organized angular manifold; and Prompt-guided Query Selection produces adaptive queries aligned with that manifold. The resulting system performs open-set defect localization on the Inspect Anything benchmark at substantially higher AP50b and AP50m than prior prompting baselines.

What carries the argument

The Spatial-Spectral Prompt Encoder paired with the Contrastive Prompt Encoder, which together prevent embedding collapse and enforce a semantically organized angular manifold for prompts.

If this is right

Industrial inspection systems can add new defect classes by updating prompts alone rather than retraining entire models.
Localization accuracy improves by at least 19 percent on open-set benchmarks without sacrificing closed-set performance.
The same prompt topology design can be reused across multiple inspection sites or product lines.
Prompt-based pipelines become viable for continuously evolving manufacturing environments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The angular-manifold construction may transfer to other prompt-based open-set tasks such as medical imaging or remote sensing if similar variance issues appear.
Explicit contrastive regularization of prompt space could become a standard module in future visual-prompting architectures beyond defect detection.
Real-time factory deployment would require measuring whether the manifold remains stable under lighting changes or camera drift not present in the benchmark.

Load-bearing premise

That spatial-spectral encoding plus contrastive regularization can still produce distinct angular clusters when defect images contain high intra-class variation and only subtle inter-class differences.

What would settle it

A controlled test set of defect images engineered with greater intra-class variance and finer inter-class distinctions than those in Inspect Anything, on which prompt embeddings are measured for collapse or loss of semantic separation.

Figures

Figures reproduced from arXiv: 2604.02905 by Geonuk Kim, Hyeonseong Jeon, Hyoungjoon Lim, Jeonghoon Han, Junho Yim, Kangil Lee, Minhoi Kim, Minsu Kim.

**Figure 2.** Figure 2: Examples from the InsA benchmark. Top: Samples from the same defect class showing high intra-class appearance variance. Bottom: Samples from different classes exhibiting similar visual patterns, resulting in low inter-class separability. Such ambiguities highlight the inherent difficulty of defect recognition. 2. Related Works 2.1. Traditional defect detection and segmentation Modern industrial defect in… view at source ↗

**Figure 3.** Figure 3: Overview of UniSpector, an open-set defect detection and segmentation framework. The Spatial–Spectral Prompt Encoder extracts orientation-invariant spectral cues fused with spatial features to distinguish visually similar defects. Building on these, Contrastive Prompt Encoding regularizes the prompt embedding space into a structured manifold for robust open-set generalization. A Prompt-guided Query Select… view at source ↗

**Figure 4.** Figure 4: 3D PCA projection of L2-normalized prompt embeddings learned by [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Intra-class cosine similarity comparison across seen and [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: Effect of the number of prompt samples per defect class. [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗

**Figure 7.** Figure 7: Distribution of Prompt-to-Target Ratios across Defect Classes. It illustrates the prompt-to-target ratio for each defect class (with [PITH_FULL_IMAGE:figures/full_fig_p011_7.png] view at source ↗

**Figure 8.** Figure 8: UniSpector is capable of recognizing unseen defects via visual prompts. (a) Orange box: user-specified prompt region. (b) Blue box: corresponding ground-truth in the target image. (c) Green boxes: correct predictions by UniSpector, accurately localizing subtle defects. (d) Red boxes: DINOv predictions, showing failure to localize the prompted defect [PITH_FULL_IMAGE:figures/full_fig_p011_8.png] view at source ↗

**Figure 9.** Figure 9: Detailed view of the inference phase. Unlike the training [PITH_FULL_IMAGE:figures/full_fig_p012_9.png] view at source ↗

**Figure 10.** Figure 10: Robustness against prompt annotation (averaged over [PITH_FULL_IMAGE:figures/full_fig_p013_10.png] view at source ↗

read the original abstract

Although industrial inspection systems should be capable of recognizing unprecedented defects, most existing approaches operate under a closed-set assumption, which prevents them from detecting novel anomalies. While visual prompting offers a scalable alternative for industrial inspection, existing methods often suffer from prompt embedding collapse due to high intra-class variance and subtle inter-class differences. To resolve this, we propose UniSpector, which shifts the focus from naive prompt-to-region matching to the principled design of a semantically structured and transferable prompt topology. UniSpector employs the Spatial-Spectral Prompt Encoder to extract orientation-invariant, fine-grained representations; these serve as a solid basis for the Contrastive Prompt Encoder to explicitly regularize the prompt space into a semantically organized angular manifold. Additionally, Prompt-guided Query Selection generates adaptive object queries aligned with the prompt. We introduce Inspect Anything, the first benchmark for visual-prompt-based open-set defect localization, where UniSpector significantly outperforms baselines by at least 19.7% and 15.8% in AP50b and AP50m, respectively. These results show that our method enable a scalable, retraining-free inspection paradigm for continuously evolving industrial environments, while offering critical insights into the design of generic visual prompting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

UniSpector adds a new Inspect Anything benchmark and a spatial-spectral plus contrastive prompt setup for open-set defect localization, but the core claim about a stable angular manifold lacks direct supporting measurements.

read the letter

The paper's main contribution is the Inspect Anything benchmark for visual-prompt-based open-set defect localization plus the UniSpector architecture that pairs a Spatial-Spectral Prompt Encoder with a Contrastive Prompt Encoder and adds Prompt-guided Query Selection. This targets a practical gap where closed-set industrial models cannot handle novel defects without retraining. The framing is straightforward and the claimed gains of at least 19.7% AP50b and 15.8% AP50m over baselines on the new benchmark are the clearest quantitative signal. The two-stage prompt design and the query selection step are concrete engineering choices that extend standard visual prompting techniques to this domain. The work is new in the specific combination of spectral features, contrastive regularization on prompts, and the benchmark itself. The paper does a reasonable job laying out the motivation and component descriptions. The stress-test concern is fair: the abstract and available details supply no intra-class versus inter-class cosine similarity numbers, angular separation stats, or embedding visualizations to confirm that the contrastive step actually prevents collapse and produces a semantically organized manifold under high intra-class variance. Without those checks the performance lift could trace to the query selection module or to benchmark tuning rather than the claimed topology. The full manuscript may contain ablations and implementation details that address this, but the current evidence leaves the load-bearing assumption under-supported. This paper is aimed at researchers working on open-set recognition and industrial vision systems. It is coherent enough on its own terms to merit a serious referee who can check the experiments, baselines, and any additional manifold diagnostics. I would send it to peer review rather than desk reject.

Referee Report

2 major / 2 minor

Summary. The paper proposes UniSpector for universal open-set defect recognition in industrial inspection via visual prompting. It introduces a Spatial-Spectral Prompt Encoder to extract orientation-invariant fine-grained features and a Contrastive Prompt Encoder to regularize the prompt space into a semantically organized angular manifold that resists collapse under high intra-class variance. A Prompt-guided Query Selection module generates adaptive queries, and the method is evaluated on a newly introduced Inspect Anything benchmark for visual-prompt-based open-set defect localization, where it reports gains of at least 19.7% AP50b and 15.8% AP50m over baselines, supporting a retraining-free inspection paradigm.

Significance. If the central claims hold, the work could meaningfully advance scalable, open-set industrial inspection by addressing prompt collapse in visual prompting and enabling detection of novel defects without retraining. The introduction of the Inspect Anything benchmark is a constructive contribution that could facilitate future research in prompt-based open-set localization.

major comments (2)

[Method (Contrastive Prompt Encoder) and Experiments] The manuscript attributes the reported performance gains to the Contrastive Prompt Encoder creating a semantically organized angular manifold that prevents embedding collapse, yet provides no quantitative verification of this property (e.g., intra-class vs. inter-class cosine similarity statistics, angular separation metrics, or embedding visualizations) in the prompt space. This is load-bearing for the central claim, as the abstract and method description leave open the possibility that gains arise instead from the Prompt-guided Query Selection or benchmark-specific factors.
[Experiments and Abstract] The abstract states significant outperformance on the Inspect Anything benchmark but the manuscript supplies insufficient experimental details on baseline implementations, ablation studies isolating each module's contribution, or controls for prompt collapse. Without these, the data-to-claim connection for the 19.7% AP50b and 15.8% AP50m improvements cannot be assessed.

minor comments (2)

[Abstract] Define AP50b and AP50m explicitly (e.g., average precision at IoU threshold 0.5 for bounding boxes and masks) at first use.
[Method] Clarify the exact loss formulation and temperature parameters in the contrastive regularization to allow reproduction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We agree that the current manuscript would benefit from additional quantitative verification of the prompt space properties and expanded experimental details. We will revise the paper accordingly to strengthen the evidence for our claims.

read point-by-point responses

Referee: [Method (Contrastive Prompt Encoder) and Experiments] The manuscript attributes the reported performance gains to the Contrastive Prompt Encoder creating a semantically organized angular manifold that prevents embedding collapse, yet provides no quantitative verification of this property (e.g., intra-class vs. inter-class cosine similarity statistics, angular separation metrics, or embedding visualizations) in the prompt space. This is load-bearing for the central claim, as the abstract and method description leave open the possibility that gains arise instead from the Prompt-guided Query Selection or benchmark-specific factors.

Authors: We acknowledge that the manuscript currently relies on the architectural description and overall performance gains without providing direct quantitative metrics on the prompt embeddings. In the revised version, we will add intra-class versus inter-class cosine similarity statistics, angular separation metrics (such as mean angular distances), and embedding visualizations (e.g., t-SNE or PCA plots) of the prompt space both with and without the Contrastive Prompt Encoder. These additions will explicitly demonstrate the formation of the semantically organized angular manifold and help rule out alternative explanations for the gains. revision: yes
Referee: [Experiments and Abstract] The abstract states significant outperformance on the Inspect Anything benchmark but the manuscript supplies insufficient experimental details on baseline implementations, ablation studies isolating each module's contribution, or controls for prompt collapse. Without these, the data-to-claim connection for the 19.7% AP50b and 15.8% AP50m improvements cannot be assessed.

Authors: We agree that the experimental section needs to be expanded for full reproducibility and to isolate contributions. In the revision, we will include complete implementation details for all baselines (including any modifications for the open-set setting), full ablation tables breaking down the impact of the Spatial-Spectral Prompt Encoder, Contrastive Prompt Encoder, and Prompt-guided Query Selection individually, and targeted controls for prompt collapse (e.g., variants with and without contrastive regularization, along with collapse metrics such as embedding variance). These will directly link the reported improvements to the proposed components. revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on empirical evaluation of novel architecture

full rationale

The paper introduces Spatial-Spectral Prompt Encoder and Contrastive Prompt Encoder as new components to organize prompt embeddings into an angular manifold, then reports performance gains on the newly introduced Inspect Anything benchmark. No equations, derivations, or self-citations are shown that reduce the claimed AP50 improvements to quantities defined by fitted parameters, self-referential normalizations, or prior author work. The derivation chain is self-contained: the method is proposed, implemented, and measured against baselines without any step that renames a fit as a prediction or imports uniqueness via self-citation. This matches the expected non-finding for papers whose central contribution is architectural and benchmark-driven.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the approach rests on standard computer-vision assumptions about prompt embeddings and contrastive regularization; no explicit free parameters, new physical entities, or ad-hoc axioms are stated.

axioms (1)

domain assumption Visual prompting can serve as a scalable alternative to closed-set training for industrial defect recognition
The abstract positions visual prompting as the starting point and proposes modifications to it.

pith-pipeline@v0.9.0 · 5543 in / 1218 out tokens · 46303 ms · 2026-05-13T20:17:31.340006+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

CPE explicitly regularizes the prompt embedding space to establish a semantically structured topology within an angular manifold through contrastive learning... angular-margin loss with margin m: LCPE = −1/N ∑ log[exp(α·cos(θyk,k+m)) / (exp(α·cos(θyk,k+m)) + ∑c≠yk exp(α·cos(θc,k)))]
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

SSPE... radial frequency... Contrastive Prompt Encoder... semantically organized angular manifold

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 1 internal anchor

[1]

Vision datasets: A bench- mark for vision-based industrial inspection.arXiv preprint arXiv:2306.07890, 2023

Haoping Bai, Shancong Mou, Tatiana Likhomanenko, Ra- mazan Gokberk Cinbis, Oncel Tuzel, Ping Huang, Jiulong Shan, Jianjun Shi, and Meng Cao. Vision datasets: A bench- mark for vision-based industrial inspection.arXiv preprint arXiv:2306.07890, 2023. 6, 9, 10

work page arXiv 2023
[2]

Efficien- tad: Accurate visual anomaly detection at millisecond-level latencies

Kilian Batzner, Lars Heckler, and Rebecca K ¨onig. Efficien- tad: Accurate visual anomaly detection at millisecond-level latencies. InProceedings of the IEEE/CVF winter confer- ence on applications of computer vision, pages 128–138,

work page
[3]

Mvtec ad–a comprehensive real-world dataset for unsupervised anomaly detection

Paul Bergmann, Michael Fauser, David Sattlegger, and Carsten Steger. Mvtec ad–a comprehensive real-world dataset for unsupervised anomaly detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9592–9600, 2019. 6, 9, 10

work page 2019
[4]

Masked-attention mask transformer for universal image segmentation

Bowen Cheng, Ishan Misra, Alexander G Schwing, Alexan- der Kirillov, and Rohit Girdhar. Masked-attention mask transformer for universal image segmentation. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1290–1299, 2022. 8

work page 2022
[5]

Yolo-world: Real-time open-vocabulary object detection

Tianheng Cheng, Lin Song, Yixiao Ge, Wenyu Liu, Xing- gang Wang, and Ying Shan. Yolo-world: Real-time open-vocabulary object detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16901–16911, 2024. 3, 5, 6

work page 2024
[6]

Padim: a patch distribution modeling framework for anomaly detection and localization

Thomas Defard, Aleksandr Setkov, Angelique Loesch, and Romaric Audigier. Padim: a patch distribution modeling framework for anomaly detection and localization. InInter- national conference on pattern recognition, pages 475–489. Springer, 2021. 1

work page 2021
[7]

Arcface: Additive angular margin loss for deep face recognition

Jiankang Deng, Jia Guo, Niannan Xue, and Stefanos Zafeiriou. Arcface: Additive angular margin loss for deep face recognition. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 4690–4699, 2019. 4

work page 2019
[8]

Lerenet: Eliminating intra-class differences for metal surface defect few-shot semantic segmentation.arXiv preprint arXiv:2403.11122, 2024

Hanze Ding, Zhangkai Wu, Jiyan Zhang, Ming Ping, and Yanfang Liu. Lerenet: Eliminating intra-class differences for metal surface defect few-shot semantic segmentation.arXiv preprint arXiv:2403.11122, 2024. 1, 3

work page arXiv 2024
[9]

Text-guided visual prompt dino for generic segmentation

Yuchen Guan, Chong Sun, Canmiao Fu, Zhipeng Huang, Chun Yuan, and Chen Li. Text-guided visual prompt dino for generic segmentation. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 21288– 21298, 2025. 3

work page 2025
[10]

Category rela- tionship enhancement transformer for industrial defect seg- mentation.Knowledge-Based Systems, 326:114059, 2025

Zican Hu, Jiaxiang Luo, and Zixiang Hong. Category rela- tionship enhancement transformer for industrial defect seg- mentation.Knowledge-Based Systems, 326:114059, 2025. 1, 3

work page 2025
[11]

Surface defect saliency of magnetic tile.The Visual Computer, 36(1):85–96,

Yibin Huang, Congying Qiu, and Kui Yuan. Surface defect saliency of magnetic tile.The Visual Computer, 36(1):85–96,

work page
[12]

T-rex2: Towards generic object detection via text-visual prompt synergy

Qing Jiang, Feng Li, Zhaoyang Zeng, Tianhe Ren, Shilong Liu, and Lei Zhang. T-rex2: Towards generic object detection via text-visual prompt synergy. InEuropean Conference on Computer Vision, pages 38–57. Springer, 2024. 1, 3, 4, 5, 6, 8

work page 2024
[13]

Cin- former: Transformer network with multi-stage cnn feature injection for surface defect segmentation.arXiv preprint arXiv:2309.12639, 2023

Xiaoheng Jiang, Kaiyi Guo, Yang Lu, Feng Yan, Hao Liu, Jiale Cao, Mingliang Xu, and Dacheng Tao. Cin- former: Transformer network with multi-stage cnn feature injection for surface defect segmentation.arXiv preprint arXiv:2309.12639, 2023. 2

work page arXiv 2023
[14]

Ultralytics yolo11, 2024

Glenn Jocher and Jing Qiu. Ultralytics yolo11, 2024. 12

work page 2024
[15]

Few-shot object detection via feature reweighting

Bingyi Kang, Zhuang Liu, Xin Wang, Fisher Yu, Jiashi Feng, and Trevor Darrell. Few-shot object detection via feature reweighting. InProceedings of the IEEE/CVF international conference on computer vision, pages 8420–8429, 2019. 10

work page 2019
[16]

YOLOv11: An Overview of the Key Architectural Enhancements

Rahima Khanam and Muhammad Hussain. Yolov11: An overview of the key architectural enhancements.arXiv preprint arXiv:2410.17725, 2024. 8

work page internal anchor Pith review Pith/arXiv arXiv 2024
[17]

Mask dino: Towards a unified transformer-based framework for object detection and segmentation

Feng Li, Hao Zhang, Huaizhe Xu, Shilong Liu, Lei Zhang, Lionel M Ni, and Heung-Yeung Shum. Mask dino: Towards a unified transformer-based framework for object detection and segmentation. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 3041–3050, 2023. 8, 12

work page 2023
[18]

Visual in-context prompting

Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Jianwei Yang, Chunyuan Li, et al. Visual in-context prompting. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12861–12871, 2024. 1, 3, 4, 5, 6, 8

work page 2024
[19]

Grounding dino: Marrying dino with grounded pre-training for open-set object detection

Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, et al. Grounding dino: Marrying dino with grounded pre-training for open-set object detection. InEuro- pean conference on computer vision, pages 38–55. Springer,

work page
[20]

An adaptive image segmen- tation network for surface defect detection.IEEE Trans- actions on Neural Networks and Learning Systems, 35(6): 8510–8523, 2022

Taiheng Liu, Zhaoshui He, Zhijie Lin, Guang-Zhong Cao, Wenqing Su, and Shengli Xie. An adaptive image segmen- tation network for surface defect detection.IEEE Trans- actions on Neural Networks and Learning Systems, 35(6): 8510–8523, 2022. 1, 2

work page 2022
[21]

Exploring few-shot defect segmentation in general industrial scenarios with metric learning and vision foundation models.arXiv preprint arXiv:2502.01216, 2025

Tongkun Liu, Bing Li, Xiao Jin, Yupeng Shi, Qiuying Li, and Xiang Wei. Exploring few-shot defect segmentation in general industrial scenarios with metric learning and vision foundation models.arXiv preprint arXiv:2502.01216, 2025. 1, 3

work page arXiv 2025
[22]

A simple image segmentation framework via in-context examples.Advances in Neural Information Processing Systems, 37:25095–25119,

Yang Liu, Chenchen Jing, Hengtao Li, Muzhi Zhu, Hao Chen, Xinlong Wang, and Chunhua Shen. A simple image segmentation framework via in-context examples.Advances in Neural Information Processing Systems, 37:25095–25119,

work page
[23]

Unified open-world segmentation with multi-modal prompts

Yang Liu, Yufei Yin, Chenchen Jing, Muzhi Zhu, Hao Chen, Yuling Xi, Bo Feng, Hao Wang, Shiyu Li, and Chunhua Shen. Unified open-world segmentation with multi-modal prompts. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 21557–21567, 2025. 3, 4

work page 2025
[24]

Deep metallic surface defect detection: The new bench- mark and detection network.Sensors, 20(6):1562, 2020

Xiaoming Lv, Fajie Duan, Jia-jia Jiang, Xiao Fu, and Lin Gan. Deep metallic surface defect detection: The new bench- mark and detection network.Sensors, 20(6):1562, 2020. 6, 9, 10

work page 2020
[25]

U- net: Convolutional networks for biomedical image segmen- tation

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U- net: Convolutional networks for biomedical image segmen- tation. InInternational Conference on Medical image com- puting and computer-assisted intervention, pages 234–241. Springer, 2015. 1, 2

work page 2015
[26]

Towards total recall in industrial anomaly detection

Karsten Roth, Latha Pemula, Joaquin Zepeda, Bernhard Sch¨olkopf, Thomas Brox, and Peter Gehler. Towards total recall in industrial anomaly detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14318–14328, 2022. 1

work page 2022
[27]

Yoloe: Real-time seeing anything.arXiv preprint arXiv:2503.07465, 2025

Ao Wang, Lihao Liu, Hui Chen, Zijia Lin, Jungong Han, and Guiguang Ding. Yoloe: Real-time seeing anything.arXiv preprint arXiv:2503.07465, 2025. 1, 3, 5, 6

work page arXiv 2025
[28]

Real-iad: A real-world multi-view dataset for benchmarking versatile industrial anomaly detec- tion

Chengjie Wang, Wenbing Zhu, Bin-Bin Gao, Zhenye Gan, Jiangning Zhang, Zhihao Gu, Shuguang Qian, Mingang Chen, and Lizhuang Ma. Real-iad: A real-world multi-view dataset for benchmarking versatile industrial anomaly detec- tion. InProceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pages 22883–22892,

work page
[29]

Ssd-faster net: A hybrid network for industrial defect inspection.arXiv preprint arXiv:2207.00589, 2022

Jingyao Wang and Naigong Yu. Ssd-faster net: A hybrid network for industrial defect inspection.arXiv preprint arXiv:2207.00589, 2022. 2

work page arXiv 2022
[30]

Defect transformer: An efficient hybrid trans- former architecture for surface defect detection.Measure- ment, 211:112614, 2023

Junpu Wang, Guili Xu, Fuju Yan, Jinjin Wang, and Zheng- sheng Wang. Defect transformer: An efficient hybrid trans- former architecture for surface defect detection.Measure- ment, 211:112614, 2023. 1, 2

work page 2023
[31]

Frustratingly simple few-shot object detection.arXiv preprint arXiv:2003.06957, 2020

Xin Wang, Thomas E Huang, Trevor Darrell, Joseph E Gon- zalez, and Fisher Yu. Frustratingly simple few-shot object detection.arXiv preprint arXiv:2003.06957, 2020. 10

work page arXiv 2003
[32]

Seggpt: Towards seg- menting everything in context

Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, and Tiejun Huang. Seggpt: Towards seg- menting everything in context. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 1130–1140, 2023. 1, 5, 6

work page 2023
[33]

Wavelet and prototype augmented query- based transformer for pixel-level surface defect detection

Feng Yan, Xiaoheng Jiang, Yang Lu, Jiale Cao, Dong Chen, and Mingliang Xu. Wavelet and prototype augmented query- based transformer for pixel-level surface defect detection. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 23860–23869, 2025. 1, 2

work page 2025
[34]

Meta r-cnn: Towards general solver for instance-level low-shot learning

Xiaopeng Yan, Ziliang Chen, Anni Xu, Xiaoxi Wang, Xi- aodan Liang, and Liang Lin. Meta r-cnn: Towards general solver for instance-level low-shot learning. InProceedings of the IEEE/CVF international conference on computer vi- sion, pages 9577–9586, 2019. 10

work page 2019
[35]

3cad: A large-scale real- world 3c product dataset for unsupervised anomaly detec- tion

Enquan Yang, Peng Xing, Hanyang Sun, Wenbo Guo, Yuan- wei Ma, Zechao Li, and Dan Zeng. 3cad: A large-scale real- world 3c product dataset for unsupervised anomaly detec- tion. InProceedings of the AAAI Conference on Artificial Intelligence, pages 9175–9183, 2025. 6, 9, 10

work page 2025
[36]

Dual wavelet attention networks for image classifi- cation.IEEE Transactions on Circuits and Systems for Video Technology, 33(4):1899–1910, 2022

Yuting Yang, Licheng Jiao, Xu Liu, Fang Liu, Shuyuan Yang, Lingling Li, Puhua Chen, Xiufang Li, and Zhongjian Huang. Dual wavelet attention networks for image classifi- cation.IEEE Transactions on Circuits and Systems for Video Technology, 33(4):1899–1910, 2022. 2

work page 1910
[37]

De- trs beat yolos on real-time object detection

Yian Zhao, Wenyu Lv, Shangliang Xu, Jinman Wei, Guanzhong Wang, Qingqing Dang, Yi Liu, and Jie Chen. De- trs beat yolos on real-time object detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16965–16974, 2024. 8

work page 2024
[38]

Segment everything everywhere all at once.Advances in neural information processing systems, 36:19769–19782,

Xueyan Zou, Jianwei Yang, Hao Zhang, Feng Li, Linjie Li, Jianfeng Wang, Lijuan Wang, Jianfeng Gao, and Yong Jae Lee. Segment everything everywhere all at once.Advances in neural information processing systems, 36:19769–19782,

work page
[39]

Spot-the-difference self-supervised pre- training for anomaly detection and segmentation.arXiv preprint arXiv:2207.14315, 2022

Yang Zou, Jongheon Jeong, Latha Pemula, Dongqing Zhang, and Onkar Dabeer. Spot-the-difference self-supervised pre- training for anomaly detection and segmentation.arXiv preprint arXiv:2207.14315, 2022. 6, 9, 10

work page arXiv 2022