pith. machine review for the scientific record

arxiv: 2605.07149 · v1 · submitted 2026-05-08 · 💻 cs.CV

Recognition: no theorem link

Real-IAD MVN: A Multi-View Normal Vector Dataset and Benchmark for High-Fidelity Industrial Anomaly Detection

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 01:12 UTC · model grok-4.3

classification 💻 cs.CV
keywords industrial anomaly detection · surface normal maps · multi-view dataset · geometric defects · multimodal fusion · anomaly detection benchmark

The pith

High-fidelity multi-view surface normal maps improve detection of subtle geometric defects over sparse 3D point clouds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Real-IAD-MVN, a dataset of high-fidelity surface normal maps captured from five viewpoints to overcome limits in current industrial anomaly detection. RGB images often miss geometric flaws while sparse point clouds lack resolution for tiny defects such as scratches or pits. Experiments show that these dense multi-view normals make side-wall and occluded defects visible and deliver better detection results than point-cloud baselines. A reconstruction method that learns unified prototypes across image and normal streams also beats standard multimodal fusion techniques. If the results hold, quality-control systems can shift from texture-focused or coarse-shape data to richer geometric representations for manufacturing inspection.

Core claim

By upgrading the acquisition system, Real-IAD-MVN replaces sparse 3D point cloud data with dense multi-view pseudo-3D surface normal maps from five viewpoints. This representation makes previously invisible micro-defects explicitly detectable. A baseline reconstruction approach that extracts cross-modal unified prototypes from image and normal map streams surpasses existing state-of-the-art multimodal fusion methods on the new dataset.

What carries the argument

The multi-view normal vector dataset that supplies dense geometric surface information from five angles, paired with a reconstruction baseline that learns cross-modal unified prototypes from image and normal streams.
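The paper's baseline is not reproduced here, but the prototype-reconstruction idea it describes can be sketched. The following is a minimal, hypothetical version (not the paper's CPRN; the nearest-prototype lookup and the patch-embedding shapes are assumptions for illustration):

```python
import numpy as np

def prototype_anomaly_scores(features, prototypes):
    """Score patches by how poorly a unified prototype bank reconstructs them.

    features:   (N, D) patch embeddings from either the RGB or normal stream
    prototypes: (K, D) prototype bank, assumed learned on defect-free samples
    Returns a per-patch anomaly score: the reconstruction residual.
    """
    # distance from every patch embedding to every prototype
    dists = np.linalg.norm(features[:, None, :] - prototypes[None, :, :], axis=-1)
    # reconstruct each patch with its nearest prototype
    recon = prototypes[dists.argmin(axis=1)]
    # patches far from all prototypes (geometry unseen in training) score high
    return np.linalg.norm(features - recon, axis=-1)
```

On this reading, a defect scores high because no prototype learned from normal parts reconstructs its local geometry, in either modality.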

If this is right

  • Side-wall and occluded micro-defects such as scratches and pits become detectable where they were previously missed.
  • Dense multi-view normals produce higher detection performance than sparse 3D point clouds.
  • Unified prototypes extracted from image and normal streams outperform prior multimodal fusion techniques.
  • The dataset supplies a concrete benchmark for advancing geometric anomaly detection methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Existing 2D inspection lines could add normal-map channels to catch more defects without switching to full 3D scanners.
  • The same multi-view normal capture strategy might transfer to anomaly detection in non-industrial domains that involve fine surface geometry.
  • Tests on object categories outside the current collection would reveal how far the performance advantage extends.

Load-bearing premise

The upgraded system must generate artifact-free normals that faithfully capture real micro-defects, and the reported gains must generalize beyond the chosen dataset splits and baseline code.

What would settle it

Apply the baseline to a fresh set of industrial parts containing independently measured micro-defects and check whether detection rates match the numbers reported on the original splits.
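Settling it mostly reduces to recomputing the standard metric on fresh parts. A minimal image-level AUROC via the Mann-Whitney rank formulation (illustrative; the paper's exact evaluation protocol is not specified here, and tied scores are not handled):

```python
import numpy as np

def auroc(scores, labels):
    """Image-level AUROC: the probability that a random anomalous sample
    outscores a random normal one (Mann-Whitney U formulation)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    # rank every score from 1 (lowest) to N (highest)
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

A result of 1.0 means every defective part outscored every good one; 0.5 is chance.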

Figures

Figures reproduced from arXiv: 2605.07149 by Bo Peng, Jianghui Zhang, Jianing Liang, Linjie Cheng, Mingmin Chi, Qingwang Yan, Wenbing Zhu, Yudong Cheng, Yurui Pan, Zhuhao Chen.

Figure 1. Overview of the Real-IAD-MVN dataset acquisition and annotation pipeline. (a) Material preparation, showing examples of …
Figure 2. Statistical overview of the Real-IAD MVN dataset in comparison to MVTec 3D-AD and Real3D-AD, illustrating sample counts, …
Figure 3. Overview of our CPRN baseline architecture. Multi…
Figure 4. Qualitative results of our single-view CPRN model applied independently to multiple views from the Real-IAD-MVN dataset.
Figure 5. Qualitative comparison with state-of-the-art methods on various categories from the Real-IAD-MVN dataset (top-down view).
Original abstract

Industrial Anomaly Detection (IAD) is critical for quality control, but existing methods struggle with subtle, geometric defects. Standard 2D (RGB) images are sensitive to texture and lighting but often miss fine geometric anomalies. While 3D point clouds capture macro-shape, they are typically too sparse to detect micro-defects like scratches or pits. We address this fundamental data limitation by introducing Real-IAD-MVN (Multi-View Normal), a large-scale industrial dataset. By upgrading our acquisition system, Real-IAD-MVN captures high-fidelity surface normal maps from five distinct viewpoints, replacing sparse 3D data entirely. This provides a comprehensive geometric representation at a micro-detail level, making previously invisible side-wall and occluded defects explicitly detectable. Our experiments, conducted on this new dataset, first provide evidence that incorporating dense, multi-view pseudo-3D (surface normals) yields significantly better detection performance than using sparse 3D point cloud data. To further validate the dataset and provide a strong benchmark, we introduce a baseline method based on reconstruction, which learns to extract cross-modal unified prototypes from the image and normal map streams. We demonstrate that this unified prototype approach surpasses existing state-of-the-art multimodal fusion methods, highlighting the rich potential of our new dataset for advancing geometric anomaly detection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces Real-IAD-MVN, a large-scale industrial anomaly detection dataset that upgrades an acquisition system to capture high-fidelity multi-view surface normal maps from five viewpoints, replacing sparse 3D point clouds. It claims this dense pseudo-3D geometric representation enables detection of previously invisible micro-defects (e.g., side-wall scratches and pits), provides experimental evidence that multi-view normals outperform point clouds, and introduces a reconstruction-based baseline that learns unified cross-modal prototypes from RGB and normal streams, surpassing existing SOTA multimodal fusion methods.

Significance. If the performance claims hold after proper validation, the dataset would address a key limitation in IAD by supplying dense micro-scale geometric cues absent from standard 2D RGB or sparse point-cloud modalities, potentially improving detection of subtle manufacturing defects. The unified-prototype baseline offers a concrete starting point for cross-modal learning research on this new data type.

major comments (3)
  1. [Abstract] The central claim that 'incorporating dense, multi-view pseudo-3D (surface normals) yields significantly better detection performance than using sparse 3D point cloud data' is asserted without any reported metrics (AUROC, F1, etc.), ablation tables, dataset statistics, or controls, rendering the superiority statement unverifiable from the manuscript.
  2. [Experimental section] No quantitative fidelity check (e.g., mean angular error against calibrated reference geometry) or artifact analysis is provided for the captured normal maps. This check is load-bearing for the claim that the upgraded system produces artifact-free, high-fidelity normals that reveal real industrial micro-defects rather than reconstruction artifacts.
  3. [Experiments] The assertion that the unified-prototype method 'surpasses existing state-of-the-art multimodal fusion methods' lacks details on baseline re-implementations, hyper-parameter controls, statistical significance tests, or cross-validation splits, preventing assessment of whether gains are attributable to the new modality or to implementation differences.
minor comments (1)
  1. [Abstract] The term 'pseudo-3D' is used for surface normals without a clear definition or comparison to true 3D geometry in the text.
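The fidelity check asked for in major comment 2 is standard in photometric-stereo evaluation: mean angular error between captured and reference normals. A minimal sketch (the array shapes and the existence of calibrated reference maps are assumptions; the manuscript describes no such protocol):

```python
import numpy as np

def mean_angular_error_deg(n_est, n_ref, eps=1e-8):
    """Mean per-pixel angle, in degrees, between an estimated and a reference
    normal map, each of shape (H, W, 3). Lower means higher fidelity."""
    # normalize per-pixel normal vectors; eps guards against zero vectors
    n_est = n_est / (np.linalg.norm(n_est, axis=-1, keepdims=True) + eps)
    n_ref = n_ref / (np.linalg.norm(n_ref, axis=-1, keepdims=True) + eps)
    cos = np.clip((n_est * n_ref).sum(axis=-1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())
```

Reporting this per category, alongside an artifact map of the largest errors, would directly support the artifact-free claim.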

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, indicating where revisions will be made to improve clarity and verifiability.

Point-by-point responses
  1. Referee: [Abstract] The central claim that 'incorporating dense, multi-view pseudo-3D (surface normals) yields significantly better detection performance than using sparse 3D point cloud data' is asserted without any reported metrics (AUROC, F1, etc.), ablation tables, dataset statistics, or controls, rendering the superiority statement unverifiable from the manuscript.

    Authors: The full experimental section contains the supporting AUROC/F1 metrics, ablation tables, dataset statistics, and controls that underpin the abstract claim. The abstract is intentionally concise as a summary. To make the superiority statement directly verifiable at the abstract level, we will revise it to include brief quantitative highlights (e.g., specific AUROC improvements over point-cloud baselines). revision: yes

  2. Referee: [Experimental section] No quantitative fidelity check (e.g., mean angular error against calibrated reference geometry) or artifact analysis is provided for the captured normal maps. This check is load-bearing for the claim that the upgraded system produces artifact-free, high-fidelity normals that reveal real industrial micro-defects rather than reconstruction artifacts.

    Authors: We agree this quantitative validation would strengthen the fidelity claims. The current manuscript supports normal-map quality via qualitative inspection and downstream anomaly-detection gains. In revision we will add a dedicated fidelity subsection reporting mean angular error against calibrated reference geometry plus artifact analysis. revision: yes

  3. Referee: [Experiments] The assertion that the unified-prototype method 'surpasses existing state-of-the-art multimodal fusion methods' lacks details on baseline re-implementations, hyper-parameter controls, statistical significance tests, or cross-validation splits, preventing assessment of whether gains are attributable to the new modality or to implementation differences.

    Authors: We will expand the experimental section with explicit re-implementation details (including pseudocode or repository links), full hyper-parameter tables, statistical significance tests (e.g., paired t-tests with p-values), and clarification of the train/validation/test splits used. This will allow readers to attribute performance differences to the modality rather than implementation variance. revision: yes
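The significance test promised here is cheap to specify. A paired t-statistic over matched per-category scores, numpy-only (SciPy's `ttest_rel` yields the p-value directly; pairing by category on identical splits is an assumption about the authors' protocol):

```python
import numpy as np

def paired_t(a, b):
    """Paired t-statistic over matched per-category scores (e.g., AUROC)
    for two methods evaluated on identical splits."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    # t = mean difference over its standard error
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
```

The resulting statistic is compared against the t distribution with len(d) - 1 degrees of freedom.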

Circularity Check

0 steps flagged

No circularity: empirical dataset and benchmark paper with no derivations

Full rationale

The paper is a dataset contribution introducing Real-IAD-MVN with multi-view normal maps, accompanied by empirical benchmarks comparing detection performance against point clouds and multimodal baselines. No equations, parameter fittings, uniqueness theorems, or derivation chains are present in the abstract or described content. All claims rest on reported experimental metrics from the new data splits, with no self-referential constructs and no reliance on self-citations for core premises. This is a standard, self-contained empirical work whose conclusions do not reduce by construction to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work rests on standard computer vision assumptions about the informativeness of surface normals for geometry and the value of reconstruction-based anomaly detection. No free parameters, ad-hoc axioms, or invented entities are introduced.

pith-pipeline@v0.9.0 · 5568 in / 986 out tokens · 42676 ms · 2026-05-11T01:12:50.056883+00:00 · methodology

