
arxiv: 2605.27748 · v1 · pith:55RGRDWDnew · submitted 2026-05-26 · 💻 cs.CV · cs.AI · cs.LG

Mahalanobis PatchCore: Covariance-Aware and Streaming-Compatible Industrial Anomaly Detection

Pith reviewed 2026-06-29 17:52 UTC · model grok-4.3

classification: 💻 cs.CV · cs.AI · cs.LG
keywords: industrial anomaly detection, Mahalanobis distance, PatchCore, streaming computation, covariance estimation, memory efficiency, visual inspection

The pith

Mahalanobis PatchCore preserves most of PatchCore's image-level accuracy while roughly halving peak memory (5.41 GB to 2.78 GB) through a bounded-memory streaming pipeline with online covariance estimation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Mahalanobis PatchCore to add covariance awareness to PatchCore-style retrieval without storing the entire collection of normal patches. It estimates a regularized covariance matrix in reduced feature space, whitens the embeddings, and then performs standard nearest-neighbor search to realize Mahalanobis distances. This matters because industrial visual inspection systems must operate accurately when defects are rare and hardware memory is limited. The method replaces offline full-bank construction with a bounded-memory pipeline that uses incremental dimensionality reduction, online covariance updates, and streaming aggregation. On a public 15-category benchmark it keeps most of the original image-level performance while cutting memory from 5.41 GB to 2.78 GB; on three real industrial inspection tasks it raises mean image AUC from 0.981 to 0.986.
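
The whitening step rests on a short identity, written out here as a worked derivation. The symbols ($\hat{\Sigma}$, $\lambda$, $L$, $W$) are notation chosen for this illustration and are not taken from the paper; additive ridge regularisation is an assumption, since the digest does not say which regulariser the authors use.

```latex
\begin{aligned}
\Sigma_\lambda &= \hat{\Sigma} + \lambda I = L L^\top, \qquad W = L^{-1},\\
d_M(u, v)^2 &= (u - v)^\top \Sigma_\lambda^{-1} (u - v)
             = (u - v)^\top W^\top W \,(u - v)
             = \lVert W u - W v \rVert_2^2 .
\end{aligned}
```

Because the last expression is a plain squared Euclidean distance between transformed vectors, any standard nearest-neighbour index can be reused unchanged once the features are multiplied by $W$.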

Core claim

Mahalanobis PatchCore estimates a regularised covariance model in reduced feature space and whitens embeddings so that Euclidean nearest-neighbour search after transformation implements Mahalanobis retrieval. A bounded-memory, re-iterable training pipeline builds the memory bank without storing all normal patches at once, using incremental dimensionality reduction, online covariance estimation, and streaming aggregation. This preserves most offline PatchCore image-level performance on the public benchmark while reducing peak memory from 5.41 to 2.78 GB, and improves the selected industrial mean image area under the receiver operating characteristic curve from 0.981 to 0.986.
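
A minimal sketch of the retrieval claim, assuming a ridge-regularised sample covariance and a Cholesky-based whitener. The function name `fit_whitener`, the `reg` value, and the synthetic features are all hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np

def fit_whitener(features, reg=1e-3):
    """Return (mean, W, cov) where ||W(u - v)||_2 is the Mahalanobis distance.

    `reg` is a hypothetical ridge strength added to the diagonal.
    """
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + reg * np.eye(features.shape[1])
    chol = np.linalg.cholesky(cov)   # cov = chol @ chol.T
    W = np.linalg.inv(chol)          # so inv(cov) = W.T @ W
    return mean, W, cov

rng = np.random.default_rng(0)
mixing = rng.normal(size=(8, 8))
normal = rng.normal(size=(500, 8)) @ mixing   # correlated stand-in patch features
mean, W, cov = fit_whitener(normal)

bank = (normal - mean) @ W.T                  # whitened memory bank
query = rng.normal(size=8) @ mixing
q = (query - mean) @ W.T                      # whitened query

# Euclidean nearest neighbour in whitened space
euclid = np.linalg.norm(bank - q, axis=1)
nn = int(np.argmin(euclid))

# Direct Mahalanobis distances in the original space, for comparison
diff = normal - query
maha = np.sqrt(np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff))
```

The two distance vectors agree up to floating-point error, so the nearest neighbour found in whitened space is the Mahalanobis nearest neighbour in the original space.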

What carries the argument

Regularised covariance estimation in reduced feature space with whitening, combined with incremental dimensionality reduction and streaming aggregation to build the memory bank.
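
One way online covariance estimation can work is a batched Welford-style update, sketched below. The class name `OnlineCovariance` and the `reg` argument are hypothetical; the digest does not specify which update rule the paper uses.

```python
import numpy as np

class OnlineCovariance:
    """Running mean and covariance over chunks, without storing past samples."""

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros((dim, dim))   # sum of outer products of centred samples

    def update(self, batch):
        batch = np.asarray(batch, dtype=np.float64)
        nb = batch.shape[0]
        if nb == 0:
            return
        b_mean = batch.mean(axis=0)
        centred = batch - b_mean
        delta = b_mean - self.mean
        total = self.n + nb
        # Merge the batch scatter with the running scatter (pairwise update)
        self.m2 += centred.T @ centred + np.outer(delta, delta) * self.n * nb / total
        self.mean += delta * nb / total
        self.n = total

    def covariance(self, reg=0.0):
        cov = self.m2 / (self.n - 1)
        return cov + reg * np.eye(cov.shape[0])   # optional ridge term

rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 6)) @ rng.normal(size=(6, 6))
est = OnlineCovariance(dim=6)
for chunk in np.array_split(data, 13):   # chunks of unequal size
    est.update(chunk)
```

State is one mean vector and one `dim × dim` matrix, so memory is independent of the number of patches seen.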

If this is right

  • The method enables accurate anomaly detection under practical memory limits in automated industrial inspection.
  • It preserves most of standard PatchCore's image-level AUC on the public benchmark and raises mean image AUC from 0.981 to 0.986 on the three industrial datasets.
  • The streaming pipeline avoids materialising the complete patch pool before subsampling.
  • Covariance-aware retrieval becomes feasible without requiring the full offline memory bank.
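
The idea of avoiding a fully materialised patch pool can be sketched as a merge-and-reduce loop: keep a bank of fixed size, append each incoming chunk, and shrink back with greedy farthest-point (k-center) selection. This is an illustrative assumption about how a bounded-memory k-center bank might be built; `stream_bank`, `greedy_k_center`, and `patch_chunks` are hypothetical names.

```python
import numpy as np

def greedy_k_center(points, k, rng):
    """Farthest-point selection: indices of k well-spread points."""
    first = int(rng.integers(len(points)))
    chosen = [first]
    dist = np.linalg.norm(points - points[first], axis=1)
    while len(chosen) < min(k, len(points)):
        nxt = int(np.argmax(dist))               # point farthest from the chosen set
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

def stream_bank(chunks, bank_size, seed=0):
    """Hold at most bank_size + chunk rows at any time."""
    rng = np.random.default_rng(seed)
    bank = None
    for chunk in chunks:
        pool = chunk if bank is None else np.vstack([bank, chunk])
        if len(pool) > bank_size:
            pool = pool[greedy_k_center(pool, bank_size, rng)]
        bank = pool
    return bank

def patch_chunks(n_chunks=20, rows=200, dim=16, seed=0):
    """Synthetic stand-in for a stream of (already whitened) patch features."""
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        yield rng.normal(size=(rows, dim))

bank = stream_bank(patch_chunks(), bank_size=256)
```

Here 4000 patch rows pass through the loop while at most 456 are resident at once; the trade-off is that the result is an approximation of running k-center on the full pool.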

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same incremental covariance technique could be applied to other distance-based one-class detectors that currently ignore feature correlations.
  • The whitening transformation might be computed once on an existing memory bank to retrofit covariance awareness without retraining.
  • If normal data distributions drift over time, the online covariance update could support periodic refresh without restarting from scratch.
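
The drift-refresh idea in the last bullet could be sketched with an exponentially weighted update that blends old statistics with a new batch. This is an editorial illustration only: the function `refresh` and the weight `alpha` are hypothetical, and nothing here is described in the paper.

```python
import numpy as np

def refresh(mean, cov, batch, alpha):
    """Blend old statistics with a new batch; alpha in (0, 1] weights the batch.

    Uses the covariance of a two-component mixture with weights (1 - alpha, alpha).
    """
    b_mean = batch.mean(axis=0)
    b_cov = np.cov(batch, rowvar=False, bias=True)
    delta = b_mean - mean
    new_mean = (1 - alpha) * mean + alpha * b_mean
    new_cov = ((1 - alpha) * cov + alpha * b_cov
               + alpha * (1 - alpha) * np.outer(delta, delta))
    return new_mean, new_cov

rng = np.random.default_rng(2)
old = rng.normal(size=(400, 4))
mean, cov = old.mean(axis=0), np.cov(old, rowvar=False, bias=True)

drifted = rng.normal(loc=0.5, size=(400, 4))   # normal data after a small shift
new_mean, new_cov = refresh(mean, cov, drifted, alpha=0.2)
```

With `alpha=0.5` and equally sized sets the result equals the statistics of the pooled data; smaller values of `alpha` make the refresh more conservative.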

Load-bearing premise

The regularized covariance estimated incrementally in reduced feature space remains sufficiently accurate to produce Mahalanobis distances that match the quality of an offline full-covariance model.

What would settle it

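A side-by-side run on the same datasets would settle it: compare an offline PatchCore that uses a batch-estimated full covariance for Mahalanobis retrieval against the streaming incremental version. A substantially higher AUC for the offline model would falsify the claim that the incremental estimate matches offline quality.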

Figures

Figures reproduced from arXiv: 2605.27748 by Evelina Lamma, Niccolò Ferrari, Oligert Osmani.

Figure 1: Overview of MH-PatchCore training and inference. Nominal single-channel or three-channel inspection images are mapped by a frozen encoder to locally aware patch descriptors, reduced incrementally, whitened by a regularised covariance model, and summarised into a bounded-memory k-center bank. At inference, the same encoder, reduction, and whitening transform are applied before Euclidean nearest-neighbour r…

Figure 2: Conceptual comparison between the offline PatchCore baseline and the streaming covariance-aware variant studied in this paper. The diagram contrasts the point at which the support set is built and the geometry used for nearest-neighbour retrieval.

Figure 3: Meniscus qualitative examples from the canonical streaming MH-PatchCore baseline. Each row shows one sample as the original image, anomaly-score heatmap overlay, and predicted segmentation overlay.

Figure 4: Bottom qualitative examples from the canonical streaming MH-PatchCore baseline. Each row shows one sample as the original image, anomaly-score heatmap overlay, and predicted segmentation overlay.

Figure 5: Lyo qualitative examples from the canonical streaming MH-PatchCore baseline. Each row shows one sample as the original image, anomaly-score heatmap overlay, and predicted segmentation overlay.

Figure 6: Meniscus RAM growth for original PatchCore and the selected GeoReS streaming trade-off. Original PatchCore improves by retaining more normal images, but its resident memory grows with the retained subset, whereas the bounded-memory construction stays near a fixed RAM band.

Figure 7.
Original abstract

Industrial visual anomaly detection is usually one-class: normal images are abundant, while defects are rare, heterogeneous, and often unavailable during system design. PatchCore-style retrieval suits this setting because it scores test images from a memory bank of normal patch features, but the standard Euclidean geometry ignores feature correlations and its offline construction materialises the full patch pool before subsampling. We introduce Mahalanobis PatchCore, a covariance-aware, streaming-compatible extension of PatchCore. Its artificial intelligence contribution is a retrieval detector that estimates a regularised covariance model in reduced feature space and whitens embeddings, so Euclidean nearest-neighbour search after transformation implements Mahalanobis retrieval. A bounded-memory, re-iterable training pipeline builds the memory bank without storing all normal patches at once, using incremental dimensionality reduction, online covariance estimation, and streaming aggregation. The engineering application is automated industrial inspection, where visual anomaly detection must remain accurate under practical memory limits. We evaluate the method on a public 15-category industrial anomaly-detection benchmark and three industrial datasets covering blow-fill-seal strip-ampoule meniscus inspection, amber-glass-ampoule bottom inspection, and lyophilised-cake vial inspection. Mahalanobis PatchCore preserves most offline PatchCore image-level performance on the public benchmark while reducing peak memory from 5.41 to 2.78 GB, and improves the selected industrial mean image area under the receiver operating characteristic curve from 0.981 to 0.986.
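
The retrieval scoring described in the abstract can be sketched as follows: each test patch is scored by its distance to the nearest bank entry, and the image score is the maximum patch score. This is a simplified rule under stated assumptions (features already reduced and whitened, no re-weighting of the maximum); `patch_scores` and `image_score` are hypothetical names.

```python
import numpy as np

def patch_scores(patches, bank):
    """Distance from each patch to its nearest neighbour in the memory bank."""
    # Squared distances via ||a||^2 - 2 a.b + ||b||^2
    d2 = ((patches ** 2).sum(axis=1)[:, None]
          - 2.0 * patches @ bank.T
          + (bank ** 2).sum(axis=1)[None, :])
    return np.sqrt(np.maximum(d2, 0.0).min(axis=1))

def image_score(patches, bank):
    """Image-level anomaly score: the worst (largest) patch score."""
    return float(patch_scores(patches, bank).max())

rng = np.random.default_rng(3)
bank = rng.normal(size=(2000, 8))        # stand-in bank of normal patch features
normal_img = rng.normal(size=(50, 8))    # patches of a defect-free image
defect_img = normal_img.copy()
defect_img[7] += 6.0                     # one patch pushed far from the normal data

normal_score = image_score(normal_img, bank)
defect_score = image_score(defect_img, bank)
```

The per-patch scores from `patch_scores` are also what an anomaly heatmap would be built from, once reshaped to the patch grid.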

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents Mahalanobis PatchCore as a covariance-aware extension of PatchCore for one-class industrial anomaly detection. It estimates a regularized covariance in reduced feature space to implement Mahalanobis retrieval via whitening and Euclidean NN, while using incremental dimensionality reduction and online covariance estimation for a streaming, bounded-memory training pipeline. On a public 15-category benchmark it reduces peak memory from 5.41 GB to 2.78 GB while preserving image-level AUC, and on three industrial datasets it raises mean image AUC from 0.981 to 0.986.
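
For readers unfamiliar with the metric, image-level AUC can be computed directly from anomaly scores as the fraction of (anomalous, normal) image pairs that are ranked correctly. The sketch below is a generic rank-based implementation, not the authors' evaluation code; the function name `auroc` and the toy scores are illustrative.

```python
import numpy as np

def auroc(scores, labels):
    """Area under the ROC curve; labels are 1 for anomalous, 0 for normal."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    # Count correctly ordered pairs; ties contribute one half
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

toy_auc = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

In the toy case three of the four anomalous-normal pairs are ordered correctly, giving 0.75. The pairwise form is quadratic in the number of images, which is acceptable for benchmark-sized test sets.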

Significance. Should the incremental covariance estimate prove sufficiently faithful to the batch version, the approach would offer a practical way to incorporate feature correlations into memory-bank retrieval methods without prohibitive memory costs, which is valuable for real-world industrial inspection systems operating under hardware constraints.

major comments (3)
  1. [Abstract] The concrete numeric claims (memory reduction to 2.78 GB and AUC increase to 0.986) are presented without derivation, error bars, or ablation of the regularization strength (a free parameter), leaving the central performance assertions unsupported by visible evidence.
  2. [§3 (Method)] The description of the online covariance estimation in reduced space lacks a quantitative validation (e.g., distance distribution comparison or nearest-neighbor ranking preservation) against the offline full-covariance baseline; this is load-bearing for the claim that Mahalanobis distances match offline quality.
  3. [§4 (Experiments)] No ablation study or sensitivity analysis is shown for the choice of reduced dimension or regularization parameter, despite these being critical to both memory savings and detection quality.
minor comments (2)
  1. [§2] Notation for the whitening transformation could be made more explicit by defining the transformation matrix in terms of the estimated covariance.
  2. [Figure 1] The pipeline diagram would benefit from explicit indication of which steps are performed incrementally versus in batch.
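
The first major comment concerns how the peak-memory figures were obtained. One hedged way to make such a comparison reproducible in Python is the standard-library `tracemalloc` module, sketched below on synthetic data. It tracks allocations made through Python's allocators (including NumPy array buffers), not process resident memory, and the paper's own measurement method is not described in this digest; all function names are hypothetical.

```python
import tracemalloc
import numpy as np

def chunks(n_chunks, rows, dim, seed=0):
    """Synthetic stream of patch-feature chunks."""
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        yield rng.normal(size=(rows, dim))

def peak_bytes(fn):
    """Run fn and return the peak traced allocation in bytes."""
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

def offline():
    pool = np.vstack(list(chunks(20, 2000, 64)))   # materialise the full pool
    return pool.T @ pool

def streaming():
    acc = np.zeros((64, 64))
    for c in chunks(20, 2000, 64):                 # one chunk resident at a time
        acc += c.T @ c
    return acc

offline_peak = peak_bytes(offline)
streaming_peak = peak_bytes(streaming)
```

Both functions compute the same second-moment matrix; only the offline variant's peak grows with the number of chunks.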

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments identify opportunities to improve the clarity of result derivation and the validation of the streaming components. We respond point-by-point below and indicate planned revisions.

  1. Referee: [Abstract] The concrete numeric claims (memory reduction to 2.78 GB and AUC increase to 0.986) are presented without derivation, error bars, or ablation of the regularization strength (a free parameter), leaving the central performance assertions unsupported by visible evidence.

    Authors: The memory and AUC figures are obtained directly from the peak-memory measurements and image-level AUC computations reported in §4 on the 15-category benchmark and the three industrial datasets. We will revise the abstract to include explicit cross-references to the relevant tables and sections. Error bars from repeated runs with different seeds can be added in revision. A brief note on the regularization parameter (chosen for positive-definiteness) and a limited sensitivity check will also be included. Revision: partial.

  2. Referee: [§3 (Method)] The description of the online covariance estimation in reduced space lacks a quantitative validation (e.g., distance distribution comparison or nearest-neighbor ranking preservation) against the offline full-covariance baseline; this is load-bearing for the claim that Mahalanobis distances match offline quality.

    Authors: End-to-end benchmark performance preservation currently serves as the primary support for the online estimate. We agree that a direct quantitative comparison would strengthen the claim and will add, in the revised §3, a side-by-side evaluation of distance distributions and nearest-neighbor rank preservation between the incremental and batch covariance versions. Revision: yes.

  3. Referee: [§4 (Experiments)] No ablation study or sensitivity analysis is shown for the choice of reduced dimension or regularization parameter, despite these being critical to both memory savings and detection quality.

    Authors: The reduced dimension and regularization values were selected to satisfy memory bounds while maintaining numerical stability. We acknowledge the absence of a dedicated sensitivity study and will add an ablation table in the revised §4 examining the effect of these hyperparameters on memory and AUC. Revision: yes.
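
The nearest-neighbour rank-preservation check promised in the second response could look roughly like the sketch below, which whitens the same bank and queries with two covariance estimates and reports how often the top-1 neighbour agrees. Everything here is a hypothetical harness on synthetic data (`whiten`, `nearest`, `nn_agreement`, and the `reg` value are illustrative), not the authors' planned experiment.

```python
import numpy as np

def whiten(x, mean, cov):
    W = np.linalg.inv(np.linalg.cholesky(cov))   # inv(cov) = W.T @ W
    return (x - mean) @ W.T

def nearest(queries, bank):
    d2 = ((queries[:, None, :] - bank[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def nn_agreement(bank, queries, mean, cov_a, cov_b):
    """Fraction of queries whose nearest bank entry is the same under both covariances."""
    nn_a = nearest(whiten(queries, mean, cov_a), whiten(bank, mean, cov_a))
    nn_b = nearest(whiten(queries, mean, cov_b), whiten(bank, mean, cov_b))
    return float((nn_a == nn_b).mean())

rng = np.random.default_rng(5)
dim, reg = 8, 1e-3
mixing = rng.normal(size=(dim, dim))
data = rng.normal(size=(2000, dim)) @ mixing

# Batch estimate over all data at once
cov_batch = np.cov(data, rowvar=False) + reg * np.eye(dim)

# Streaming estimate from running sums over chunks
n, s, ss = 0, np.zeros(dim), np.zeros((dim, dim))
for chunk in np.array_split(data, 10):
    n += len(chunk)
    s += chunk.sum(axis=0)
    ss += chunk.T @ chunk
mean = s / n
cov_stream = (ss - n * np.outer(mean, mean)) / (n - 1) + reg * np.eye(dim)

bank = data[:300]
queries = rng.normal(size=(100, dim)) @ mixing
agreement = nn_agreement(bank, queries, mean, cov_batch, cov_stream)
```

On this toy problem the two estimates differ only by rounding, so agreement is expected to be essentially complete; the interesting case is a streaming estimator that is computed in reduced precision or on a changing subspace, where the same harness would expose any divergence.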

Circularity Check

0 steps flagged

No circularity: empirical extension with independent benchmark results


The paper introduces Mahalanobis PatchCore as an additive method that augments PatchCore with regularized covariance estimation in reduced space, whitening, and streaming aggregation. Performance numbers (AUC preservation on public benchmark, memory reduction from 5.41 GB to 2.78 GB, industrial mean AUC lift from 0.981 to 0.986) are presented as outcomes of empirical evaluation on external datasets, not as quantities algebraically forced by the same fitted parameters or self-citations. No load-bearing step equates a prediction to its own input by construction, and the derivation chain remains self-contained against the stated benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Abstract-only review; the method rests on standard assumptions of covariance estimation and dimensionality reduction whose validity cannot be audited without the full text.

free parameters (1)
  • regularization strength for covariance
    The abstract states a 'regularised covariance model' whose strength must be chosen; no value is given.
axioms (1)
  • domain assumption: Covariance structure in the reduced feature space is stable enough for incremental online estimation to approximate the offline matrix
    Required for the whitening step to implement correct Mahalanobis retrieval under streaming constraints.
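
Why the regularization strength matters can be seen in a small numerical sketch: with fewer samples than dimensions the sample covariance is singular, and a ridge term makes it invertible while controlling its condition number. Ridge regularisation and the listed values of `lam` are assumptions for illustration; the paper's actual regulariser and setting are not given in the abstract.

```python
import numpy as np

rng = np.random.default_rng(4)
dim, n = 64, 40                          # fewer samples than dimensions
x = rng.normal(size=(n, dim))
sample_cov = np.cov(x, rowvar=False)     # rank-deficient: rank at most n - 1

def ridge(cov, lam):
    """Add lam to every eigenvalue by shifting the diagonal."""
    return cov + lam * np.eye(cov.shape[0])

# Condition number of the regularised matrix for a few hypothetical strengths
conds = {lam: float(np.linalg.cond(ridge(sample_cov, lam)))
         for lam in (1e-6, 1e-3, 1e-1)}

# Whitening needs a Cholesky factor, which exists once the matrix is positive definite
chol = np.linalg.cholesky(ridge(sample_cov, 1e-3))
```

Larger `lam` gives a better-conditioned whitening transform but pulls the metric back toward plain Euclidean distance, which is the trade-off a sensitivity study would need to chart.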

pith-pipeline@v0.9.1-grok · 5803 in / 1426 out tokens · 32539 ms · 2026-06-29T17:52:49.770222+00:00 · methodology

