Flow Mismatching: Unsupervised Anomaly Detection via Velocity Discrepancies in Flow Matching Models
Pith reviewed 2026-05-25 05:18 UTC · model grok-4.3
The pith
Flow matching models detect anomalies by measuring velocity disagreements between learned normal dynamics and geometric paths to test images.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that anomalies induce strong local disagreement between the model-predicted velocity, which follows normal generative dynamics, and the geometric velocity toward the target, which includes any anomalous content. Aggregating the mismatch over different time steps and multiple paths yields pixel-wise heatmaps and image-level scores. The population mismatch decomposes into an irreducible denoising term and a Fisher-divergence term between the test-path and normal-path score functions, identifying the score-gap component that drives anomaly separation.
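One way such a decomposition can arise is sketched below. This is a hedged reconstruction, not the paper's own statement: it assumes the linear path $x_t = (1-t)x_0 + t x_1$ with $x_0 \sim \mathcal{N}(0, I)$, writes $q$ for the law of test-path pairs and $p$ for the law of normal-path pairs, defines $u_r(x,t) = \mathbb{E}_r[x_1 - x_0 \mid x_t = x]$ for $r \in \{p, q\}$, and idealizes the trained model as exactly $u_p$. The symbols and constants are illustrative and may differ from those in the paper.

```latex
% Assumed setting (illustrative): x_t = (1-t) x_0 + t x_1, x_0 ~ N(0, I), 0 < t < 1.
% p_t, q_t denote the marginals of x_t under the normal-path and test-path laws.
\begin{align}
\mathbb{E}_q \bigl\| u_p(x_t,t) - (x_1 - x_0) \bigr\|^2
  &= \underbrace{\mathbb{E}_q \bigl\| u_q(x_t,t) - (x_1 - x_0) \bigr\|^2}_{\text{irreducible denoising term}}
   + \mathbb{E}_q \bigl\| u_p(x_t,t) - u_q(x_t,t) \bigr\|^2, \\
u_r(x,t) &= \frac{x}{t} + \frac{1-t}{t}\, \nabla_x \log r_t(x),
  \qquad r \in \{p, q\}, \\
\mathbb{E}_q \bigl\| u_p(x_t,t) - u_q(x_t,t) \bigr\|^2
  &= \Bigl(\frac{1-t}{t}\Bigr)^{2}\,
     \mathbb{E}_{x \sim q_t} \bigl\| \nabla_x \log p_t(x) - \nabla_x \log q_t(x) \bigr\|^2 .
\end{align}
```

The cross term in the first line vanishes because $\mathbb{E}_q[(x_1 - x_0) - u_q(x_t,t) \mid x_t] = 0$. The last line is a time-weighted Fisher divergence between the test-path and normal-path marginals, which is the score-gap component the claim refers to.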
What carries the argument
Velocity mismatch between the learned normal velocity field and the geometric velocity along affine paths from Gaussian noise to the target image.
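A minimal sketch of this quantity at a single time step, under stated assumptions: the path is taken to be the linear interpolant between a Gaussian sample and the test image, `velocity_model` is a hypothetical callable standing in for the trained network, and `toy_velocity_model` is a closed-form stand-in (the exact field when the only normal image is all zeros), not anything from the paper.

```python
import numpy as np

def velocity_mismatch(velocity_model, x1, x0, t):
    """Per-pixel squared mismatch at time t on the straight path from x0 to x1."""
    x_t = (1.0 - t) * x0 + t * x1       # point on the path (assumed linear interpolant)
    v_geo = x1 - x0                     # geometric velocity: constant along a straight path
    v_model = velocity_model(x_t, t)    # learned "normal" velocity (hypothetical callable)
    return (v_model - v_geo) ** 2

def toy_velocity_model(x_t, t):
    """Toy stand-in: exact velocity field when the only normal image is all zeros."""
    return (0.0 - x_t) / (1.0 - t)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))        # Gaussian noise endpoint
x1 = np.zeros((8, 8))                   # test image, normal everywhere ...
x1[2:4, 2:4] = 1.0                      # ... except a small synthetic defect
mismatch = velocity_mismatch(toy_velocity_model, x1, x0, t=0.5)
print(mismatch[2, 2], mismatch[0, 0])
```

In this toy setting the mismatch at a defect pixel works out to (defect amplitude / (1 - t))^2, about 4 at t = 0.5, and it vanishes on normal pixels regardless of the noise sample.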
If this is right
- Pixel-wise anomaly heatmaps and image-level scores are obtained directly from aggregated velocity mismatches.
- The method operates without test-time optimization, feature memories, or additional calibration.
- The mismatch decomposes into a denoising term and a Fisher-divergence term that isolates the component driving anomaly separation.
- The approach outperforms prior reconstruction-based and flow-matching anomaly detection methods on MVTec-AD and VisA.
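The aggregation described in the first two points can be sketched as follows. This is an illustration under assumptions: the linear interpolant, the time grid, the default mean over paths, and the max-pixel image score are all choices made here for concreteness, and `toy_model` is a closed-form stand-in for a trained network.

```python
import numpy as np

def mismatch_heatmap(velocity_model, x1, ts, n_paths=4, seed=0, reduce=np.mean):
    """Aggregate squared velocity mismatch over time steps and noise paths."""
    rng = np.random.default_rng(seed)
    per_path = []
    for _ in range(n_paths):
        x0 = rng.standard_normal(x1.shape)          # one Gaussian start point per path
        per_t = []
        for t in ts:
            x_t = (1.0 - t) * x0 + t * x1           # assumed linear interpolant
            per_t.append((velocity_model(x_t, t) - (x1 - x0)) ** 2)
        per_path.append(np.mean(per_t, axis=0))     # average over time steps
    return reduce(np.stack(per_path), axis=0)       # aggregate over paths

def image_score(heatmap):
    """Image-level score; taking the maximum pixel is an assumption, not the paper's rule."""
    return float(np.max(heatmap))

toy_model = lambda x_t, t: -x_t / (1.0 - t)         # toy field: normal data is all zeros
normal = np.zeros((4, 4))
defect = normal.copy()
defect[1, 1] = 1.0                                  # single anomalous pixel
ts = (0.25, 0.5, 0.75)
heat = mismatch_heatmap(toy_model, defect, ts)
print(image_score(heat), image_score(mismatch_heatmap(toy_model, normal, ts)))
```

Only forward passes of the velocity model are needed, which is consistent with the claim that no test-time optimization or feature memory is involved. Passing `reduce=np.median` swaps in a more outlier-tolerant aggregation over paths.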
Where Pith is reading between the lines
- The same mismatch principle could be tested on sequential data such as video by extending paths through time.
- Different path samplings beyond affine lines might reduce variance in the aggregated scores.
- The Fisher-divergence term suggests that score-function estimation quality directly limits detection sensitivity.
Load-bearing premise
The velocity field learned exclusively from normal images will produce detectable disagreement specifically attributable to anomalous content when compared to geometric velocities along paths to test images.
What would settle it
On MVTec-AD or VisA images containing known anomalies, compute mismatch scores after replacing anomalous regions with normal content; if scores do not drop substantially to match normal images, the separation claim fails.
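A hedged sketch of this check on synthetic data: the helper names `repair` and `image_score`, the max-pixel score, and the toy velocity field are assumptions made for illustration. On real data the mask would come from the dataset's ground-truth annotations and the replacement content from a matched normal image or an inpainting step.

```python
import numpy as np

def image_score(velocity_model, x1, ts=(0.25, 0.5, 0.75), n_paths=4, seed=0):
    """Max over pixels of the path- and time-averaged squared velocity mismatch."""
    rng = np.random.default_rng(seed)
    total = np.zeros_like(x1, dtype=float)
    for _ in range(n_paths):
        x0 = rng.standard_normal(x1.shape)
        for t in ts:
            x_t = (1.0 - t) * x0 + t * x1           # assumed linear interpolant
            total += (velocity_model(x_t, t) - (x1 - x0)) ** 2
    return float((total / (n_paths * len(ts))).max())

def repair(x_anomalous, mask, x_normal_ref):
    """Replace masked (anomalous) pixels with content from a normal reference image."""
    return np.where(mask, x_normal_ref, x_anomalous)

toy_model = lambda x_t, t: -x_t / (1.0 - t)         # toy field: normal data is all zeros
normal_ref = np.zeros((4, 4))
anomalous = normal_ref.copy()
anomalous[1:3, 1:3] = 1.0
mask = anomalous != normal_ref                      # ground-truth anomaly mask
before = image_score(toy_model, anomalous)
after = image_score(toy_model, repair(anomalous, mask, normal_ref))
reference = image_score(toy_model, normal_ref)
print(before, after, reference)
```

The separation claim predicts that `after` falls to roughly the level of `reference`; a repaired image that still scores close to `before` would count against it.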
Original abstract
We propose Flow Mismatching, an unsupervised anomaly detection method that deliberately avoids reconstruction-based paradigms. Instead, we treat flow matching as geometric dynamics and leverage a key insight: anomalies occur at places where the learned normal flow disagrees with the geometric path toward a test image. Given a flow matching model trained only on normal images, we probe its learned velocity field along affine paths from Gaussian noise to a target image. Along each path, we compare the model-predicted velocity, which follows normal generative dynamics, with the geometric velocity toward the target, which includes any anomalous content. Anomalies induce strong local disagreement between these velocities. Aggregating the mismatch over different time steps and multiple paths yields pixel-wise heatmaps and image-level scores without test-time optimization, feature memories, or additional calibration. Our analysis shows that the population mismatch decomposes into an irreducible denoising term and a Fisher-divergence term between the test-path and normal-path score functions, which identifies the score-gap component that drives anomaly separation and explains the effectiveness of robust path aggregation. Extensive experiments on MVTec-AD and VisA demonstrate superior performance compared with SOTA reconstruction-based and recent flow matching-based approaches.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Flow Mismatching, an unsupervised anomaly detection method that trains a flow matching model exclusively on normal images and then probes its learned velocity field along affine paths from Gaussian noise to a target test image. Model-predicted velocities (following normal generative dynamics) are compared to geometric velocities toward the target (which incorporate anomalous content); local disagreements are aggregated over time steps and multiple paths to produce pixel-wise heatmaps and image-level scores without test-time optimization, feature memories, or calibration. A population-level analysis decomposes the mismatch into an irreducible denoising term plus a Fisher-divergence term between test-path and normal-path score functions, which is claimed to drive anomaly separation. Experiments on MVTec-AD and VisA report superior performance over reconstruction-based and recent flow-matching baselines.
Significance. If the central claims hold, the work introduces a non-reconstruction paradigm for anomaly detection grounded in generative dynamics, with an explicit population decomposition that explains why velocity mismatches can isolate anomalies. Strengths include the absence of test-time optimization or auxiliary memories and the provision of a theoretical account (via the Fisher term) for the effectiveness of path aggregation. This could influence future generative-model-based detection methods by shifting focus from reconstruction error to velocity discrepancies, provided the population-to-instance translation is secured.
Major comments (2)
- [§4] §4 (Mismatch Decomposition): The population-level decomposition of mismatch into an irreducible denoising term and a Fisher-divergence term between test-path and normal-path score functions is derived, but no explicit bound, concentration inequality, or per-image argument is supplied showing that the Fisher term remains dominant and separable when anomalous content is localized within a single test image rather than averaged over the population. This gap directly affects the reliability of the claimed pixel-wise heatmaps and image-level scores.
- [§3, §5] §3 (Method) and §5 (Experiments): The robust path aggregation is presented as mitigating the mismatch, yet no ablation quantifies how sensitive the reported gains are to the specific choice of affine paths, number of paths, or aggregation operator; without this, it is unclear whether the superiority over baselines stems from the core velocity-discrepancy idea or from post-hoc aggregation tuning.
Minor comments (2)
- [§2] Notation for the geometric velocity and model velocity should be introduced with explicit symbols in §2 or §3 to avoid ambiguity when the decomposition is later referenced.
- [Abstract, §1] The abstract and §1 claim 'without additional calibration,' but the precise meaning of calibration (e.g., threshold selection on validation normals) should be clarified to distinguish from standard practice in anomaly detection.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the scope of our theoretical and empirical contributions. We address each major point below.
Point-by-point responses
- Referee: [§4] §4 (Mismatch Decomposition): The population-level decomposition of mismatch into an irreducible denoising term and a Fisher-divergence term between test-path and normal-path score functions is derived, but no explicit bound, concentration inequality, or per-image argument is supplied showing that the Fisher term remains dominant and separable when anomalous content is localized within a single test image rather than averaged over the population. This gap directly affects the reliability of the claimed pixel-wise heatmaps and image-level scores.
  Authors: We agree that the decomposition is strictly population-level and that the manuscript does not supply a concentration inequality or per-image argument establishing dominance of the Fisher term for localized anomalies. The population analysis is intended to identify the score-gap mechanism that motivates the method, while the pixel-wise heatmaps rely on empirical aggregation of mismatches. We will revise §4 to explicitly state this scope limitation and note that the per-instance reliability is supported by the reported experiments rather than by a formal bound.
  Revision: partial
- Referee: [§3, §5] §3 (Method) and §5 (Experiments): The robust path aggregation is presented as mitigating the mismatch, yet no ablation quantifies how sensitive the reported gains are to the specific choice of affine paths, number of paths, or aggregation operator; without this, it is unclear whether the superiority over baselines stems from the core velocity-discrepancy idea or from post-hoc aggregation tuning.
  Authors: We acknowledge that the current experiments do not include ablations on the number of paths, the affine-path parameterization, or the aggregation operator. In the revised manuscript we will add these ablations (varying path count from 1 to 16, comparing affine versus alternative interpolants, and testing mean versus median aggregation) to quantify sensitivity and isolate the contribution of the velocity-discrepancy signal from the aggregation procedure.
  Revision: yes
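The shape of such an ablation can be sketched on toy data. Everything here is an assumption for illustration: the grid of path counts, the max-pixel score, the score gap as the sensitivity measure, and `imperfect_toy_model`, a closed-form field with a small bounded error standing in for a trained network. The affine-versus-alternative-interpolant comparison is not covered.

```python
import numpy as np

def image_score(velocity_model, x1, n_paths, reduce, ts=(0.25, 0.5, 0.75), seed=0):
    """Max pixel of the mismatch map, averaged over time and reduced over paths."""
    rng = np.random.default_rng(seed)
    maps = []
    for _ in range(n_paths):
        x0 = rng.standard_normal(x1.shape)
        per_t = [(velocity_model((1.0 - t) * x0 + t * x1, t) - (x1 - x0)) ** 2 for t in ts]
        maps.append(np.mean(per_t, axis=0))
    return float(reduce(np.stack(maps), axis=0).max())

def imperfect_toy_model(x_t, t):
    """Toy field for all-zero normal data plus a small bounded error term."""
    return -x_t / (1.0 - t) + 0.1 * np.sin(x_t)

normal = np.zeros((4, 4))
defect = normal.copy()
defect[1, 1] = 1.0                                  # single anomalous pixel

gaps = {}
for n_paths in (1, 2, 4, 8, 16):
    for name, reduce in (("mean", np.mean), ("median", np.median)):
        gap = (image_score(imperfect_toy_model, defect, n_paths, reduce)
               - image_score(imperfect_toy_model, normal, n_paths, reduce))
        gaps[(n_paths, name)] = gap                 # anomalous-minus-normal score gap
for key, gap in gaps.items():
    print(key, round(gap, 3))
```

A gap that stays roughly constant across the grid would suggest the signal comes from the velocity discrepancy itself; a gap that appears only for particular path counts or operators would point to aggregation tuning.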
Circularity Check
No significant circularity; the derivation is a self-contained mathematical decomposition.
Full rationale
The paper's central analysis decomposes population-level mismatch between model velocity and geometric velocity into an irreducible denoising term plus a Fisher-divergence term between score functions. This is presented as an explanatory identity rather than a fitted or self-defined quantity. No equations reduce a prediction to its own inputs by construction, no load-bearing self-citations are invoked for uniqueness, and the per-image aggregation step is described as direct computation without requiring the target result as an assumption. The method is therefore not circular; it applies a trained flow-matching velocity field to new paths and measures discrepancy.