pith. sign in

arxiv: 2605.23070 · v1 · pith:T7B3O44Onew · submitted 2026-05-21 · 💻 cs.CV

Flow Mismatching: Unsupervised Anomaly Detection via Velocity Discrepancies in Flow Matching Models

Pith reviewed 2026-05-25 05:18 UTC · model grok-4.3

classification 💻 cs.CV
keywords anomaly detectionflow matchingvelocity mismatchunsupervised learninggenerative modelsimage anomaly localizationMVTec-ADFisher divergence
0
0 comments X

The pith

Flow matching models detect anomalies by measuring velocity disagreements between learned normal dynamics and geometric paths to test images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Flow Mismatching to find anomalies without relying on reconstruction. It shows that a flow matching model trained only on normal images produces strong local disagreement between its predicted velocity and the straight geometric velocity toward a test image that contains anomalies. Aggregating the mismatch across time steps and multiple paths from noise yields pixel-wise heatmaps and image scores. This approach requires no test-time optimization, stored features, or extra calibration steps. A reader would care because it turns the generative velocity field itself into a direct anomaly signal.

Core claim

The central claim is that anomalies induce strong local disagreement between the model-predicted velocity, which follows normal generative dynamics, and the geometric velocity toward the target, which includes any anomalous content. Aggregating the mismatch over different time steps and multiple paths yields pixel-wise heatmaps and image-level scores. The population mismatch decomposes into an irreducible denoising term and a Fisher-divergence term between the test-path and normal-path score functions, identifying the score-gap component that drives anomaly separation.

What carries the argument

Velocity mismatch between the learned normal velocity field and the geometric velocity along affine paths from Gaussian noise to the target image.

If this is right

  • Pixel-wise anomaly heatmaps and image-level scores are obtained directly from aggregated velocity mismatches.
  • The method operates without test-time optimization, feature memories, or additional calibration.
  • The mismatch decomposes into a denoising term and a Fisher-divergence term that isolates the component driving anomaly separation.
  • The approach outperforms prior reconstruction-based and flow-matching anomaly detection methods on MVTec-AD and VisA.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same mismatch principle could be tested on sequential data such as video by extending paths through time.
  • Different path samplings beyond affine lines might reduce variance in the aggregated scores.
  • The Fisher-divergence term suggests that score-function estimation quality directly limits detection sensitivity.

Load-bearing premise

The velocity field learned exclusively from normal images will produce detectable disagreement specifically attributable to anomalous content when compared to geometric velocities along paths to test images.

What would settle it

On MVTec-AD or VisA images containing known anomalies, compute mismatch scores after replacing anomalous regions with normal content; if scores do not drop substantially to match normal images, the separation claim fails.

Figures

Figures reproduced from arXiv: 2605.23070 by Hao Yan, Kamran Paynabar, Mehrdad Moradi, Shengzhe Chen.

Figure 1
Figure 1. Figure 1: Velocity mismatch in the (x, t) plane. Light arrows denote the learned flow vθ(x, t) from prior to normal data, while colored affine paths from the same x0 target either a normal sample (green) or an anomalous sample (red). At selected times, the predicted velocity vθ(xt, t) aligns with the geometric velocity y − x0 for normal targets but disagrees for anomalous targets, revealing the anomaly signal. Theor… view at source ↗
Figure 2
Figure 2. Figure 2: Accuracy–speed trade-offs under varying numbers of paths [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Illustration of Flow Mismatching. Top: normalized mismatch maps of ∥vθ(xt, t) − (y − x0)∥ 2 over (x1, x2) at several t and a t 2 -weighted aggregate (right); the blue arc is the normal support and scores grow off-manifold. Small t is smooth but weakly discriminative; large t tracks geometry better but is noisier; multi-t weighting combines both. Bottom: flows and affine paths for on-manifold y vs. off-mani… view at source ↗
Figure 4
Figure 4. Figure 4: Visualization of FM-based heatmaps. Ground truths are shown by white lines. that rely on test-time detection, including D-Flow [Ben-Hamu et al., 2024] accelerated by Flow￾Grad Liu et al. [2023a], and ReconFlow, a reconstruction-style counterpart of our method. For controlled comparison, we adapt all FM baselines to the same image-domain U-Net backbone and identical training/inference protocol. Therefore th… view at source ↗
Figure 5
Figure 5. Figure 5: Half-moon toy analysis: anomaly score maps under different [PITH_FULL_IMAGE:figures/full_fig_p018_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Toy example 2: time-dependent flow mismatch score maps and weighted aggregation. [PITH_FULL_IMAGE:figures/full_fig_p019_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Toy example 3: flow mismatching under a pooled category case. [PITH_FULL_IMAGE:figures/full_fig_p020_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Training-epoch ablation on MVTec-AD. We evaluate checkpoints from epoch 100 to epoch [PITH_FULL_IMAGE:figures/full_fig_p024_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Failure cases of our method. D.3 Limitation. In our perspective, there are three main limitations of the current work. First, Flow Mismatching requires O(KT) forward evaluations per test image, where K is the number of sampled paths and T is the number of time steps. Thus, its throughput depends directly on the chosen test-time compute budget. To make this trade-off explicit, we report performance across d… view at source ↗
Figure 10
Figure 10. Figure 10: Per-category ablation on MVTec-AD with fixed [PITH_FULL_IMAGE:figures/full_fig_p029_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Per-category ablation on MVTec-AD with fixed [PITH_FULL_IMAGE:figures/full_fig_p030_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Per-category ablation on MVTec-AD under joint test-time compute scaling with [PITH_FULL_IMAGE:figures/full_fig_p031_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Qualitative results of Flow Mismatching on VisA dataset. [PITH_FULL_IMAGE:figures/full_fig_p034_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Qualitative results of Flow Mismatching on MVTec dataset. [PITH_FULL_IMAGE:figures/full_fig_p035_14.png] view at source ↗
read the original abstract

We propose Flow Mismatching, an unsupervised anomaly detection method that deliberately avoids reconstruction-based paradigms. Instead, we treat flow matching as geometric dynamics and leverage a key insight: anomalies occur at places where the learned normal flow disagrees with the geometric path toward a test image. Given a flow matching model trained only on normal images, we probe its learned velocity field along affine paths from Gaussian noise to a target image. Along each path, we compare the model-predicted velocity, which follows normal generative dynamics, with the geometric velocity toward the target, which includes any anomalous content. Anomalies induce strong local disagreement between these velocities. Aggregating the mismatch over different time steps and multiple paths yields pixel-wise heatmaps and image-level scores without test-time optimization, feature memories, or additional calibration. Our analysis shows that the population mismatch decomposes into an irreducible denoising term and a Fisher-divergence term between the test-path and normal-path score functions, which identifies the score-gap component that drives anomaly separation and explains the effectiveness of robust path aggregation. Extensive experiments on MVTec-AD and VisA demonstrate superior performance compared with SOTA reconstruction-based and recent flow matching-based approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Flow Mismatching, an unsupervised anomaly detection method that trains a flow matching model exclusively on normal images and then probes its learned velocity field along affine paths from Gaussian noise to a target test image. Model-predicted velocities (following normal generative dynamics) are compared to geometric velocities toward the target (which incorporate anomalous content); local disagreements are aggregated over time steps and multiple paths to produce pixel-wise heatmaps and image-level scores without test-time optimization, feature memories, or calibration. A population-level analysis decomposes the mismatch into an irreducible denoising term plus a Fisher-divergence term between test-path and normal-path score functions, which is claimed to drive anomaly separation. Experiments on MVTec-AD and VisA report superior performance over reconstruction-based and recent flow-matching baselines.

Significance. If the central claims hold, the work introduces a non-reconstruction paradigm for anomaly detection grounded in generative dynamics, with an explicit population decomposition that explains why velocity mismatches can isolate anomalies. Strengths include the absence of test-time optimization or auxiliary memories and the provision of a theoretical account (via the Fisher term) for the effectiveness of path aggregation. This could influence future generative-model-based detection methods by shifting focus from reconstruction error to velocity discrepancies, provided the population-to-instance translation is secured.

major comments (2)
  1. [§4] §4 (Mismatch Decomposition): The population-level decomposition of mismatch into an irreducible denoising term and a Fisher-divergence term between test-path and normal-path score functions is derived, but no explicit bound, concentration inequality, or per-image argument is supplied showing that the Fisher term remains dominant and separable when anomalous content is localized within a single test image rather than averaged over the population. This gap directly affects the reliability of the claimed pixel-wise heatmaps and image-level scores.
  2. [§3, §5] §3 (Method) and §5 (Experiments): The robust path aggregation is presented as mitigating the mismatch, yet no ablation quantifies how sensitive the reported gains are to the specific choice of affine paths, number of paths, or aggregation operator; without this, it is unclear whether the superiority over baselines stems from the core velocity-discrepancy idea or from post-hoc aggregation tuning.
minor comments (2)
  1. [§2] Notation for the geometric velocity and model velocity should be introduced with explicit symbols in §2 or §3 to avoid ambiguity when the decomposition is later referenced.
  2. [Abstract, §1] The abstract and §1 claim 'without additional calibration,' but the precise meaning of calibration (e.g., threshold selection on validation normals) should be clarified to distinguish from standard practice in anomaly detection.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the scope of our theoretical and empirical contributions. We address each major point below.

read point-by-point responses
  1. Referee: [§4] §4 (Mismatch Decomposition): The population-level decomposition of mismatch into an irreducible denoising term and a Fisher-divergence term between test-path and normal-path score functions is derived, but no explicit bound, concentration inequality, or per-image argument is supplied showing that the Fisher term remains dominant and separable when anomalous content is localized within a single test image rather than averaged over the population. This gap directly affects the reliability of the claimed pixel-wise heatmaps and image-level scores.

    Authors: We agree that the decomposition is strictly population-level and that the manuscript does not supply a concentration inequality or per-image argument establishing dominance of the Fisher term for localized anomalies. The population analysis is intended to identify the score-gap mechanism that motivates the method, while the pixel-wise heatmaps rely on empirical aggregation of mismatches. We will revise §4 to explicitly state this scope limitation and note that the per-instance reliability is supported by the reported experiments rather than by a formal bound. revision: partial

  2. Referee: [§3, §5] §3 (Method) and §5 (Experiments): The robust path aggregation is presented as mitigating the mismatch, yet no ablation quantifies how sensitive the reported gains are to the specific choice of affine paths, number of paths, or aggregation operator; without this, it is unclear whether the superiority over baselines stems from the core velocity-discrepancy idea or from post-hoc aggregation tuning.

    Authors: We acknowledge that the current experiments do not include ablations on the number of paths, the affine-path parameterization, or the aggregation operator. In the revised manuscript we will add these ablations (varying path count from 1 to 16, comparing affine versus alternative interpolants, and testing mean versus median aggregation) to quantify sensitivity and isolate the contribution of the velocity-discrepancy signal from the aggregation procedure. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained mathematical decomposition

full rationale

The paper's central analysis decomposes population-level mismatch between model velocity and geometric velocity into an irreducible denoising term plus a Fisher-divergence term between score functions. This is presented as an explanatory identity rather than a fitted or self-defined quantity. No equations reduce a prediction to its own inputs by construction, no load-bearing self-citations are invoked for uniqueness, and the per-image aggregation step is described as direct computation without requiring the target result as an assumption. The method is therefore not circular; it applies a trained flow-matching velocity field to new paths and measures discrepancy.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities can be extracted from the provided text.

pith-pipeline@v0.9.0 · 5749 in / 1149 out tokens · 20305 ms · 2026-05-25T05:18:23.339552+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 3 internal anchors

  1. [1]

    Sensors , volume=

    Anomaly detection neural network with dual auto-encoders GAN and its industrial inspection applications , author=. Sensors , volume=. 2020 , publisher=

  2. [2]

    Textile Research Journal , volume=

    Attention-based feature fusion generative adversarial network for yarn-dyed fabric defect detection , author=. Textile Research Journal , volume=. 2023 , publisher=

  3. [3]

    Sixteenth International Conference on Quality Control by Artificial Vision , volume=

    f-AnoGAN for non-destructive testing in industrial anomaly detection , author=. Sixteenth International Conference on Quality Control by Artificial Vision , volume=. 2023 , organization=

  4. [4]

    arXiv preprint arXiv:2501.11310 , year=

    Anomaly detection for industrial applications, its challenges, solutions, and future directions: A review , author=. arXiv preprint arXiv:2501.11310 , year=

  5. [5]

    European conference on computer vision , pages=

    Transfusion--a transparency-based diffusion model for anomaly detection , author=. European conference on computer vision , pages=. 2024 , organization=

  6. [6]

    arXiv preprint arXiv:2502.19200 , year=

    HDM: Hybrid Diffusion Model for Unified Image Anomaly Detection , author=. arXiv preprint arXiv:2502.19200 , year=

  7. [7]

    International Journal of Machine Learning and Cybernetics , volume=

    Industrial product surface defect detection via the fast denoising diffusion implicit model , author=. International Journal of Machine Learning and Cybernetics , volume=. 2024 , publisher=

  8. [8]

    Sensors , volume=

    Latent Diffusion Models to Enhance the Performance of Visual Defect Segmentation Networks in Steel Surface Inspection , author=. Sensors , volume=. 2024 , publisher=

  9. [9]

    Transactions on Machine Learning Research , year=

    Error bounds for flow matching methods , author=. Transactions on Machine Learning Research , year=

  10. [10]

    2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages=

    MVTec AD—A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection , author=. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages=. 2019 , organization=

  11. [11]

    European Conference on Computer Vision , pages=

    SPot-the-Difference Self-supervised Pre-training for Anomaly Detection and Segmentation , author=. European Conference on Computer Vision , pages=

  12. [12]

    30th IEEE/IES International Symposium on Industrial Electronics (ISIE) , year =

    Mishra, Pankaj and Verk, Riccardo and Fornasier, Daniele and Piciarelli, Claudio and Foresti, Gian Luca , title =. 30th IEEE/IES International Symposium on Industrial Electronics (ISIE) , year =

  13. [13]

    Time-reversed Flow Matching with Worst Transport in High-dimensional Latent Space for Image Anomaly Detection

    How and Why: Taming Flow Matching for Unsupervised Anomaly Detection and Localization , author=. arXiv preprint arXiv:2508.05461 , year=

  14. [14]

    Under Review at ICLR , year=

    Diverging Flows: Detecting Out-of-Distribution Inputs in Conditional Generation , author=. Under Review at ICLR , year=

  15. [15]

    Medical Image Computing and Computer Assisted Intervention (MICCAI) , year=

    Reflect: Rectified Flows for Efficient Brain Anomaly Correction Transport , author=. Medical Image Computing and Computer Assisted Intervention (MICCAI) , year=

  16. [16]

    Advances in Neural Information Processing Systems (NeurIPS) , year=

    Scalable, Explainable and Provably Robust Anomaly Detection with One-Step Flow Matching , author=. Advances in Neural Information Processing Systems (NeurIPS) , year=

  17. [17]

    arXiv preprint arXiv:2508.11594 , year=

    It's not a FAD: first results in using Flows for unsupervised Anomaly Detection at 40 MHz at the Large Hadron Collider , author=. arXiv preprint arXiv:2508.11594 , year=

  18. [18]

    Advances in Neural Information Processing Systems (NeurIPS) , year=

    MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection , author=. Advances in Neural Information Processing Systems (NeurIPS) , year=

  19. [19]

    CVPR Workshops (VAND) , year=

    No-MambAAD: Revitalizing Conv-Only Networks for Unsupervised Anomaly Detection , author=. CVPR Workshops (VAND) , year=

  20. [20]

    arXiv preprint arXiv:2512.23818 , year=

    Energy-Tweedie: Score meets Score, Energy meets Energy , author=. arXiv preprint arXiv:2512.23818 , year=

  21. [21]

    arXiv preprint arXiv:2511.00540 , year=

    Real-IAD Variety: Pushing Industrial Anomaly Detection Dataset to a Modern Era , author=. arXiv preprint arXiv:2511.00540 , year=

  22. [22]

    Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

    Training-Free Industrial Defect Generation with Diffusion Models , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

  23. [23]

    Flow Matching for Generative Modeling

    Flow Matching for Generative Modeling , author=. arXiv preprint arXiv:2210.02747 , year=

  24. [24]

    IEEE Transactions on Systems, Man, and Cybernetics , volume=

    A Threshold Selection Method from Gray-Level Histograms , author=. IEEE Transactions on Systems, Man, and Cybernetics , volume=. 1979 , publisher=

  25. [25]

    2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages=

    Anomaly Detection via Reverse Distillation from One-Class Embedding , author=. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages=. 2022 , organization=

  26. [26]

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

    SimpleNet: A Simple Network for Image Anomaly Detection and Localization , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

  27. [27]

    2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages=

    DeSTSeg: Segmentation Guided Denoising Student-Teacher for Anomaly Detection , author=. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages=. 2023 , organization=

  28. [28]

    Advances in Neural Information Processing Systems , volume=

    A unified model for multi-class anomaly detection , author=. Advances in Neural Information Processing Systems , volume=

  29. [29]

    Advances in Neural Information Processing Systems , volume=

    Recontrast: Domain-specific anomaly detection via contrastive reconstruction , author=. Advances in Neural Information Processing Systems , volume=

  30. [30]

    Proceedings of the AAAI conference on artificial intelligence , volume=

    A diffusion-based framework for multi-class anomaly detection , author=. Proceedings of the AAAI conference on artificial intelligence , volume=

  31. [31]

    Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

    Dinomaly: The less is more philosophy in multi-class unsupervised anomaly detection , author=. Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

  32. [32]

    Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

    Exploring intrinsic normal prototypes within a single image for universal anomaly detection , author=. Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

  33. [33]

    Proceedings of the 41st International Conference on Machine Learning , pages=

    D-Flow: differentiating through flows for controlled generation , author=. Proceedings of the 41st International Conference on Machine Learning , pages=

  34. [34]

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

    Flowgrad: Controlling the output of generative odes with gradients , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

  35. [35]

    2024 , eprint=

    Flow Matching Guide and Code , author=. 2024 , eprint=

  36. [36]

    International Joint Conference on Artificial Intelligence , pages=

    Ddpm-moco: Advancing industrial surface defect generation and detection with generative and contrastive learning , author=. International Joint Conference on Artificial Intelligence , pages=. 2024 , organization=

  37. [37]

    IEEE Transactions on Pattern Analysis and Machine Intelligence , year=

    DiffusionAD: Norm-guided one-step denoising diffusion for anomaly detection , author=. IEEE Transactions on Pattern Analysis and Machine Intelligence , year=

  38. [38]

    CVIU , year=

    Exploring Plain ViT Reconstruction for Multi-class Unsupervised Anomaly Detection , author=. CVIU , year=

  39. [39]

    Proceedings of the 37th International Conference on Neural Information Processing Systems , pages=

    Hierarchical vector quantized transformer for multi-class unsupervised anomaly detection , author=. Proceedings of the 37th International Conference on Neural Information Processing Systems , pages=

  40. [40]

    Forty-second International Conference on Machine Learning , pages=

    OmiAD: One-step adaptive masked diffusion model for multi-class anomaly detection via adversarial distillation , author=. Forty-second International Conference on Machine Learning , pages=

  41. [41]

    Proceedings of the IEEE/CVF winter conference on applications of computer vision , pages=

    Cflow-ad: Real-time unsupervised anomaly detection with localization via conditional normalizing flows , author=. Proceedings of the IEEE/CVF winter conference on applications of computer vision , pages=

  42. [42]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    Revisiting reverse distillation for anomaly detection , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

  43. [43]

    Forty-second International Conference on Machine Learning , pages=

    An error analysis of flow matching for deep generative modeling , author=. Forty-second International Conference on Machine Learning , pages=

  44. [44]

    Proceedings of the IEEE/CVF international conference on computer vision , pages=

    Emerging properties in self-supervised vision transformers , author=. Proceedings of the IEEE/CVF international conference on computer vision , pages=

  45. [45]

    International Conference on Learning Representations , year=

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , author=. International Conference on Learning Representations , year=

  46. [46]

    Mamba: Linear-Time Sequence Modeling with Selective State Spaces

    Mamba: Linear-Time Sequence Modeling with Selective State Spaces , author=. arXiv preprint arXiv:2312.00752 , year=