pith. machine review for the scientific record.

arxiv: 2605.02438 · v2 · submitted 2026-05-04 · 💻 cs.CV · cs.LG

Recognition: unknown

Mixture Prototype Flow Matching for Open-Set Supervised Anomaly Detection

Dan Wang, Fuyun Wang, Hui Yan, Sujia Huang, Tong Zhang, Xin Liu, Xu Guo, Yuanzhi Wang, Zhen Cui

Pith reviewed 2026-05-14 20:57 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords open-set anomaly detection · flow matching · prototype learning · Gaussian mixture · supervised anomaly detection · mutual information regularization

The pith

Mixture Prototype Flow Matching models the velocity field as a Gaussian mixture prior, capturing the multi-modality of normal data for open-set anomaly detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that open-set supervised anomaly detection improves when normal data is modeled with multiple modes rather than a single Gaussian prior. Existing prototype methods produce blurred boundaries because they ignore the distinct classes within normal examples. MPFM instead learns a continuous flow that maps normal features to a target space structured as a Gaussian mixture, with each component assigned to one normal class. A mutual information regularizer keeps the components separate and increases the gap to anomalies. If the approach holds, it delivers stronger detection of anomalies never seen in training across standard benchmarks.

Core claim

MPFM learns a continuous transformation from normal feature distributions to a structured Gaussian mixture prototype space by explicitly modeling the velocity field as a Gaussian mixture prior where each component corresponds to a distinct normal class. This design facilitates mode-aware and semantically coherent distribution transport. A Mutual Information Maximization Regularizer is added to prevent prototype collapse and maximize normal-anomaly separability.
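The transport the claim describes can be illustrated with a toy sketch. The paper's actual parameterization is not reproduced here: the prototype means, the per-class linear interpolation path, and the Euler integrator below are all assumptions for illustration. Under a linear path x_t = (1−t)·x_0 + t·μ_k, the conditional velocity toward class k's prototype is (μ_k − x_t)/(1−t), and integrating it carries each normal feature to its class component.

```python
import numpy as np

# Hypothetical prototype means, one Gaussian component per normal class.
# Illustrative values only, not the paper's learned parameters.
mus = np.array([[-3.0, 0.0], [3.0, 0.0]])          # K = 2 prototypes in 2-D
rng = np.random.default_rng(0)

x = rng.normal(size=(200, 2))                      # "normal features" at t = 0
labels = rng.integers(0, 2, size=200)              # known normal class per sample

def cond_velocity(x, t, k):
    """Conditional velocity for the linear path x_t = (1-t) x_0 + t mu_k:
    v_t(x | k) = (mu_k - x_t) / (1 - t)."""
    return (mus[k] - x) / (1.0 - t)

# Euler-integrate the per-class ODE from t = 0 to t = 1.
steps = 20
dt = 1.0 / steps
for i in range(steps):
    t = i * dt
    x = x + dt * cond_velocity(x, t, labels)

# Every sample lands on its class prototype: the transport is mode-aware.
print(np.abs(x - mus[labels]).max())
```

Because the path is linear, Euler integration is exact here; the interesting question the referee raises below is whether the marginal (label-free) mixture field retains this behavior.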

What carries the argument

The mixture velocity field in flow matching, with each Gaussian component corresponding to a distinct normal class.

Load-bearing premise

Normal data inherently exhibits multi-modality that is best captured by a Gaussian mixture in prototype space.
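This premise is easy to check on synthetic data: when normal features come from two well-separated classes, a two-component mixture assigns markedly higher log-likelihood than a single fitted Gaussian. A minimal 1-D sketch, with all values illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Bimodal "normal data": two classes centered at -3 and +3, std 0.5.
data = np.concatenate([rng.normal(-3.0, 0.5, 500), rng.normal(3.0, 0.5, 500)])

def gauss_logpdf(x, mu, var):
    return -0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)

# Single-Gaussian MLE fit: the variance balloons to span both modes.
mu_hat, var_hat = data.mean(), data.var()
ll_single = gauss_logpdf(data, mu_hat, var_hat).mean()

# Two-component mixture with the true per-class parameters.
ll_mix = np.log(
    0.5 * np.exp(gauss_logpdf(data, -3.0, 0.25))
    + 0.5 * np.exp(gauss_logpdf(data, 3.0, 0.25))
).mean()

# The mixture fits the bimodal data far better per sample.
print(ll_single, ll_mix)
```

The single Gaussian also places spurious density between the modes, which is exactly where the blurred-boundary false positives the review describes would appear.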

What would settle it

A controlled test on a dataset with clearly separated normal classes where replacing the mixture velocity field with a single Gaussian yields equal or higher anomaly detection scores would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.02438 by Dan Wang, Fuyun Wang, Hui Yan, Sujia Huang, Tong Zhang, Xin Liu, Xu Guo, Yuanzhi Wang, Zhen Cui.

Figure 1
Figure 1: (a) Existing method assumes a unimodal normal distribution, overlooking intrinsic multi-modality and causing false positives. (b) Our method learns a continuous multi-modal prototype space via flow, capturing intra-class diversity, concentrating normal density, and enlarging the margin to true anomalies. view at source ↗
Figure 2
Figure 2: Ablation study for MPFL and MIMR under the general and hard settings. view at source ↗
read the original abstract

Open-set supervised anomaly detection (OSAD) aims to identify unseen anomalies using limited anomalous supervision. However, existing prototype-based methods typically model normal data via a unimodal Gaussian prior, failing to capture inherent multi-modality and resulting in blurred decision boundaries. To address this, we propose Mixture Prototype Flow Matching (MPFM), a framework that learns a continuous transformation from normal feature distributions to a structured Gaussian mixture prototype space. Departing from traditional flow-based approaches that rely on a single velocity vector, MPFM explicitly models the velocity field as a Gaussian mixture prior where each component corresponds to a distinct normal class. This design facilitates mode-aware and semantically coherent distribution transport. Furthermore, we introduce a Mutual Information Maximization Regularizer (MIMR) to prevent prototype collapse and maximize normal-anomaly separability. Extensive experiments demonstrate that MPFM achieves state-of-the-art performance across diverse benchmarks under both single- and multi-anomaly settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Mixture Prototype Flow Matching (MPFM) for open-set supervised anomaly detection. It replaces the unimodal Gaussian prior used in existing prototype-based methods with a velocity field explicitly modeled as a Gaussian mixture prior (one component per normal class) to transport normal features to a structured prototype space. A Mutual Information Maximization Regularizer (MIMR) is introduced to avoid prototype collapse. The abstract claims state-of-the-art results on diverse benchmarks under both single- and multi-anomaly settings.

Significance. If the mixture velocity field can be shown to induce a valid probability path whose marginals match the normal feature distribution at t=0 and the target Gaussian mixture at t=1 without mode collapse, the approach would address a genuine limitation of unimodal prototypes in multi-modal normal data. Reproducible code and clear ablation isolating the mixture velocity from MIMR would strengthen the contribution to flow-matching methods for anomaly detection.

major comments (2)
  1. [Abstract and §3] Abstract and §3 (Method): the statement that modeling the velocity field directly as a Gaussian mixture prior “facilitates mode-aware and semantically coherent distribution transport” is not supported by a derivation showing that the chosen v_t(x) satisfies the continuity equation for the required marginals p_0 (normal features) and p_1 (structured mixture). No path construction (linear interpolation, OT, or otherwise) is specified, so it is unclear whether the integrated ODE remains measure-preserving or avoids boundary blurring.
  2. [§4] §4 (Experiments): the SOTA claims are presented without ablations that isolate the mixture velocity field from the MIMR regularizer, without error bars or statistical significance tests across runs, and without analysis of failure cases (e.g., mode collapse on particular datasets). This leaves open whether performance gains are attributable to the central modeling choice.
minor comments (2)
  1. [§3] Notation: define the precise parameterization of the mixture velocity field (weights, means, covariances) and how they are learned or fixed.
  2. [Figures] Figure clarity: ensure that any velocity-field or transport visualizations include both source normal features and target prototypes so readers can assess mode preservation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the theoretical foundations and experimental rigor of our work. We address each major comment point by point below.

read point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (Method): the statement that modeling the velocity field directly as a Gaussian mixture prior “facilitates mode-aware and semantically coherent distribution transport” is not supported by a derivation showing that the chosen v_t(x) satisfies the continuity equation for the required marginals p_0 (normal features) and p_1 (structured mixture). No path construction (linear interpolation, OT, or otherwise) is specified, so it is unclear whether the integrated ODE remains measure-preserving or avoids boundary blurring.

    Authors: We acknowledge that an explicit derivation would strengthen the presentation. In the flow-matching framework used in the manuscript, the velocity field is defined as a mixture v_t(x) = ∑_k π_k v_t^k(x) where each conditional velocity v_t^k follows the standard linear interpolation path between samples from the normal feature distribution and the corresponding prototype component. By the properties of conditional flow matching, this construction satisfies the continuity equation for the marginals p_0 and p_1. In the revised manuscript we will add a dedicated paragraph in §3 deriving the marginal probability path and confirming measure preservation under the mixture velocity field. revision: yes
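The rebuttal's construction can be spelled out numerically. Under the per-component linear path x_t = (1−t)·x_0 + t·μ_k, with a deterministic endpoint (an assumption for this sketch), the conditional flow-matching regression target x_1 − x_0 coincides with the analytic conditional velocity (μ_k − x_t)/(1−t) at every interior t, which is what lets the standard conditional flow matching argument carry over component by component:

```python
import numpy as np

rng = np.random.default_rng(2)
mus = np.array([[-2.0, 1.0], [2.0, -1.0], [0.0, 3.0]])  # illustrative prototypes

x0 = rng.normal(size=(64, 2))                 # source normal features
k = rng.integers(0, 3, size=64)               # normal class of each sample
x1 = mus[k]                                   # per-class deterministic endpoint
t = rng.uniform(0.05, 0.95, size=(64, 1))     # interior times

xt = (1 - t) * x0 + t * x1                    # linear interpolation path
target = x1 - x0                              # CFM regression target
cond_v = (x1 - xt) / (1 - t)                  # analytic conditional velocity

# The two views of the path agree at every sampled (x_t, t):
# x1 - xt = (1 - t)(x1 - x0), so dividing by (1 - t) recovers the target.
print(np.abs(cond_v - target).max())
```

What this sketch does not establish is the referee's actual concern: that the marginal field obtained by mixing these conditionals still satisfies the continuity equation for p_0 and p_1, which is the derivation the authors promise to add.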

  2. Referee: [§4] §4 (Experiments): the SOTA claims are presented without ablations that isolate the mixture velocity field from the MIMR regularizer, without error bars or statistical significance tests across runs, and without analysis of failure cases (e.g., mode collapse on particular datasets). This leaves open whether performance gains are attributable to the central modeling choice.

    Authors: We agree that isolating the contribution of the mixture velocity field is essential. The revised experimental section will include: (i) ablations that replace the mixture velocity with a unimodal Gaussian velocity while keeping MIMR, and vice versa; (ii) mean and standard deviation over five independent runs together with paired t-test significance results against the strongest baseline; and (iii) a short failure-case subsection discussing datasets where prototype collapse was observed and how MIMR mitigates it. These additions will be placed in §4 and the supplementary material. revision: yes
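The promised significance testing is straightforward to sketch. With five paired runs, a paired t statistic compared against the two-sided 5% critical value for 4 degrees of freedom (≈ 2.776, a standard t-table value) suffices. The AUC values below are invented placeholders, not results from the paper:

```python
import math

# Hypothetical AUCs over five seeds: proposed method vs strongest baseline.
mpfm     = [0.950, 0.960, 0.940, 0.955, 0.950]
baseline = [0.930, 0.935, 0.920, 0.930, 0.928]

diffs = [a - b for a, b in zip(mpfm, baseline)]
n = len(diffs)
mean = sum(diffs) / n
var = sum((d - mean) ** 2 for d in diffs) / (n - 1)   # unbiased variance
t_stat = mean / math.sqrt(var / n)                    # paired t statistic

T_CRIT = 2.776  # two-sided alpha = 0.05, df = 4
print(f"t = {t_stat:.2f}, significant = {t_stat > T_CRIT}")
```

Pairing by seed matters here: run-to-run variance in anomaly detection AUC is often larger than the method gap, and an unpaired test would discard exactly the structure that makes five runs informative.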

Circularity Check

0 steps flagged

No significant circularity: modeling choice and regularizer are independent design decisions

full rationale

The paper presents the mixture velocity field as an explicit modeling ansatz (Gaussian mixture prior per normal class) and introduces MIMR as a separate regularizer to prevent collapse. These are framed as design choices to capture multi-modality, not derived from or equivalent to fitted inputs, self-citations, or prior results by the same authors. No equations reduce the velocity field construction or transport claims to tautologies. Performance is validated empirically on external benchmarks rather than by construction. This is the common case of a self-contained empirical modeling paper.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The claim rests on the domain assumption that normal data is multi-modal and that a Gaussian mixture in prototype space plus mixture velocity transport will produce coherent separation; no free parameters are explicitly named in the abstract, but the number of mixture components is implicitly chosen to match the number of normal classes.

free parameters (1)
  • number of mixture components
    Set to match the number of distinct normal classes; chosen based on dataset structure rather than learned from data.
axioms (1)
  • domain assumption Normal feature distributions are multi-modal and can be represented by a Gaussian mixture in prototype space
    Invoked to justify departing from unimodal Gaussian priors.
invented entities (2)
  • Mixture velocity field no independent evidence
    purpose: To provide mode-aware transport from normal features to prototype space
    New modeling choice introduced in the framework.
  • Mutual Information Maximization Regularizer (MIMR) no independent evidence
    purpose: Prevent prototype collapse and maximize normal-anomaly separability
    New regularizer introduced to stabilize training.
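The abstract does not specify MIMR's form, so the following is only a generic sketch of the quantity such a regularizer could maximize: the mutual information between samples and mixture components, estimated from soft assignments as I(X;K) = H(mean responsibilities) − mean H(per-sample responsibilities). Collapsed prototypes score zero; confident, balanced assignments score log K.

```python
import numpy as np

def soft_assignment_mi(resp):
    """Mutual information between samples and components from an (N, K)
    responsibility matrix: entropy of the marginal assignment minus the
    mean entropy of each sample's assignment distribution."""
    eps = 1e-12
    marginal = resp.mean(axis=0)
    h_marginal = -(marginal * np.log(marginal + eps)).sum()
    h_conditional = -(resp * np.log(resp + eps)).sum(axis=1).mean()
    return h_marginal - h_conditional

# Prototype collapse: every sample assigned to component 0 -> MI = 0.
collapsed = np.tile([1.0, 0.0], (100, 1))

# Healthy mixture: confident, balanced assignments -> MI = log 2.
healthy = np.repeat(np.eye(2), 50, axis=0)

print(soft_assignment_mi(collapsed), soft_assignment_mi(healthy))
```

Maximizing such a term penalizes both collapse (low marginal entropy) and indecisive assignments (high conditional entropy), which is consistent with the stated purpose even if the paper's estimator differs.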

pith-pipeline@v0.9.0 · 5479 in / 1478 out tokens · 39884 ms · 2026-05-14T20:57:31.161817+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

13 extracted references · 7 canonical work pages · 4 internal anchors

  1. [1]

Gaussian Mixture Flow Matching Models

    Chen, H., Zhang, K., Tan, H., Xu, Z., Luan, F., Guibas, L., Wetzstein, G., and Bi, S. Gaussian mixture flow matching models. arXiv preprint arXiv:2504.05304.

  2. [2]

    Crafting papers on machine learning

Langley, P. Crafting papers on machine learning. In Langley, P. (ed.), Proceedings of the 17th International Conference on Machine Learning (ICML 2000), pp. 1207–1216, Stanford, CA.

  3. [3]

    Flow Matching for Generative Modeling

Lipman, Y., Chen, R. T., Ben-Hamu, H., Nickel, M., and Le, M. Flow matching for generative modeling. arXiv preprint arXiv:2210.02747.

  4. [4]

    Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

Liu, X., Gong, C., and Liu, Q. Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003.

  5. [5]

    Decoupled Weight Decay Regularization

Loshchilov, I. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101.

  6. [6]

Pang, G., Ding, C., Shen, C., and Hengel, A. v. d. Explainable deep few-shot anomaly detection with deviation networks. arXiv preprint arXiv:2108.00462.

  7. [7]

    Anomaly-Preference Image Generation

Wang, F., Zhang, T., Wang, Y., Qiu, Y., Liu, X., Guo, X., and Cui, Z. Distribution prototype diffusion learning for open-set supervised anomaly detection. In Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 20416–20426, 2025a. Wang, F., Zhang, T., Wang, Y., Zhang, X., Liu, X., and Cui, Z. Scene graph-grounded image generation. ...

  8. [8]

Hierarchical Gaussian Mixture Normalizing Flow Modeling for Unified Anomaly Detection

    Yao, X., Li, R., Qian, Z., Wang, L., and Zhang, C. Hierarchical gaussian mixture normalizing flow modeling for unified anomaly detection. arXiv preprint arXiv:2403.13349.

  9. [9]

    Dataset Statistics We evaluate our approach through extensive experiments on nine real-world anomaly detection datasets

A. Dataset Statistics: We evaluate our approach through extensive experiments on nine real-world anomaly detection datasets. Key statistics of all datasets are summarized in Tab…

  10. [10]

    For MVTec AD, we adopt its original data split for normal samples

    Our experimental setup follows established protocols from prior open-set supervised anomaly detection (OSAD) literature. For MVTec AD, we adopt its original data split for normal samples. For the remaining eight datasets, normal samples are randomly divided into training and testing sets with a ratio of 3:1. Table 5.Summary statistics of the nine real-wor...

  11. [11]

    It consists of four main categories and 23 subclasses

is a large-scale, open-access gastrointestinal imaging dataset collected from real endoscopy and colonoscopy examinations. It consists of four main categories and 23 subclasses. We focus on endoscopic images, where anatomical landmark categories are treated as normal samples and pathological categori...

  12. [12]

    Table 6.AUC performance (mean ± std) across nine real-world AD datasets is reported under the general setting

is an intracranial hemorrhage detection dataset based on head computed tomography (CT) scans. Table 6. AUC performance (mean ± std) across nine real-world AD datasets is reported under the general setting. Red highlights the best results, and blue indicates sub-optimal outcomes. All baseline SOTA results are sourced from the original papers (Ding et al., 2...

  13. [13]

Reverse-step posterior with schedule coefficients (Eqns. 43–44)

    q(z_{t−Δt} | z_t, z_0) = N(z_{t−Δt}; c_1 z_t + c_2 z_0, c_3 I), (43) with coefficients determined by the schedule: c_1 = (σ²_{t−Δt}/σ²_t)(α_t/α_{t−Δt}), c_2 = (β_{t,Δt}/σ²_t) α_{t−Δt}, c_3 = (β_{t,Δt}/σ²_t) σ²_{t−Δt}. (44) Substituting Eqn. (43) and the mixture posterior Eqn. (38) into Eqn. (42), and using the fact that the convolution of a Gaussian and a Gaussian mixture remains a Gaussian mixture, we obtain: q...