MM++: Unsupervised Scale-Invariant Multilayer OOD Detection via Top-K Gated Feature Fusion

Kyoung-Don Kang; Md Farhan Shadiq; Md Tawheedul Islam Bhuian; Rahim Hossain

arxiv: 2606.17352 · v1 · pith:57XS62OTnew · submitted 2026-06-15 · 💻 cs.LG · cs.CV

MM++: Unsupervised Scale-Invariant Multilayer OOD Detection via Top-K Gated Feature Fusion

Rahim Hossain , Md Tawheedul Islam Bhuian , Md Farhan Shadiq , Kyoung-Don Kang This is my paper

Pith reviewed 2026-06-27 02:59 UTC · model grok-4.3

classification 💻 cs.LG cs.CV

keywords OOD detectionmultilayer feature fusionunsupervised detectionMahalanobis distanceentropy densitypost-hoc methodscale invariance

0 comments

The pith

MM++ fuses entropy-selected intermediate layers with the final representation using a regularized covariance to detect out-of-distribution inputs without any model changes or extra data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MM++ as an unsupervised method that improves out-of-distribution detection by building a joint feature space from multiple layers. It locates useful layers by finding sharp drops in entropy density that indicate strong semantic compression, then combines those layers with the terminal output. A Ledoit-Wolf regularized tied covariance matrix keeps the combined space stable for distance-based scoring. The approach stays post-hoc, requires no fine-tuning or auxiliary samples, and aims to work on varied network architectures for both near and far out-of-distribution cases.

Core claim

MM++ constructs a principled joint feature space by first identifying discriminative intermediate layers through entropy density drops that mark boundaries of sharp semantic compression, fuses the selected layers with the terminal representation, and stabilizes the unified space with a Ledoit-Wolf regularized tied covariance matrix to enable reliable distance estimation for out-of-distribution detection.

What carries the argument

Entropy density drop measurement to select layers for top-k gated fusion, combined with Ledoit-Wolf regularized tied covariance matrix for stable joint space distance estimation.

If this is right

OOD detection becomes possible without auxiliary out-of-distribution data or classifier fine-tuning.
The same procedure applies across different network architectures for both near- and far-OOD tasks.
Cross-layer correlations are captured while early-layer noise is reduced through selective fusion.
The framework remains strictly post-hoc and scale-invariant.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The layer selection step could be reused to locate the most class-discriminative stages inside a network for other post-hoc analyses.
If entropy drops consistently locate compression points, the same signal might support decisions about layer pruning or model compression.
The joint space construction might transfer to uncertainty quantification tasks beyond binary OOD detection.

Load-bearing premise

Entropy density drops reliably mark the boundaries of sharp semantic compression and thereby identify the most discriminative intermediate layers for fusion.

What would settle it

Running MM++ on a standard classifier such as ResNet on CIFAR-10 versus SVHN and finding that detection performance measured by AUROC does not exceed the single-layer Mahalanobis baseline.

Figures

Figures reproduced from arXiv: 2606.17352 by Kyoung-Don Kang, Md Farhan Shadiq, Md Tawheedul Islam Bhuian, Rahim Hossain.

**Figure 1.** Figure 1: Overview of the MM++ Framework. Unsupervised, scale-invariant multi-layer OOD detection via top-K gated feature fusion (illustrated with K = 2, ConvNeXt-T on ImageNet-LT as ID, and ImageNet-C as OOD). Left (Pipeline): MM++ first identifies top-K layers by leveraging entropy density drops to capture maximum cross-layer compression, anchoring the penultimate layer. These intermediate and terminal features ar… view at source ↗

**Figure 2.** Figure 2: Score distributions for ViT-B/16 across five OOD benchmarks with ImageNet-LT as ID. [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

**Figure 3.** Figure 3: Score distributions for Swin-T across six OOD benchmarks with ImageNet-LT as ID. [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Sensitivity of MM++ to K (4) Robust precision estimation. While traditional methods utilize pseudo-inverse estimation for the tied precision matrix, MM++ adopts Ledoit–Wolf shrinkage for wellconditioned covariance. This numerically stable foundation yields substantial empirical gains—an 8.26% AUROC and 17.5% FPR95 improvement over the pseudo-inverse baseline. Crucially, this effectiveness is intrinsical… view at source ↗

**Figure 5.** Figure 5: AUROC comparison across datasets, visualizing the results in Tables 1 and 2 [PITH_FULL_IMAGE:figures/full_fig_p015_5.png] view at source ↗

**Figure 6.** Figure 6: AUROC of MM++ as a function of the number of fused layers [PITH_FULL_IMAGE:figures/full_fig_p016_6.png] view at source ↗

**Figure 7.** Figure 7: Score distributions for ConvNeXt-T across six OOD benchmarks with ImageNet-LT as ID. [PITH_FULL_IMAGE:figures/full_fig_p017_7.png] view at source ↗

**Figure 8.** Figure 8: Practical overhead comparison on ViT-B/16. [PITH_FULL_IMAGE:figures/full_fig_p020_8.png] view at source ↗

read the original abstract

We introduce MM++ (Multilayer Mahalanobis++), a fully unsupervised, strictly post-hoc, and scale-invariant framework for Out-of-Distribution (OOD) detection. To address the trade-off between scale invariance and hierarchical expressivity, MM++ constructs a principled joint feature space. It first identifies discriminative intermediate layers by measuring entropy density drops, which mark the boundaries of sharp semantic compression. By fusing these selected layers with the terminal representation, the framework captures latent cross-layer correlations while mitigating early-layer noise. Crucially, a Ledoit-Wolf regularized tied covariance matrix stabilizes this unified space, enabling reliable distance estimation. Requiring no auxiliary OOD data, classifier fine-tuning, or architectural modifications, MM++ delivers robust performance across distinct architectures for both near- and far-OOD detection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

MM++ combines standard Mahalanobis, multilayer features, and Ledoit-Wolf with an entropy-based layer selector, but the abstract shows no results, comparisons, or ablations so the claims cannot be checked.

read the letter

The core idea is to pick intermediate layers where entropy density drops sharply, fuse their features with the final layer via top-k gating, and run Mahalanobis distance on the combined space after Ledoit-Wolf shrinkage. This is meant to give scale-invariant, post-hoc OOD detection without any extra data or retraining.

It does pull together existing pieces in a specific way and states the practical constraints clearly. The use of tied covariance and shrinkage is a reasonable choice for stability when features come from different layers.

The problem is that the abstract asserts robust performance on near- and far-OOD across architectures but supplies zero numbers, zero error bars, zero ablation tables, and zero head-to-head comparisons with prior Mahalanobis variants or other post-hoc detectors. The entropy-density rule for layer selection is presented as the key novelty, yet nothing shows whether it actually improves over simpler heuristics or whether the claimed boundaries of semantic compression hold up. Without those checks the central claim rests on an unshown empirical outcome.

The paper is aimed at researchers who already work on unsupervised OOD detection and want another post-hoc option. A reader would get value only if the full version contains reproducible experiments and honest baselines; the current material does not reach that bar. It does not look ready for serious refereeing because there is no evidence to referee.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces MM++ (Multilayer Mahalanobis++), a fully unsupervised and post-hoc framework for OOD detection. It selects discriminative intermediate layers by detecting entropy density drops that purportedly mark boundaries of sharp semantic compression, fuses the selected layers with the terminal representation via Top-K gated feature fusion to capture cross-layer correlations while reducing early-layer noise, and employs a Ledoit-Wolf regularized tied covariance matrix to stabilize the joint feature space for Mahalanobis distance scoring. The method requires no auxiliary OOD data, classifier fine-tuning, or architectural modifications and claims robust performance for both near- and far-OOD detection across distinct architectures while remaining scale-invariant.

Significance. If the empirical claims hold, MM++ would represent a practical advance in unsupervised OOD detection by offering a principled approach to multi-layer feature fusion that avoids the need for extra data or model changes. The reliance on standard Ledoit-Wolf regularization for covariance estimation could provide a reproducible stabilization technique, and the absence of free parameters or invented entities in the high-level description is a positive attribute.

major comments (2)

[Abstract] Abstract: The central claim that MM++ 'delivers robust performance across distinct architectures for both near- and far-OOD detection' is asserted without any quantitative results, tables, figures, error bars, or ablation evidence in the supplied text. This absence directly undermines assessment of the primary empirical contribution.
[Abstract] Abstract: The layer-selection procedure rests on the unelaborated claim that 'entropy density drops reliably mark the boundaries of sharp semantic compression'; no definition of entropy density, computation details, or justification for its reliability is provided, yet this step is load-bearing for the subsequent Top-K fusion and overall performance.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address each major comment below, clarifying the role of the abstract versus the full manuscript and indicating planned revisions.

read point-by-point responses

Referee: [Abstract] Abstract: The central claim that MM++ 'delivers robust performance across distinct architectures for both near- and far-OOD detection' is asserted without any quantitative results, tables, figures, error bars, or ablation evidence in the supplied text. This absence directly undermines assessment of the primary empirical contribution.

Authors: The supplied text is the abstract, which by design is a concise summary. The full manuscript (Sections 4–5) contains the requested quantitative evidence: tables reporting AUROC/AUPR/FPR95 across ResNet, DenseNet, and ViT architectures; near-OOD (CIFAR-10 vs. CIFAR-100) and far-OOD (SVHN, Texture, LSUN) benchmarks; multiple random seeds with error bars; and ablation studies on layer selection and fusion. To improve the abstract’s informativeness while remaining within length limits, we will add one sentence citing the key average AUROC gains. revision: yes
Referee: [Abstract] Abstract: The layer-selection procedure rests on the unelaborated claim that 'entropy density drops reliably mark the boundaries of sharp semantic compression'; no definition of entropy density, computation details, or justification for its reliability is provided, yet this step is load-bearing for the subsequent Top-K fusion and overall performance.

Authors: The abstract is intentionally brief. The full manuscript defines entropy density in Section 3.1 as the layer-wise average of normalized Shannon entropy computed on binned, softmax-activated feature histograms, provides the exact drop-detection algorithm, and supplies both an information-bottleneck theoretical motivation and empirical validation (Figure 2) showing alignment with semantic compression points. We will insert a short parenthetical definition into the abstract for immediate clarity. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The abstract and supplied material describe layer selection via entropy density drops, Top-K gated fusion, and Ledoit-Wolf regularization without any equations, self-citations, or fitted quantities that reduce the claimed OOD performance to the inputs by construction. No derivation chain is exhibited that matches the enumerated circularity patterns; the method is presented as using standard estimators on selected features. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no explicit free parameters, axioms, or invented entities are stated.

pith-pipeline@v0.9.1-grok · 5684 in / 988 out tokens · 42973 ms · 2026-06-27T02:59:38.693561+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

53 extracted references · 3 canonical work pages

[1]

NECO: Neural collapse based out-of-distribution detection

Mouïn Ben Ammar, Nacim Belkhir, Sebastian Popescu, Antoine Manzanera, and Gianni Franchi. NECO: Neural collapse based out-of-distribution detection. InProceedings of the International Conference on Learning Representations, 2024

2024
[2]

On the use of Mahalanobis distance for out- of-distribution detection with neural networks for medical imaging

Harry Anthony and Konstantinos Kamnitsas. On the use of Mahalanobis distance for out- of-distribution detection with neural networks for medical imaging. InProceedings of the International Workshop on Uncertainty for Safe Utilization of Machine Learning in Medical Imaging. Springer, 2023

2023
[3]

In or out? Fixing ImageNet out-of- distribution detection evaluation

Julian Bitterwolf, Maximilian Müller, and Matthias Hein. In or out? Fixing ImageNet out-of- distribution detection evaluation. InProceedings of the International Conference on Machine Learning, pages 2441–2472, 2023

2023
[4]

Describing textures in the wild

Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Andrea Vedaldi, and Stefano Soatto. Describing textures in the wild. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 3606–3613, 2014

2014
[5]

An image is worth 16x16 words: Transformers for image recognition at scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. InInternational Conference on Learning Representations, 2021. URL https:...

2021
[6]

V os: Learning what you don’t know by virtual outlier synthesis

Xuefeng Du, Zhaoning Wang, Mu Cai, and Sharon Li. V os: Learning what you don’t know by virtual outlier synthesis. InInternational Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=TW7d65uYu5M

2022
[7]

Eva-02: A visual representation for neon genesis.Image and Vision Computing, 149:105171, 2024

Yuxin Fang, Quan Sun, Xinggang Wang, Tiejun Huang, Xinlong Wang, and Yue Cao. Eva-02: A visual representation for neon genesis.Image and Vision Computing, 149:105171, 2024

2024
[8]

Jarrod Haas, William Yolland, and Bernhard T. Rabus. Exploring simple, high quality out- of-distribution detection with l2 normalization.Transactions on Machine Learning Research, 2024, 2024

2024
[9]

Controlling neural collapse enhances out-of-distribution detection and transfer learning.arXiv preprint arXiv:2502.10691, 2025

Md Yousuf Harun, Jhair Gallardo, and Christopher Kanan. Controlling neural collapse enhances out-of-distribution detection and transfer learning.arXiv preprint arXiv:2502.10691, 2025

arXiv 2025
[10]

Benchmarking neural network robustness to common corruptions and perturbations

Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. InInternational Conference on Learning Representations, 2019. URLhttps://openreview.net/forum?id=HJz6tiCqYm

2019
[11]

A baseline for detecting misclassified and out-of-distribution examples in neural networks

Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. InProceedings of the International Conference on Learning Representations, 2017

2017
[12]

Using self-supervised learning can improve model robustness and uncertainty

Dan Hendrycks, Mantas Mazeika, Saurav Kadavath, and Dawn Song. Using self-supervised learning can improve model robustness and uncertainty. InProceedings of the Advances in Neural Information Processing Systems, 2019

2019
[13]

The many faces of robustness: A critical analysis of out-of-distribution generalization

Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Samat, Stephen Pimentel, et al. The many faces of robustness: A critical analysis of out-of-distribution generalization. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 8340–8349, 2021

2021
[14]

Natural adversarial examples

Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, and Dawn Song. Natural adversarial examples. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15262–15271, 2021

2021
[15]

Pixmix: Dreamlike pictures comprehensively improve safety measures

Dan Hendrycks, Andy Zou, Mantas Mazeika, Leonard Tang, Bo Li, Dawn Song, and Jacob Steinhardt. Pixmix: Dreamlike pictures comprehensively improve safety measures. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16783–16792, 2022. 10

2022
[16]

AP-OOD: Attention Pooling for Out-of-Distribution Detection.arXiv preprint arXiv:2602.06031, 2026

Claus Hofmann, Christian Huber, Bernhard Lehner, Daniel Klotz, Sepp Hochreiter, and Werner Zellinger. AP-OOD: Attention Pooling for Out-of-Distribution Detection.arXiv preprint arXiv:2602.06031, 2026

arXiv 2026
[17]

A geometry-based view of maha- lanobis OOD detection.arXiv preprint arXiv:2510.15202, 2025

Denis Janiak, Jakub Binkowski, and Tomasz Kajdanowicz. A geometry-based view of maha- lanobis OOD detection.arXiv preprint arXiv:2510.15202, 2025

arXiv 2025
[18]

A well-conditioned estimator for large-dimensional covariance matrices.Journal of multivariate analysis, 88(2):365–411, 2004

Olivier Ledoit and Michael Wolf. A well-conditioned estimator for large-dimensional covariance matrices.Journal of multivariate analysis, 88(2):365–411, 2004

2004
[19]

A simple unified framework for detecting out-of-distribution samples and adversarial attacks

Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. InProceedings of the Advances in Neural Information Processing Systems (NeurIPS), volume 31, 2018

2018
[20]

Enhancing the reliability of out-of- distribution image detection in neural networks

Shiyu Liang, Yixuan Li, and Rayadurgam Srikant. Enhancing the reliability of out-of- distribution image detection in neural networks. InProceedings of the International Conference on Learning Representations, 2018

2018
[21]

Es-imagenet: A million event-stream classification dataset for spiking neural networks.Frontiers in Neuroscience, 15,

Yihan Lin, Wei Ding, Shaohua Qiang, Lei Deng, and Guoqi Li. Es-imagenet: A million event-stream classification dataset for spiking neural networks.Frontiers in Neuroscience, 15,
[22]

doi: 10.3389/fnins.2021.726582

work page doi:10.3389/fnins.2021.726582 2021
[23]

Mood: Multi-level out-of-distribution detection

Ziqian Lin, Suman Roy, and Yixuan Li. Mood: Multi-level out-of-distribution detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15313–15323, 2021

2021
[24]

Energy-based out-of-distribution detection

Weitang Liu, Xiaoyun Wang, John Owens, and Yixuan Li. Energy-based out-of-distribution detection. InProceedings of the Advances in Neural Information Processing Systems, 2020

2020
[25]

Swin transformer: Hierarchical vision transformer using shifted windows

Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 10012–10022, October 2021

2021
[26]

A convnet for the 2020s

Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. A convnet for the 2020s. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11976–11986, 2022

2022
[27]

Large- scale long-tailed recognition in an open world

Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayuan Wang, Boqing Gong, and Stella X Yu. Large- scale long-tailed recognition in an open world. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2537–2546, 2019

2019
[28]

Learning with mixture of prototypes for out-of-distribution detection.arXiv preprint arXiv:2402.02653, 2024

Haodong Lu, Dong Gong, Shuo Wang, Jason Xue, Lina Yao, and Kristen Moore. Learning with mixture of prototypes for out-of-distribution detection.arXiv preprint arXiv:2402.02653, 2024

arXiv 2024
[29]

Out-of- distribution detection: A task-oriented survey of recent advances.ACM Computing Surveys, 58 (2):1–39, 2025

Shuo Lu, Yingsheng Wang, Lijun Sheng, Lingxiao He, Aihua Zheng, and Jian Liang. Out-of- distribution detection: A task-oriented survey of recent advances.ACM Computing Surveys, 58 (2):1–39, 2025

2025
[30]

From softmax to sparsemax: A sparse model of attention and multi-label classification

André FT Martins and Ramón Fernandez Astudillo. From softmax to sparsemax: A sparse model of attention and multi-label classification. InInternational Conference on Machine Learning, pages 1614–1623. PMLR, 2016

2016
[31]

How to exploit hyperspherical embed- dings for out-of-distribution detection? InProceedings of the International Conference on Learning Representations, 2023

Yifei Ming, Yiyou Sun, Ousmane Dia, and Yixuan Li. How to exploit hyperspherical embed- dings for out-of-distribution detection? InProceedings of the International Conference on Learning Representations, 2023

2023
[32]

Mahalanobis++: Improving OOD detection via feature normalization

Maximilian Müller and Matthias Hein. Mahalanobis++: Improving OOD detection via feature normalization. InProceedings of the International Conference on Machine Learning, 2025

2025
[33]

Prevalence of neural collapse during the terminal phase of deep learning training.Proceedings of the National Academy of Sciences, 117 (40):24652–24663, 2020

Vardan Papyan, XY Han, and David L Donoho. Prevalence of neural collapse during the terminal phase of deep learning training.Proceedings of the National Academy of Sciences, 117 (40):24652–24663, 2020. 11

2020
[34]

Do imagenet classifiers generalize to imagenet? InInternational Conference on Machine Learning (ICML), 2019

Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do imagenet classifiers generalize to imagenet? InInternational Conference on Machine Learning (ICML), 2019

2019
[35]

T2fnorm: Train-time feature normalization for OOD detection in image classification

Sudarshan Regmi, Bibek Panthi, Sakar Dotel, Prashnna K Gyawali, Danail Stoyanov, and Binod Bhattarai. T2fnorm: Train-time feature normalization for OOD detection in image classification. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 153–162, 2024

2024
[36]

A simple fix to Mahalanobis distance for improving near-OOD detection.arXiv preprint arXiv:2106.09022, 2021

Jie Ren, Stanislav Fort, Jeremiah Liu, Abhijit Guha Roy, Shreyas Padhy, and Balaji Lakshmi- narayanan. A simple fix to Mahalanobis distance for improving near-OOD detection.arXiv preprint arXiv:2106.09022, 2021

arXiv 2021
[37]

Berg, and Li Fei-Fei

Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. Imagenet large scale visual recognition challenge.International Journal of Computer Vision, 115(3):211–252, 2015

2015
[38]

SSD: A unified framework for self- supervised outlier detection

Vikash Sehwag, Mung Chiang, and Prateek Mittal. SSD: A unified framework for self- supervised outlier detection. InProceedings of the International Conference on Learning Representations, 2021

2021
[39]

Opening the black box of deep neural networks via information.arXiv preprint arXiv:1703.00810, 2017

Ravid Shwartz-Ziv and Naftali Tishby. Opening the black box of deep neural networks via information.arXiv preprint arXiv:1703.00810, 2017

Pith/arXiv arXiv 2017
[40]

ReAct: Out-of-distribution detection with rectified acti- vations

Yiyou Sun, Chuan Guo, and Yixuan Li. ReAct: Out-of-distribution detection with rectified acti- vations. InProceedings of the Advances in Neural Information Processing Systems, volume 34, pages 144–157, 2021

2021
[41]

Out-of-distribution detection with deep nearest neighbors

Yiyou Sun, Yifei Ming, Xiaojin Zhu, and Yixuan Li. Out-of-distribution detection with deep nearest neighbors. InProceedings of the International Conference on Machine Learning, pages 20827–20840, 2022

2022
[42]

Non-parametric outlier synthesis

Leitian Tao, Xuefeng Du, Jerry Zhu, and Yixuan Li. Non-parametric outlier synthesis. In The Eleventh International Conference on Learning Representations, 2023. URL https: //openreview.net/forum?id=JHklpEZqduQ

2023
[43]

Deep learning and the information bottleneck principle

Naftali Tishby and Noga Zaslavsky. Deep learning and the information bottleneck principle. In Proceedings of the IEEE Information Theory Workshop, 2015

2015
[44]

The inaturalist species classification and detection dataset

Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, and Serge Belongie. The inaturalist species classification and detection dataset. In2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8769–8778, 2018. doi: 10.1109/cvpr.2018.00914

work page doi:10.1109/cvpr.2018.00914 2018
[45]

Fine-grained out-of-distribution detection of medical images using combination of feature uncertainty and mahalanobis distance

Jie Wei and Guotai Wang. Fine-grained out-of-distribution detection of medical images using combination of feature uncertainty and mahalanobis distance. InProceedings of the IEEE International Symposium on Biomedical Imaging, pages 1–5. IEEE, 2023

2023
[46]

X-Mahalanobis: Transformer feature mixing for reliable OOD detection

Tong Wei, Bo-Lin Wang, Jiang-Xin Shi, Yu-Feng Li, and Min-Ling Zhang. X-Mahalanobis: Transformer feature mixing for reliable OOD detection. InProceedings of the Annual Confer- ence on Neural Information Processing Systems, 2025

2025
[47]

Russell, Genevieve Patterson, Krista A

Jianxiong Xiao, James Hays, Bryan C. Russell, Genevieve Patterson, Krista A. Ehinger, Antonio Torralba, and Aude Oliva. Basic level scene understanding: categories, attributes and structures. Frontiers in Psychology, 4, 2013. doi: 10.3389/fpsyg.2013.00506

work page doi:10.3389/fpsyg.2013.00506 2013
[48]

Openood: Benchmarking generalized out-of- distribution detection.Advances in Neural Information Processing Systems, 35:32598–32611, 2022

Jingkang Yang, Pengyun Wang, Dejian Zou, Zitang Zhou, Kunyuan Ding, Wenxuan Peng, Haoqi Wang, Guangyao Chen, Bo Li, Yiyou Sun, et al. Openood: Benchmarking generalized out-of- distribution detection.Advances in Neural Information Processing Systems, 35:32598–32611, 2022

2022
[49]

Generalized out-of-distribution detection: A survey.International Journal of Computer Vision, 132(12):5635–5662, 2024

Jingkang Yang, Kaiyang Zhou, Yixuan Li, and Ziwei Liu. Generalized out-of-distribution detection: A survey.International Journal of Computer Vision, 132(12):5635–5662, 2024. 12

2024
[50]

Places: A 10 million image database for scene recognition.IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1452–1464, 2017

Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. Places: A 10 million image database for scene recognition.IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1452–1464, 2017. 13 A Theoretical Justification of MM++ The effectiveness of MM++ for out-of-distribution (OOD) detection is supported by three comple-...

2017
[51]

5) is motivated by the geometric properties of neural collapse [32]

Entropy-Based Layer Selection and Neural Collapse.The selection of informative layers via the entropy density drop ∆l (Eq. 5) is motivated by the geometric properties of neural collapse [32]. As representations propagate through a deep network, within-class variability transitions from high- dimensional, distributed features in early layers to low-dimensi...
[52]

This corre- sponds to approximating the joint covariance as a block-diagonal matrix, i.e., Σll′ =0 for l̸=l ′, thereby ignoring cross-layer dependencies

Joint Precision and Cross-Layer Consistency.Multi-layer OOD detectors such as [ 19, 45] typically rely on additive fusion, which implicitly assumes independence across layers. This corre- sponds to approximating the joint covariance as a block-diagonal matrix, i.e., Σll′ =0 for l̸=l ′, thereby ignoring cross-layer dependencies. Unlike methods that compute...
[53]

Let DK =P l∈K Dl be the dimension of the joint space

Well-Conditioned Estimation via Ledoit–Wolf Shrinkage.While concatenated fusion unlocks cross-layer modeling, it significantly exacerbates the dimensionality problem. Let DK =P l∈K Dl be the dimension of the joint space. When DK ≫N c, the empirical joint covariance ˆΣK becomes highly ill-conditioned or strictly rank-deficient (possessing zero-valued eigen...

arXiv 1968

[1] [1]

NECO: Neural collapse based out-of-distribution detection

Mouïn Ben Ammar, Nacim Belkhir, Sebastian Popescu, Antoine Manzanera, and Gianni Franchi. NECO: Neural collapse based out-of-distribution detection. InProceedings of the International Conference on Learning Representations, 2024

2024

[2] [2]

On the use of Mahalanobis distance for out- of-distribution detection with neural networks for medical imaging

Harry Anthony and Konstantinos Kamnitsas. On the use of Mahalanobis distance for out- of-distribution detection with neural networks for medical imaging. InProceedings of the International Workshop on Uncertainty for Safe Utilization of Machine Learning in Medical Imaging. Springer, 2023

2023

[3] [3]

In or out? Fixing ImageNet out-of- distribution detection evaluation

Julian Bitterwolf, Maximilian Müller, and Matthias Hein. In or out? Fixing ImageNet out-of- distribution detection evaluation. InProceedings of the International Conference on Machine Learning, pages 2441–2472, 2023

2023

[4] [4]

Describing textures in the wild

Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Andrea Vedaldi, and Stefano Soatto. Describing textures in the wild. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 3606–3613, 2014

2014

[5] [5]

An image is worth 16x16 words: Transformers for image recognition at scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. InInternational Conference on Learning Representations, 2021. URL https:...

2021

[6] [6]

V os: Learning what you don’t know by virtual outlier synthesis

Xuefeng Du, Zhaoning Wang, Mu Cai, and Sharon Li. V os: Learning what you don’t know by virtual outlier synthesis. InInternational Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=TW7d65uYu5M

2022

[7] [7]

Eva-02: A visual representation for neon genesis.Image and Vision Computing, 149:105171, 2024

Yuxin Fang, Quan Sun, Xinggang Wang, Tiejun Huang, Xinlong Wang, and Yue Cao. Eva-02: A visual representation for neon genesis.Image and Vision Computing, 149:105171, 2024

2024

[8] [8]

Jarrod Haas, William Yolland, and Bernhard T. Rabus. Exploring simple, high quality out- of-distribution detection with l2 normalization.Transactions on Machine Learning Research, 2024, 2024

2024

[9] [9]

Controlling neural collapse enhances out-of-distribution detection and transfer learning.arXiv preprint arXiv:2502.10691, 2025

Md Yousuf Harun, Jhair Gallardo, and Christopher Kanan. Controlling neural collapse enhances out-of-distribution detection and transfer learning.arXiv preprint arXiv:2502.10691, 2025

arXiv 2025

[10] [10]

Benchmarking neural network robustness to common corruptions and perturbations

Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. InInternational Conference on Learning Representations, 2019. URLhttps://openreview.net/forum?id=HJz6tiCqYm

2019

[11] [11]

A baseline for detecting misclassified and out-of-distribution examples in neural networks

Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. InProceedings of the International Conference on Learning Representations, 2017

2017

[12] [12]

Using self-supervised learning can improve model robustness and uncertainty

Dan Hendrycks, Mantas Mazeika, Saurav Kadavath, and Dawn Song. Using self-supervised learning can improve model robustness and uncertainty. InProceedings of the Advances in Neural Information Processing Systems, 2019

2019

[13] [13]

The many faces of robustness: A critical analysis of out-of-distribution generalization

Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Samat, Stephen Pimentel, et al. The many faces of robustness: A critical analysis of out-of-distribution generalization. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 8340–8349, 2021

2021

[14] [14]

Natural adversarial examples

Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, and Dawn Song. Natural adversarial examples. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15262–15271, 2021

2021

[15] [15]

Pixmix: Dreamlike pictures comprehensively improve safety measures

Dan Hendrycks, Andy Zou, Mantas Mazeika, Leonard Tang, Bo Li, Dawn Song, and Jacob Steinhardt. Pixmix: Dreamlike pictures comprehensively improve safety measures. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16783–16792, 2022. 10

2022

[16] [16]

AP-OOD: Attention Pooling for Out-of-Distribution Detection.arXiv preprint arXiv:2602.06031, 2026

Claus Hofmann, Christian Huber, Bernhard Lehner, Daniel Klotz, Sepp Hochreiter, and Werner Zellinger. AP-OOD: Attention Pooling for Out-of-Distribution Detection.arXiv preprint arXiv:2602.06031, 2026

arXiv 2026

[17] [17]

A geometry-based view of maha- lanobis OOD detection.arXiv preprint arXiv:2510.15202, 2025

Denis Janiak, Jakub Binkowski, and Tomasz Kajdanowicz. A geometry-based view of maha- lanobis OOD detection.arXiv preprint arXiv:2510.15202, 2025

arXiv 2025

[18] [18]

A well-conditioned estimator for large-dimensional covariance matrices.Journal of multivariate analysis, 88(2):365–411, 2004

Olivier Ledoit and Michael Wolf. A well-conditioned estimator for large-dimensional covariance matrices.Journal of multivariate analysis, 88(2):365–411, 2004

2004

[19] [19]

A simple unified framework for detecting out-of-distribution samples and adversarial attacks

Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. InProceedings of the Advances in Neural Information Processing Systems (NeurIPS), volume 31, 2018

2018

[20] [20]

Enhancing the reliability of out-of- distribution image detection in neural networks

Shiyu Liang, Yixuan Li, and Rayadurgam Srikant. Enhancing the reliability of out-of- distribution image detection in neural networks. InProceedings of the International Conference on Learning Representations, 2018

2018

[21] [21]

Es-imagenet: A million event-stream classification dataset for spiking neural networks.Frontiers in Neuroscience, 15,

Yihan Lin, Wei Ding, Shaohua Qiang, Lei Deng, and Guoqi Li. Es-imagenet: A million event-stream classification dataset for spiking neural networks.Frontiers in Neuroscience, 15,

[22] [22]

doi: 10.3389/fnins.2021.726582

work page doi:10.3389/fnins.2021.726582 2021

[23] [23]

Mood: Multi-level out-of-distribution detection

Ziqian Lin, Suman Roy, and Yixuan Li. Mood: Multi-level out-of-distribution detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15313–15323, 2021

2021

[24] [24]

Energy-based out-of-distribution detection

Weitang Liu, Xiaoyun Wang, John Owens, and Yixuan Li. Energy-based out-of-distribution detection. InProceedings of the Advances in Neural Information Processing Systems, 2020

2020

[25] [25]

Swin transformer: Hierarchical vision transformer using shifted windows

Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 10012–10022, October 2021

2021

[26] [26]

A convnet for the 2020s

Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. A convnet for the 2020s. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11976–11986, 2022

2022

[27] [27]

Large- scale long-tailed recognition in an open world

Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayuan Wang, Boqing Gong, and Stella X Yu. Large- scale long-tailed recognition in an open world. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2537–2546, 2019

2019

[28] [28]

Learning with mixture of prototypes for out-of-distribution detection.arXiv preprint arXiv:2402.02653, 2024

Haodong Lu, Dong Gong, Shuo Wang, Jason Xue, Lina Yao, and Kristen Moore. Learning with mixture of prototypes for out-of-distribution detection.arXiv preprint arXiv:2402.02653, 2024

arXiv 2024

[29] [29]

Out-of- distribution detection: A task-oriented survey of recent advances.ACM Computing Surveys, 58 (2):1–39, 2025

Shuo Lu, Yingsheng Wang, Lijun Sheng, Lingxiao He, Aihua Zheng, and Jian Liang. Out-of- distribution detection: A task-oriented survey of recent advances.ACM Computing Surveys, 58 (2):1–39, 2025

2025

[30] [30]

From softmax to sparsemax: A sparse model of attention and multi-label classification

André FT Martins and Ramón Fernandez Astudillo. From softmax to sparsemax: A sparse model of attention and multi-label classification. InInternational Conference on Machine Learning, pages 1614–1623. PMLR, 2016

2016

[31] [31]

How to exploit hyperspherical embed- dings for out-of-distribution detection? InProceedings of the International Conference on Learning Representations, 2023

Yifei Ming, Yiyou Sun, Ousmane Dia, and Yixuan Li. How to exploit hyperspherical embed- dings for out-of-distribution detection? InProceedings of the International Conference on Learning Representations, 2023

2023

[32] [32]

Mahalanobis++: Improving OOD detection via feature normalization

Maximilian Müller and Matthias Hein. Mahalanobis++: Improving OOD detection via feature normalization. InProceedings of the International Conference on Machine Learning, 2025

2025

[33] [33]

Prevalence of neural collapse during the terminal phase of deep learning training.Proceedings of the National Academy of Sciences, 117 (40):24652–24663, 2020

Vardan Papyan, XY Han, and David L Donoho. Prevalence of neural collapse during the terminal phase of deep learning training.Proceedings of the National Academy of Sciences, 117 (40):24652–24663, 2020. 11

2020

[34] [34]

Do imagenet classifiers generalize to imagenet? InInternational Conference on Machine Learning (ICML), 2019

Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do imagenet classifiers generalize to imagenet? InInternational Conference on Machine Learning (ICML), 2019

2019

[35] [35]

T2fnorm: Train-time feature normalization for OOD detection in image classification

Sudarshan Regmi, Bibek Panthi, Sakar Dotel, Prashnna K Gyawali, Danail Stoyanov, and Binod Bhattarai. T2fnorm: Train-time feature normalization for OOD detection in image classification. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 153–162, 2024

2024

[36] [36]

A simple fix to Mahalanobis distance for improving near-OOD detection.arXiv preprint arXiv:2106.09022, 2021

Jie Ren, Stanislav Fort, Jeremiah Liu, Abhijit Guha Roy, Shreyas Padhy, and Balaji Lakshmi- narayanan. A simple fix to Mahalanobis distance for improving near-OOD detection.arXiv preprint arXiv:2106.09022, 2021

arXiv 2021

[37] [37]

Berg, and Li Fei-Fei

Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. Imagenet large scale visual recognition challenge.International Journal of Computer Vision, 115(3):211–252, 2015

2015

[38] [38]

SSD: A unified framework for self- supervised outlier detection

Vikash Sehwag, Mung Chiang, and Prateek Mittal. SSD: A unified framework for self- supervised outlier detection. InProceedings of the International Conference on Learning Representations, 2021

2021

[39] [39]

Opening the black box of deep neural networks via information.arXiv preprint arXiv:1703.00810, 2017

Ravid Shwartz-Ziv and Naftali Tishby. Opening the black box of deep neural networks via information.arXiv preprint arXiv:1703.00810, 2017

Pith/arXiv arXiv 2017

[40] [40]

ReAct: Out-of-distribution detection with rectified acti- vations

Yiyou Sun, Chuan Guo, and Yixuan Li. ReAct: Out-of-distribution detection with rectified acti- vations. InProceedings of the Advances in Neural Information Processing Systems, volume 34, pages 144–157, 2021

2021

[41] [41]

Out-of-distribution detection with deep nearest neighbors

Yiyou Sun, Yifei Ming, Xiaojin Zhu, and Yixuan Li. Out-of-distribution detection with deep nearest neighbors. InProceedings of the International Conference on Machine Learning, pages 20827–20840, 2022

2022

[42] [42]

Non-parametric outlier synthesis

Leitian Tao, Xuefeng Du, Jerry Zhu, and Yixuan Li. Non-parametric outlier synthesis. In The Eleventh International Conference on Learning Representations, 2023. URL https: //openreview.net/forum?id=JHklpEZqduQ

2023

[43] [43]

Deep learning and the information bottleneck principle

Naftali Tishby and Noga Zaslavsky. Deep learning and the information bottleneck principle. In Proceedings of the IEEE Information Theory Workshop, 2015

2015

[44] [44]

The inaturalist species classification and detection dataset

Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, and Serge Belongie. The inaturalist species classification and detection dataset. In2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8769–8778, 2018. doi: 10.1109/cvpr.2018.00914

work page doi:10.1109/cvpr.2018.00914 2018

[45] [45]

Fine-grained out-of-distribution detection of medical images using combination of feature uncertainty and mahalanobis distance

Jie Wei and Guotai Wang. Fine-grained out-of-distribution detection of medical images using combination of feature uncertainty and mahalanobis distance. InProceedings of the IEEE International Symposium on Biomedical Imaging, pages 1–5. IEEE, 2023

2023

[46] [46]

X-Mahalanobis: Transformer feature mixing for reliable OOD detection

Tong Wei, Bo-Lin Wang, Jiang-Xin Shi, Yu-Feng Li, and Min-Ling Zhang. X-Mahalanobis: Transformer feature mixing for reliable OOD detection. InProceedings of the Annual Confer- ence on Neural Information Processing Systems, 2025

2025

[47] [47]

Russell, Genevieve Patterson, Krista A

Jianxiong Xiao, James Hays, Bryan C. Russell, Genevieve Patterson, Krista A. Ehinger, Antonio Torralba, and Aude Oliva. Basic level scene understanding: categories, attributes and structures. Frontiers in Psychology, 4, 2013. doi: 10.3389/fpsyg.2013.00506

work page doi:10.3389/fpsyg.2013.00506 2013

[48] [48]

Openood: Benchmarking generalized out-of- distribution detection.Advances in Neural Information Processing Systems, 35:32598–32611, 2022

Jingkang Yang, Pengyun Wang, Dejian Zou, Zitang Zhou, Kunyuan Ding, Wenxuan Peng, Haoqi Wang, Guangyao Chen, Bo Li, Yiyou Sun, et al. Openood: Benchmarking generalized out-of- distribution detection.Advances in Neural Information Processing Systems, 35:32598–32611, 2022

2022

[49] [49]

Generalized out-of-distribution detection: A survey.International Journal of Computer Vision, 132(12):5635–5662, 2024

Jingkang Yang, Kaiyang Zhou, Yixuan Li, and Ziwei Liu. Generalized out-of-distribution detection: A survey.International Journal of Computer Vision, 132(12):5635–5662, 2024. 12

2024

[50] [50]

Places: A 10 million image database for scene recognition.IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1452–1464, 2017

Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. Places: A 10 million image database for scene recognition.IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1452–1464, 2017. 13 A Theoretical Justification of MM++ The effectiveness of MM++ for out-of-distribution (OOD) detection is supported by three comple-...

2017

[51] [51]

5) is motivated by the geometric properties of neural collapse [32]

Entropy-Based Layer Selection and Neural Collapse.The selection of informative layers via the entropy density drop ∆l (Eq. 5) is motivated by the geometric properties of neural collapse [32]. As representations propagate through a deep network, within-class variability transitions from high- dimensional, distributed features in early layers to low-dimensi...

[52] [52]

This corre- sponds to approximating the joint covariance as a block-diagonal matrix, i.e., Σll′ =0 for l̸=l ′, thereby ignoring cross-layer dependencies

Joint Precision and Cross-Layer Consistency.Multi-layer OOD detectors such as [ 19, 45] typically rely on additive fusion, which implicitly assumes independence across layers. This corre- sponds to approximating the joint covariance as a block-diagonal matrix, i.e., Σll′ =0 for l̸=l ′, thereby ignoring cross-layer dependencies. Unlike methods that compute...

[53] [53]

Let DK =P l∈K Dl be the dimension of the joint space

Well-Conditioned Estimation via Ledoit–Wolf Shrinkage.While concatenated fusion unlocks cross-layer modeling, it significantly exacerbates the dimensionality problem. Let DK =P l∈K Dl be the dimension of the joint space. When DK ≫N c, the empirical joint covariance ˆΣK becomes highly ill-conditioned or strictly rank-deficient (possessing zero-valued eigen...

arXiv 1968