Pith · machine review for the scientific record

arxiv: 2605.10756 · v1 · submitted 2026-05-11 · 💻 cs.CV

Recognition: 2 theorem links · Lean Theorem

TINS: Test-time ID-prototype-separated Negative Semantics Learning for OOD Detection

Jing Xu, Jubo Feng, Nanyang Ye, Qinying Gu, Xinbing Wang, Yifeng Yang

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 03:57 UTC · model grok-4.3

classification 💻 cs.CV
keywords: OOD detection · vision-language models · test-time learning · negative semantics · modality inversion · ImageNet · out-of-distribution

The pith

Learning sample-specific negative text embeddings separated from ID prototypes at test time improves OOD detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that static negative labels in vision-language models cannot keep up with diverse and changing out-of-distribution inputs, and that naively expanding negatives from test samples risks pulling in ID-like contamination that blurs the decision boundary. TINS counters this by inverting each image into a dedicated negative text embedding and applying a regularization that forces those embeddings away from ID prototypes. The resulting method reports lower false-positive rates across multiple benchmarks while adding only group-wise scoring and buffer updates for stability. A reader would care because safer open-world deployment of image classifiers depends on precisely this kind of adaptive separation between known and unknown content.
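The push-pull geometry behind that separation can be sketched in a few lines. This is a hypothetical numpy illustration, not the authors' implementation: the loss form, margin, and learning rate are all assumptions. A sample-specific negative embedding is pulled toward the test image's embedding (the inversion objective) while a hinge term repels it from any ID prototype it approaches too closely.

```python
import numpy as np

def unit(v):
    """Project onto the unit sphere, as CLIP-style embeddings are cosine-compared."""
    return v / np.linalg.norm(v)

def learn_negative(img_emb, id_protos, lam=1.0, margin=0.5, lr=0.5, steps=20):
    """Sketch of a test-time update: align the negative embedding with the image
    while a hinge term (hypothetical loss form) repels it from ID prototypes."""
    u = unit(img_emb.copy())                # initialize at the image embedding
    for _ in range(steps):
        grad = img_emb.copy()               # ascent direction for cos(u, img)
        for p in id_protos:
            if u @ p > margin:              # hinge: only repel when too close to ID
                grad -= lam * p
        u = unit(u + lr * grad)             # stay on the unit sphere
    return u

img = unit(np.array([1.0, 1.0, 0.0]))       # stand-in test image embedding
proto = np.array([1.0, 0.0, 0.0])           # stand-in ID prototype
neg = learn_negative(img, [proto])
```

After the updates the learned negative stays close to the image direction but sits outside the margin around the ID prototype; in the paper this kind of optimization runs per test sample.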

Core claim

TINS learns sample-specific negative text embeddings via image-to-text modality inversion and introduces ID-prototype-separated regularization to keep them separated from ID semantics. To further stabilize negative semantics expansion, TINS employs group-wise aggregation scoring and a buffer update strategy. Extensive experiments across Four-OOD, OpenOOD, Temporal-shift, and Various ID settings show consistent improvements over strong baselines.
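The "buffer update strategy" named above is not specified in this summary, so the following is a purely hypothetical policy, sketched only to make the stabilization idea concrete: a capacity-capped buffer that refuses ID-like candidates outright and, when full, evicts the stored negative closest to the ID prototypes.

```python
import numpy as np

def id_closeness(neg, id_protos):
    """Max cosine similarity of a (unit) negative embedding to the ID prototypes."""
    return max(float(neg @ p) for p in id_protos)

def update_buffer(buffer, neg, id_protos, capacity=4, admit_thresh=0.8):
    """Hypothetical policy: reject ID-like candidates; when over capacity,
    evict the stored negative that is closest to the ID prototypes."""
    if id_closeness(neg, id_protos) > admit_thresh:
        return buffer                        # too ID-like: likely contamination
    buffer = buffer + [neg]
    if len(buffer) > capacity:
        worst = max(range(len(buffer)),
                    key=lambda i: id_closeness(buffer[i], id_protos))
        buffer.pop(worst)
    return buffer

protos = [np.array([1.0, 0.0])]
buf = update_buffer([], np.array([0.95, 0.31]), protos)   # ID-like: refused
buf = update_buffer(buf, np.array([0.0, 1.0]), protos)    # well separated: admitted
```

The admission threshold plays the same role as the ID-prototype separation: both exist to keep hard ID contamination out of the negative semantics.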

What carries the argument

ID-prototype-separated regularization applied to sample-specific negative text embeddings obtained through image-to-text modality inversion.

Load-bearing premise

The regularization successfully keeps learned negative embeddings away from ID semantics without losing useful diversity or creating new overlap problems when OOD samples are close to the ID set.

What would settle it

Measure whether removing the ID-prototype-separated regularization causes the average FPR95 on the Four-OOD benchmark with ImageNet-1K to rise back toward the 14 percent baseline.
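FPR95 is the operative metric in that test: the fraction of OOD samples whose score still clears the threshold at which 95% of ID samples are accepted. A minimal computation, assuming higher score means more ID-like:

```python
import numpy as np

def fpr_at_95_tpr(id_scores, ood_scores):
    """FPR95: set the threshold so 95% of ID samples score at or above it,
    then report the fraction of OOD samples that also clear it."""
    threshold = np.percentile(id_scores, 5)   # 95% of ID scores lie above this
    return float(np.mean(np.asarray(ood_scores) >= threshold))

id_scores = np.arange(100)            # toy ID scores 0..99
ood_scores = np.array([0, 10, 20])    # toy OOD scores
fpr = fpr_at_95_tpr(id_scores, ood_scores)
```

On the toy arrays, two of the three OOD scores exceed the 5th-percentile ID threshold, so FPR95 is 2/3.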

Figures

Figures reproduced from arXiv: 2605.10756 by Jing Xu, Jubo Feng, Nanyang Ye, Qinying Gu, Xinbing Wang, Yifeng Yang.

Figure 1: Comparison of ID/OOD feature visualizations and distributions using ImageNet-1K as ID.
Figure 2: The overall framework of TINS, which learns adaptive negative semantics at test time via …
Figure 3: Effect of two initialization strategies with different iteration steps.
Figure 4: Hyperparameter analyses: (a) regularization coefficient …
Figure 5: Trade-off between OOD detection performance and inference speed under different optimization iteration steps.
Figure 6: Ablation studies on (a) inference batch size, (b) bank capacity …
read the original abstract

Vision-language models enable OOD detection by comparing image alignment with ID labels and negative semantics. Existing negative-label-based methods mainly rely on static negative labels constructed before inference, limiting their ability to cover diverse and evolving OOD concepts. Although test-time expansion provides a natural solution, naively learning negative semantics from potential OOD samples may introduce hard ID contamination. To address this issue, we propose a Test-time ID-prototype-separated Negative Semantics learning method, termed TINS. TINS learns sample-specific negative text embeddings via image-to-text modality inversion and introduces ID-prototype-separated regularization to keep them separated from ID semantics. To further stabilize negative semantics expansion, TINS employs group-wise aggregation scoring and a buffer update strategy. Extensive experiments across Four-OOD, OpenOOD, Temporal-shift, and Various ID settings show consistent improvements over strong baselines. Notably, on the Four-OOD benchmark with ImageNet-1K as ID, TINS reduces the average FPR95 from 14.04% to 6.72%. Our code is available at https://github.com/zxk1212/tins.
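The abstract's "group-wise aggregation scoring" is not spelled out here; one plausible reading, following the softmax-over-[ID labels | negatives] style common in negative-label OOD detection, is sketched below. The group split, temperature, and averaging are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def groupwise_score(id_sims, neg_sims, n_groups=2, temp=0.01):
    """Hypothetical group-wise aggregation: split negative similarities into
    groups, take a softmax over [ID sims | one negative group] per group,
    and average the ID probability mass across groups (higher = more ID-like)."""
    groups = np.array_split(np.asarray(neg_sims), n_groups)
    scores = []
    for g in groups:
        logits = np.concatenate([id_sims, g]) / temp
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        scores.append(probs[: len(id_sims)].sum())   # ID mass for this group
    return float(np.mean(scores))

id_sims = np.array([0.30, 0.25])                             # sims to ID labels
s_id = groupwise_score(id_sims, np.array([0.10] * 4))        # ID-like sample
s_ood = groupwise_score(id_sims, np.array([0.40] * 4))       # OOD-like sample
```

Scoring each negative group separately rather than pooling all negatives is one way a method could keep a few poorly learned negatives from dominating the score.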

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims to introduce TINS, a test-time ID-prototype-separated negative semantics learning method for OOD detection in vision-language models. It learns sample-specific negative text embeddings via image-to-text modality inversion and applies ID-prototype-separated regularization to prevent hard ID contamination from potential OOD samples. It further uses group-wise aggregation scoring and a buffer update strategy for stability. Experiments across Four-OOD, OpenOOD, Temporal-shift, and Various ID settings show consistent gains over baselines, notably reducing average FPR95 from 14.04% to 6.72% on the Four-OOD benchmark with ImageNet-1K as ID.

Significance. If the central mechanism holds, the work advances test-time OOD detection by dynamically generating tailored negative semantics without pre-fixed static labels, addressing coverage of diverse and evolving OOD concepts. The public code release supports reproducibility. The reported benchmark improvements are concrete and span multiple settings. Significance is tempered by the need to confirm the separation regularization drives the gains rather than the inversion or aggregation steps alone.

major comments (3)
  1. [§3] §3 (ID-prototype-separated regularization): The regularization is presented as preventing hard ID contamination while preserving negative semantics diversity, but the manuscript provides no direct verification such as embedding visualizations, diversity metrics (e.g., variance or intra-group distances), or before/after comparisons. This is load-bearing for the central claim, as gains could stem from modality inversion or group-wise aggregation instead.
  2. [§4.2] §4.2 (Four-OOD results): The FPR95 reduction from 14.04% to 6.72% is reported without ablation isolating the separation term's contribution, hyperparameter sensitivity analysis for the regularization strength, or statistical significance (e.g., mean and std over multiple seeds). This leaves open whether the claimed mechanism is responsible.
  3. [§4.3] §4.3 (ablation and analysis): No examination of potential new failure modes, such as overly generic or low-variance negative embeddings on near-boundary samples when regularization is strong, or residual contamination when weak. This directly addresses the weakest assumption in the method's design.
minor comments (2)
  1. [Abstract] Abstract: 'Four-OOD' is referenced without a short parenthetical listing of its constituent datasets, which would improve immediate clarity for readers.
  2. [§3] Notation: The precise formulation of the ID-prototype separation loss (including how prototypes are computed and the weighting hyperparameter) could be stated more explicitly with an equation number for easier reference.
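The verification asked for in major comment 1 is inexpensive to compute. A sketch of the two diagnostics, assuming unit-normalized embeddings (the metric names and shapes here are our own, not the paper's):

```python
import numpy as np

def separation_and_diversity(neg_embs, id_protos):
    """Two diagnostics for learned negatives: mean max-cosine-similarity to the
    ID prototypes (lower = better separated) and mean per-dimension variance
    across the negative set (higher = more diverse)."""
    neg_embs = np.asarray(neg_embs)
    protos = np.asarray(id_protos)
    sims = neg_embs @ protos.T                    # (n_negs, n_protos) cosines
    separation = float(sims.max(axis=1).mean())
    diversity = float(neg_embs.var(axis=0).mean())
    return separation, diversity

negs = np.array([[0.0, 1.0], [0.0, -1.0]])        # toy unit negatives
protos = np.array([[1.0, 0.0]])
sep, div = separation_and_diversity(negs, protos)
```

Reporting both numbers before and after the regularization is applied would directly address whether separation is achieved without collapsing diversity.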

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments correctly identify areas where additional evidence is needed to substantiate the contribution of the ID-prototype-separated regularization. We address each major comment below and will revise the manuscript to incorporate the suggested analyses, ablations, and visualizations.

read point-by-point responses
  1. Referee: [§3] §3 (ID-prototype-separated regularization): The regularization is presented as preventing hard ID contamination while preserving negative semantics diversity, but the manuscript provides no direct verification such as embedding visualizations, diversity metrics (e.g., variance or intra-group distances), or before/after comparisons. This is load-bearing for the central claim, as gains could stem from modality inversion or group-wise aggregation instead.

    Authors: We agree that direct empirical verification of the regularization is necessary to support the central claim. In the revised manuscript, we will add t-SNE visualizations of negative text embeddings before and after applying the ID-prototype-separated regularization. We will also report quantitative metrics including average cosine similarity to ID prototypes (to show separation) and intra-group embedding variance (to show preserved diversity). These will be presented alongside the existing results to isolate the regularization's effect. revision: yes

  2. Referee: [§4.2] §4.2 (Four-OOD results): The FPR95 reduction from 14.04% to 6.72% is reported without ablation isolating the separation term's contribution, hyperparameter sensitivity analysis for the regularization strength, or statistical significance (e.g., mean and std over multiple seeds). This leaves open whether the claimed mechanism is responsible.

    Authors: We acknowledge the need for isolating experiments and statistical reporting. In the revision, Section 4.2 will include an ablation study disabling only the separation regularization term while keeping modality inversion and group-wise aggregation fixed, to quantify its isolated contribution to the FPR95 reduction. We will also add a sensitivity analysis over the regularization strength hyperparameter and report all main results as mean ± std over 5 random seeds. revision: yes

  3. Referee: [§4.3] §4.3 (ablation and analysis): No examination of potential new failure modes, such as overly generic or low-variance negative embeddings on near-boundary samples when regularization is strong, or residual contamination when weak. This directly addresses the weakest assumption in the method's design.

    Authors: We appreciate the suggestion to examine failure modes. The revised Section 4.3 will include targeted analysis on near-OOD samples, measuring negative embedding variance and similarity to ID prototypes across a range of regularization strengths. We will discuss observed cases of overly generic embeddings (strong regularization) and residual contamination (weak regularization), and provide practical guidance on hyperparameter selection to balance these trade-offs. revision: yes

Circularity Check

0 steps flagged

No significant circularity: claims rest on external benchmark experiments rather than self-referential fits or definitions

full rationale

The paper proposes TINS as a test-time method using image-to-text modality inversion plus ID-prototype-separated regularization, with performance gains (e.g., FPR95 reduction on Four-OOD) demonstrated via experiments on independent benchmarks including OpenOOD and Temporal-shift. No load-bearing step reduces by construction to its own inputs: the regularization is a design choice whose effectiveness is measured externally rather than derived tautologically from fitted parameters or prior self-citations. The derivation chain is self-contained against external data and does not invoke uniqueness theorems, ansatzes smuggled via self-citation, or renaming of known results as novel predictions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that vision-language models can reliably invert images to negative text embeddings and that prototype separation can be enforced without side effects; no new entities are postulated and no free parameters are explicitly fitted in the abstract description.

axioms (1)
  • domain assumption Vision-language models produce embeddings that allow meaningful image-to-text inversion for distinguishing ID from OOD concepts.
    Invoked in the modality inversion step of TINS.

pith-pipeline@v0.9.0 · 5528 in / 1296 out tokens · 81760 ms · 2026-05-12T03:57:47.898893+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages · 1 internal anchor

  1. Muhammad Asad, Ihsan Ullah, Ganesh Sistu, and Michael G Madden. Towards robust autonomous driving: Out-of-distribution object detection in bird's eye view space. IEEE Open Journal of Vehicular Technology, 2025.
  2. Yichen Bai, Zongbo Han, Bing Cao, Xiaoheng Jiang, Qinghua Hu, and Changqing Zhang. ID-like prompt learning for few-shot out-of-distribution detection. In Conference on Computer Vision and Pattern Recognition, pages 17480–17489, 2024.
  3. Julian Bitterwolf, Maximilian Müller, and Matthias Hein. In or out? Fixing ImageNet out-of-distribution detection evaluation. In Proceedings of the 40th International Conference on Machine Learning, pages 2471–2506, 2023.
  4. Lukas Bossard, Matthieu Guillaumin, and Luc Van Gool. Food-101 – mining discriminative components with random forests. In European Conference on Computer Vision, pages 446–461. Springer, 2014.
  5. Chentao Cao, Zhun Zhong, Zhanke Zhou, Yang Liu, Tongliang Liu, and Bo Han. Envisioning outlier exposure by large language models for out-of-distribution detection. In International Conference on Machine Learning, 2024.
  6. Mengyuan Chen, Junyu Gao, and Changsheng Xu. Conjugated semantic pool improves OOD detection with pre-trained vision-language models. In Annual Conference on Neural Information Processing Systems, pages 82560–82593, 2024.
  7. Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Sammy Mohamed, and Andrea Vedaldi. Describing textures in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3606–3613, 2014.
  8. Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
  9. Li Deng. The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Processing Magazine, 29(6):141–142, 2012.
  10. Andrija Djurisic, Nebojsa Bozanic, Arjun Ashok, and Rosanne Liu. Extremely simple activation shaping for out-of-distribution detection. In The Eleventh International Conference on Learning Representations.
  11. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations.
  12. Xuefeng Du, Gabriel Gozum, Yifei Ming, and Yixuan Li. SIREN: Shaping representations for detecting out-of-distribution objects. In Annual Conference on Neural Information Processing Systems, pages 20434–20449, 2022.
  13. Xuefeng Du, Zhaoning Wang, Mu Cai, and Yixuan Li. VOS: Learning what you don't know by virtual outlier synthesis. Proceedings of the International Conference on Learning Representations, 2022.
  14. Hao Fu, Naman Patel, Prashanth Krishnamurthy, et al. CLIPScope: Enhancing zero-shot OOD detection with Bayesian scoring. In Proceedings of the Winter Conference on Applications of Computer Vision, pages 5346–5355, 2025.
  15. Boyu Han, Qianqian Xu, Zhiyong Yang, Shilong Bao, Peisong Wen, Yangbangyan Jiang, and Qingming Huang. AUCSeg: AUC-oriented pixel-level long-tail semantic segmentation. Advances in Neural Information Processing Systems, 37:126863–126907, 2024.
  16. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
  17. Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. Proceedings of International Conference on Learning Representations, 2017.
  18. Dan Hendrycks, Mantas Mazeika, and Thomas Dietterich. Deep anomaly detection with outlier exposure. Proceedings of the International Conference on Learning Representations, 2019.
  19. Dan Hendrycks, Mantas Mazeika, Saurav Kadavath, and Dawn Song. Using self-supervised learning can improve model robustness and uncertainty. Advances in Neural Information Processing Systems, 32, 2019.
  20. Dan Hendrycks, Norman Mu, Ekin D Cubuk, Barret Zoph, Justin Gilmer, and Balaji Lakshminarayanan. AugMix: A simple data processing method to improve robustness and uncertainty. arXiv preprint arXiv:1912.02781, 2019.
  21. Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, et al. The many faces of robustness: A critical analysis of out-of-distribution generalization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8340–8349, 2021.
  22. Dan Hendrycks, Andy Zou, Mantas Mazeika, Leonard Tang, Bo Li, Dawn Song, and Jacob Steinhardt. PixMix: Dreamlike pictures comprehensively improve safety measures. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16783–16792, June 2022.
  23. Rui Huang and Yixuan Li. MOS: Towards scaling out-of-distribution detection for large semantic space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8710–8719, 2021.
  24. Rui Huang, Andrew Geng, and Yixuan Li. On the importance of gradients for detecting distributional shifts in the wild. Advances in Neural Information Processing Systems, 34:677–689, 2021.
  25. Xue Jiang, Feng Liu, Zhen Fang, Hong Chen, Tongliang Liu, Feng Zheng, and Bo Han. Negative label guided OOD detection with pretrained vision-language models. In The Twelfth International Conference on Learning Representations, 2024.
  26. Shu Kong and Deva Ramanan. OpenGAN: Open-set recognition via open data generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 813–822, 2021.
  27. Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
  28. Ya Le and Xuan Yang. Tiny ImageNet visual recognition challenge. CS 231N, 7(7):3, 2015.
  29. Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. Advances in Neural Information Processing Systems, 31, 2018.
  30. Shiyu Liang, Yixuan Li, and R Srikant. Enhancing the reliability of out-of-distribution image detection in neural networks. In International Conference on Learning Representations, 2018.
  31. Weitang Liu, Xiaoyun Wang, John Owens, and Yixuan Li. Energy-based out-of-distribution detection. Advances in Neural Information Processing Systems, 33:21464–21475, 2020.
  32. Xixi Liu, Yaroslava Lochman, and Christopher Zach. GEN: Pushing the limits of softmax-based out-of-distribution detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23946–23955, 2023.
  33. George A Miller. WordNet: A lexical database for English. Communications of the ACM, 38(11):39–41, 1995.
  34. Yifei Ming, Ziyang Cai, Jiuxiang Gu, Yiyou Sun, Wei Li, and Yixuan Li. Delving into out-of-distribution detection with vision-language representations. Advances in Neural Information Processing Systems, 35:35087–35102, 2022.
  35. Yifei Ming, Yiyou Sun, Ousmane Dia, and Yixuan Li. How to exploit hyperspherical embeddings for out-of-distribution detection? arXiv preprint arXiv:2203.04450, 2022.
  36. M Mistretta, A Baldrati, L Agnolucci, M Bertini, AD Bagdanov, et al. Cross the gap: Exposing the intra-modal misalignment in CLIP via modality inversion. In 13th International Conference on Learning Representations (ICLR), pages 90437–90458, 2025.
  37. Atsuyuki Miyai, Qing Yu, Go Irie, and Kiyoharu Aizawa. LoCoOp: Few-shot out-of-distribution detection via prompt learning. In Annual Conference on Neural Information Processing Systems, pages 76298–76310, 2023.
  38. Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. 2011.
  39. Jun Nie, Yonggang Zhang, Zhen Fang, Tongliang Liu, Bo Han, and Xinmei Tian. Out-of-distribution detection with negative prompts. In International Conference on Learning Representations, pages 1–20, 2024.
  40. Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
  41. Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do ImageNet classifiers generalize to ImageNet? In International Conference on Machine Learning, pages 5389–5400. PMLR, 2019.
  42. Seonghan Ryu, Sangjun Koo, Hwanjo Yu, and Gary Geunbae Lee. Out-of-domain detection based on generative adversarial network. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 714–718, 2018.
  43. Vikash Sehwag, Mung Chiang, and Prateek Mittal. SSD: A unified framework for self-supervised outlier detection. arXiv preprint arXiv:2103.12051, 2021.
  44. Yiyou Sun, Chuan Guo, and Yixuan Li. ReAct: Out-of-distribution detection with rectified activations. Advances in Neural Information Processing Systems, 34:144–157, 2021.
  45. Yiyou Sun, Yifei Ming, Xiaojin Zhu, and Yixuan Li. Out-of-distribution detection with deep nearest neighbors. In International Conference on Machine Learning, pages 20827–20840. PMLR, 2022.
  46. Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, and Serge Belongie. The iNaturalist species classification and detection dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8769–8778, 2018.
  47. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
  48. Maria De La Iglesia Vayá, Jose Manuel Saborit, Joaquim Angel Montell, Antonio Pertusa, Aurelia Bustos, Miguel Cazorla, Joaquin Galant, Xavier Barber, Domingo Orozco-Beltrán, Francisco García-García, et al. BIMCV COVID-19+: A large annotated dataset of RX and CT images from COVID-19 patients. arXiv preprint arXiv:2006.01174, 2020.
  49. Sagar Vaze, Kai Han, Andrea Vedaldi, and Andrew Zisserman. Open-set recognition: A good closed-set classifier is all you need. In International Conference on Learning Representations, 2021.
  50. Haohan Wang, Songwei Ge, Zachary Lipton, and Eric P Xing. Learning robust global representations by penalizing local predictive power. Advances in Neural Information Processing Systems, 32, 2019.
  51. Haoqi Wang, Zhizhong Li, Litong Feng, and Wayne Zhang. ViM: Out-of-distribution with virtual-logit matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4921–4930, 2022.
  52. Ke Wang, Qi Ma, Chongqiang Shen, and Jianbo Lu. Application of uncertainty to out-of-distribution detection for autonomous driving perception safety. IEEE Transactions on Intelligent Transportation Systems, 2025.
  53. Jianxiong Xiao, James Hays, Krista A Ehinger, Aude Oliva, and Antonio Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 3485–3492. IEEE, 2010.
  54. Kai Xu, Rongyu Chen, Gianni Franchi, and Angela Yao. Scaling for training time and post-hoc out-of-distribution detection enhancement. In The Twelfth International Conference on Learning Representations.
  55. Zhikang Xu, Qianqian Xu, Zitai Wang, Cong Hua, Sicong Li, Zhiyong Yang, and Qingming Huang. Mind the way you select negative texts: Pursuing the distance consistency in OOD detection with VLMs. arXiv preprint arXiv:2603.02618, 2026.
  56. Yifeng Yang, Lin Zhu, Zewen Sun, Hengyu Liu, Qinying Gu, and Nanyang Ye. OODD: Test-time out-of-distribution detection with dynamic dictionary. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 30630–30639, 2025.
  57. Alireza Zaeemzadeh, Niccolo Bisagno, Zeno Sambugaro, Nicola Conci, Nazanin Rahnavard, and Mubarak Shah. Out-of-distribution detection using union of 1-dimensional subspaces. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9452–9461, 2021.
  58. Jingyang Zhang, Jingkang Yang, Pengyun Wang, Haoqi Wang, Yueqian Lin, Haoran Zhang, Yiyou Sun, Xuefeng Du, Yixuan Li, Ziwei Liu, et al. OpenOOD v1.5: Enhanced benchmark for out-of-distribution detection. In NeurIPS 2023 Workshop on Distribution Shifts: New Frontiers with Foundation Models.
  59. Yabin Zhang and Lei Zhang. AdaNeg: Adaptive negative proxy guided OOD detection with vision-language models. Advances in Neural Information Processing Systems, 37:38744–38768, 2024.
  60. Yabin Zhang, Wenjie Zhu, Chenhang He, and Lei Zhang. LAPT: Label-driven automated prompt tuning for OOD detection with vision-language models. In European Conference on Computer Vision, pages 271–288. Springer, 2024.
  61. Yabin Zhang, Maya Varma, Yunhe Gao, Jean-Benoit Delbrouck, Jiaming Liu, Chong Wang, and Curtis Langlotz. Activation matters: Test-time activated negative labels for OOD detection with vision-language models. arXiv preprint arXiv:2603.25250, 2026.
  62. Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1452–1464, 2017.
  63. Wenjie Zhu, Yabin Zhang, Xin Jin, Wenjun Zeng, and Lei Zhang. ANTS: Adaptive negative textual space shaping for OOD detection via test-time MLLM understanding and reasoning. arXiv preprint arXiv:2509.03951, 2025.

    On CIFAR-100, our method substantially improves over InterNeg, reducing near-OOD FPR95 from 62.54% to 33.44% and far-OOD FPR95 from 20.02% to 2.90%. On CIFAR-10, our method also achieves the best performance. These results demonstrate that our method perform well beyond ImageNet-1K and remains effective across different ID label spaces. Table 8: Detailed ...