pith. machine review for the scientific record.

arxiv: 2605.07821 · v1 · submitted 2026-05-08 · 💻 cs.CV · cs.AI

Recognition: no theorem link

Divide and Conquer: Object Co-occurrence Helps Mitigate Simplicity Bias in OOD Detection

Authors on Pith · no claims yet

Pith reviewed 2026-05-11 01:50 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords out-of-distribution detection · object co-occurrence · simplicity bias · disentangled representations · near-OOD · divide-and-conquer · semantic context · computer vision

The pith

Object co-occurrence patterns in images enable a divide-and-conquer OOD detection method that distinguishes near-OOD samples by using semantic context rather than simple features.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current out-of-distribution detection methods often fail on near-OOD cases because neural networks exhibit simplicity bias and focus on easy-to-learn image regions instead of building rich disentangled representations. The paper claims that object co-occurrence patterns, meaning how different objects tend to appear together in natural scenes, supply the missing contextual information to overcome this limitation. It predicts separate object representations for a test image, checks those patterns against statistics from the in-distribution training set, and sorts the case into one of three scenarios before applying a tailored detection step. This divide-and-conquer process lets the detector consider semantic relationships among objects instead of isolated simple cues. Readers would care because better near-OOD detection directly improves the safety of deployed vision systems that must handle subtle real-world shifts.
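The abstract does not spell out what the three scenarios are or how the division is thresholded; a minimal sketch of the adaptive division step, with hypothetical scenario names (`matched`, `partially-matched`, `unmatched`) and a made-up frequency threshold `tau`, might look like:

```python
import numpy as np

def build_id_cooccurrence(id_object_sets, num_classes):
    """Count how often each ordered pair of object classes appears together
    in ID training images, normalized to pairwise frequencies."""
    counts = np.zeros((num_classes, num_classes))
    for objs in id_object_sets:
        for a in objs:
            for b in objs:
                if a != b:
                    counts[a, b] += 1
    total = counts.sum() or 1.0
    return counts / total

def assign_scenario(test_objs, cooc, tau=1e-3):
    """Divide a test sample into one of three hypothetical scenarios by how
    well its predicted object pairs match the ID co-occurrence statistics.
    The names and threshold are illustrative, not the paper's definitions."""
    pairs = [(a, b) for a in test_objs for b in test_objs if a != b]
    if not pairs:
        return "single-object"          # no contextual pairs to exploit
    freqs = [cooc[a, b] for a, b in pairs]
    if min(freqs) >= tau:
        return "matched"                # every pair is familiar from ID data
    if max(freqs) >= tau:
        return "partially-matched"      # some pairs are unfamiliar
    return "unmatched"                  # this combination was never seen in ID
```

Each scenario would then get its own tailored OOD scoring rule, which is where the divide-and-conquer framing comes in.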

Core claim

The paper establishes that an Object-Centric OOD detection framework can capture Object CO-occurrence (OCO) patterns by first predicting disentangled representations for a test sample, then adaptively dividing the observed patterns into three scenarios according to co-occurrence statistics from the ID training data, and finally executing OOD detection in a divide-and-conquer fashion; this allows the method to distinguish near-OOD samples through semantic contextual relationships instead of defaulting to simple, easily learnable regions.

What carries the argument

Object co-occurrence (OCO) patterns that are observed in ID training data and used to adaptively divide each test sample into one of three scenarios before targeted detection.

If this is right

  • The framework produces competitive OOD detection results on both challenging and full-spectrum benchmarks.
  • It handles detection under both semantic shifts and covariate shifts in the test data.
  • Near-OOD performance improves specifically because the method incorporates semantic contextual relationships instead of relying solely on simple features.
  • The divide-and-conquer structure allows separate handling of different pattern types rather than a single entangled representation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same co-occurrence division idea could be tested on other vision tasks that suffer from feature bias, such as robustness to adversarial perturbations or domain generalization.
  • One could measure whether the performance gain scales with the diversity of object categories in the training set, providing a testable prediction about data requirements.
  • In practice the method might reduce false negatives for safety-critical applications like autonomous driving where near-OOD objects appear in unusual but still plausible combinations.

Load-bearing premise

Object co-occurrence patterns measured from the in-distribution training data are representative enough to correctly assign any new test sample to one of the three scenarios, and that assignment reliably reduces simplicity bias when learning disentangled representations.
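One cheap stress test of this premise (ours, not the paper's): split the ID data in half, build co-occurrence statistics on each half independently, and check how often the two halves agree on the scenario assigned to the same sample. Low agreement would suggest the statistics are not representative enough to carry the division step. A sketch, parameterized over whatever statistics-builder and assignment rule the method actually uses:

```python
import numpy as np

def assignment_agreement(id_object_sets, assign, build_stats, seed=0):
    """Split the ID data in half, build co-occurrence statistics on each half,
    and return the fraction of samples that receive the same scenario under
    both halves -- a rough proxy for the representativeness premise."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(id_object_sets))
    half = len(idx) // 2
    stats_a = build_stats([id_object_sets[i] for i in idx[:half]])
    stats_b = build_stats([id_object_sets[i] for i in idx[half:]])
    same = [assign(objs, stats_a) == assign(objs, stats_b)
            for objs in id_object_sets]
    return sum(same) / len(same)
```

Agreement near 1.0 would be necessary (though not sufficient) for the three-way division to be stable on unseen data.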

What would settle it

A collection of near-OOD images whose object co-occurrence statistics closely match the ID training distribution yet are still misclassified as in-distribution by the method, or an ablation showing that the three-scenario division produces no gain over a baseline without the division step.
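Such an ablation would be settled by a standard separability metric computed twice on the same score lists, once for the divided pipeline and once for the uniform-scoring baseline. A minimal AUROC implementation in the rank-sum (Mann-Whitney U) form, assuming higher scores mean "more in-distribution" and ignoring ties:

```python
import numpy as np

def auroc(id_scores, ood_scores):
    """AUROC via the Mann-Whitney U statistic: the probability that a randomly
    chosen ID sample scores higher than a randomly chosen OOD sample."""
    scores = np.concatenate([np.asarray(id_scores), np.asarray(ood_scores)])
    ranks = scores.argsort().argsort() + 1          # 1-based ranks, no tie handling
    n_id, n_ood = len(id_scores), len(ood_scores)
    u = ranks[:n_id].sum() - n_id * (n_id + 1) / 2  # U statistic for the ID group
    return u / (n_id * n_ood)
```

No gain in this number for the three-scenario pipeline over the undivided baseline would count against the central claim.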

Figures

Figures reproduced from arXiv: 2605.07821 by Boyang Dai, Chaoqi Chen, Yizhou Yu.

Figure 1. Attention visualization of vanilla method and object…
Figure 2. Overview of OCO. ID training data object co-occurrence pattern statistics are first established (…).
Figure 3. Number of samples in each group for ID (ImageNet…).
Figure 5. OOD detection results under different scenarios.
Figure 6. OOD detection results on different slot numbers.
Figure 7. Visualization of object co-occurrence probabilities ver…
Figure 8. Attention visualization of ID. Sine and cosine-Gaussian kernels improve detection performance while maintaining computational efficiency. The method is primarily based on probabilistic scoring, leveraging Maximum Softmax Probability (MSP) to normalize all scores within the [0,1] interval; this probabilistic formulation enables a natural representation of OOD scores while ensuring consistent scaling…
Figure 9. Attention visualization of OOD. From a human visual perspective, the background indeed shares similar visual features with slugs. When OOD object co-occurrence appears (third row), the scene presents higher complexity, with human arms intersecting a stingray. Initially, the slots capture the human-arm features and misidentify them as basset; however, the model correctly identifies the object as a stingray…
Figure 10. Score distributions for ViT model on ImageNet-200.
read the original abstract

Out-of-distribution (OOD) detection is crucial for ensuring the reliability of deep learning models. Existing methods mostly focus on regular entangled representations to discriminate in-distribution (ID) and OOD data, neglecting the rich contextual information within images. This issue is particularly challenging for detecting near-OOD, as models with simplicity bias struggle to learn discriminative features in disentangled representations. The human visual system can use the co-occurrence of objects in the natural environment to facilitate scene understanding. Inspired by this, we propose an Object-Centric OOD detection framework that learns to capture Object CO-occurrence (OCO) patterns within images. The proposed method introduces a new OOD detection paradigm that understands object co-occurrence within an image by predicting disentangled representations for the test sample, then adaptively divides patterns into three scenarios based on object co-occurrence patterns observed in ID training data, and finally performs OOD detection in a divide-and-conquer manner. By doing so, OCO can distinguish near-OOD by considering the semantic contextual relationships present in their images, avoiding the tendency to focus solely on simple, easily learnable regions. We evaluate OCO through experiments across challenging and full-spectrum OOD settings, demonstrating competitive results and confirming its ability to address both semantic and covariate shifts. Code is released at https://github.com/Michael-McQueen/OCO.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces an Object-Centric OOD detection framework (OCO) that uses object co-occurrence patterns from in-distribution (ID) training data to improve detection of near out-of-distribution (OOD) samples. The approach predicts disentangled object-centric representations for test images, adaptively divides them into three scenarios based on how their co-occurrence patterns match ID statistics, and applies OOD scoring in a divide-and-conquer manner to mitigate simplicity bias by considering semantic contextual relationships rather than simple features.

Significance. If the empirical results hold, this work provides a novel paradigm for OOD detection by drawing inspiration from human scene understanding via object co-occurrences. It directly targets the challenge of near-OOD detection where standard methods fail due to simplicity bias in learning discriminative features. The competitive performance on challenging OOD settings and the public code release make it a potentially impactful contribution to reliable deep learning systems.

major comments (2)
  1. Methods section (description of disentangled representation prediction and scenario division): The central claim that OCO mitigates simplicity bias rests on the ability to predict disentangled representations that reliably capture object co-occurrence patterns for the adaptive division step. The abstract notes that models 'struggle to learn discriminative features in disentangled representations,' yet the framework uses exactly these predictions to partition test samples into the three ID-derived scenarios. Without an ablation demonstrating that the co-occurrence predictor avoids attending to simple background features (e.g., via attention visualization or feature importance analysis on near-OOD samples), the divide-and-conquer benefit for subtle contextual shifts remains unverified.
  2. Experiments section (results tables on near-OOD benchmarks): The paper claims competitive results across full-spectrum OOD settings, but does not report per-scenario OOD scores or an ablation comparing the full OCO pipeline against a baseline that uses the same disentangled representations without the three-way division. This is load-bearing for the claim that the adaptive division specifically addresses simplicity bias, as opposed to the gains coming from the object-centric representation alone.
minor comments (2)
  1. Abstract: The three scenarios are referenced but not briefly characterized (e.g., 'matched,' 'partially matched,' 'unmatched' co-occurrences); adding one sentence would improve accessibility.
  2. Related Work: The positioning relative to prior object-centric and context-aware OOD methods could be expanded with 2-3 additional citations to recent disentanglement-based detectors.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments and recommendations. We address each of the major comments in detail below and outline the revisions we will make to strengthen the manuscript.

read point-by-point responses
  1. Referee: Methods section (description of disentangled representation prediction and scenario division): The central claim that OCO mitigates simplicity bias rests on the ability to predict disentangled representations that reliably capture object co-occurrence patterns for the adaptive division step. The abstract notes that models 'struggle to learn discriminative features in disentangled representations,' yet the framework uses exactly these predictions to partition test samples into the three ID-derived scenarios. Without an ablation demonstrating that the co-occurrence predictor avoids attending to simple background features (e.g., via attention visualization or feature importance analysis on near-OOD samples), the divide-and-conquer benefit for subtle contextual shifts remains unverified.

    Authors: We appreciate the referee pointing out this potential inconsistency. The statement in the abstract refers to the general difficulty that standard OOD detection models face when relying on disentangled representations due to simplicity bias. Our proposed OCO framework, however, introduces a dedicated co-occurrence predictor trained specifically on ID data to capture object co-occurrence statistics. This allows for reliable prediction of disentangled object-centric representations tailored to co-occurrence patterns. To further validate that the predictor focuses on semantic object information rather than simple background features, we will incorporate attention visualizations and feature importance analyses for near-OOD samples in the revised version of the manuscript. This addition will provide empirical support for the effectiveness of the division step in mitigating simplicity bias. revision: yes

  2. Referee: Experiments section (results tables on near-OOD benchmarks): The paper claims competitive results across full-spectrum OOD settings, but does not report per-scenario OOD scores or an ablation comparing the full OCO pipeline against a baseline that uses the same disentangled representations without the three-way division. This is load-bearing for the claim that the adaptive division specifically addresses simplicity bias, as opposed to the gains coming from the object-centric representation alone.

    Authors: We agree that demonstrating the specific contribution of the adaptive division is essential. We will add an ablation study that compares the complete OCO framework to a variant that employs the same disentangled representations but omits the three-scenario division, applying a uniform OOD scoring instead. Furthermore, we will include per-scenario OOD detection scores in the experimental results to illustrate performance variations across the different co-occurrence scenarios. These revisions will help isolate the benefits of the divide-and-conquer strategy and reinforce that the improvements are not solely attributable to the object-centric representations. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper introduces an Object-Centric OOD detection framework that learns OCO patterns from ID training data, predicts disentangled representations for test samples, divides them into three scenarios based on those patterns, and applies divide-and-conquer OOD scoring. No step reduces a claimed prediction or result to its own inputs by construction, as the division and scoring rely on empirical co-occurrence statistics evaluated on held-out OOD benchmarks rather than tautological re-use of fitted values. No self-citation is load-bearing for the central claim, and the method does not rename known results or smuggle ansatzes via prior work. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

With only the abstract available, concrete free parameters cannot be enumerated, but the approach implicitly depends on rules or thresholds for dividing into three scenarios and on the ability to predict disentangled representations. The core domain assumption is that co-occurrence statistics from ID data transfer usefully to OOD detection.

free parameters (1)
  • scenario division criteria or thresholds
    Adaptive division of patterns into three scenarios based on observed ID co-occurrence likely requires chosen or fitted rules.
axioms (1)
  • domain assumption Object co-occurrence patterns in natural images provide discriminative contextual information for distinguishing ID from near-OOD samples
    This is the central inspiration drawn from the human visual system and stated as the basis for the divide-and-conquer strategy.

pith-pipeline@v0.9.0 · 5540 in / 1246 out tokens · 48893 ms · 2026-05-11T01:50:26.116247+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages

  1. [1]

    NECO: neural col- lapse based out-of-distribution detection

    Mou ¨ın Ben Ammar, Nacim Belkhir, Sebastian Popescu, An- toine Manzanera, and Gianni Franchi. NECO: neural col- lapse based out-of-distribution detection. InICLR. OpenRe- view.net, 2024. 5, 6, 7, 8

  2. [2]

    In or out? fixing imagenet out-of-distribution detection evalua- tion

    Julian Bitterwolf, Maximilian M ¨uller, and Matthias Hein. In or out? fixing imagenet out-of-distribution detection evalua- tion. InICML, pages 2471–2506, 2023. 5

  3. [3]

    Object represen- tations in the human brain reflect the co-occurrence statistics of vision and language.Nature communications, 12(1):4081,

    Michael F Bonner and Russell A Epstein. Object represen- tations in the human brain reflect the co-occurrence statistics of vision and language.Nature communications, 12(1):4081,

  4. [4]

    Burgess, Loic Matthey, Nicholas Watters, Rishabh Kabra, Irina Higgins, Matt Botvinick, and Alexan- der Lerchner

    Christopher P. Burgess, Loic Matthey, Nicholas Watters, Rishabh Kabra, Irina Higgins, Matt Botvinick, and Alexan- der Lerchner. Monet: Unsupervised scene decomposition and representation, 2019. 8

  5. [5]

    Compound domain generalization via meta- knowledge encoding

    Chaoqi Chen, Jiongcheng Li, Xiaoguang Han, Xiaoqing Liu, and Yizhou Yu. Compound domain generalization via meta- knowledge encoding. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 7119–7129, 2022. 1

  6. [6]

    Dual energy-based model with open- world uncertainty estimation for out-of-distribution detec- tion

    Qi Chen and Hu Ding. Dual energy-based model with open- world uncertainty estimation for out-of-distribution detec- tion. InCVPR, pages 25728–25737, 2025. 1

  7. [7]

    On the properties of neural machine translation: Encoder-decoder approaches

    Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio. On the properties of neural machine translation: Encoder-decoder approaches. InEMNLP, pages 103–111, 2014. 2

  8. [8]

    Describing textures in the wild

    Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Sammy Mohamed, and Andrea Vedaldi. Describing textures in the wild. InCVPR, pages 3606–3613, 2014. 5

  9. [9]

    Davenport and Mary Potter

    Jodi L. Davenport and Mary Potter. Scene consistency in object and background perception.Psychological Science, 15:559 – 564, 2004. 2

  10. [10]

    A generalization of bayesian inference

    Arthur P Dempster. A generalization of bayesian inference. Journal of the Royal Statistical Society: Series B (Method- ological), 30(2):205–232, 1968. 4

  11. [11]

    Imagenet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. InCVPR, pages 248–255, 2009. 5

  12. [12]

    Arcface: Additive angular margin loss for deep face recognition

    Jiankang Deng, Jia Guo, Niannan Xue, and Stefanos Zafeiriou. Arcface: Additive angular margin loss for deep face recognition. InCVPR, pages 4690–4699, 2019. 1

  13. [13]

    Extremely simple activation shaping for out- of-distribution detection

    Andrija Djurisic, Nebojsa Bozanic, Arjun Ashok, and Rosanne Liu. Extremely simple activation shaping for out- of-distribution detection. InICLR, 2023. 8

  14. [14]

    Adversarially robust few-shot learn- ing via parameter co-distillation of similarity and class con- cept learners

    Junhao Dong, Piotr Koniusz, Junxi Chen, Xiaohua Xie, and Yew-Soon Ong. Adversarially robust few-shot learn- ing via parameter co-distillation of similarity and class con- cept learners. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 28535– 28544, 2024. 1

  15. [15]

    Confound from all sides, distill with resilience: Multi- objective adversarial paths to zero-shot robustness

    Junhao Dong, Jiao Liu, Xinghua Qu, and Yew-Soon Ong. Confound from all sides, distill with resilience: Multi- objective adversarial paths to zero-shot robustness. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 624–634, 2025. 1

  16. [16]

    Allies teach better than enemies: Inverse adversaries for robust knowledge distilla- tion.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2026

    Junhao Dong, Raoof Zare Moayedi, Yew-Soon Ong, and Seyed-Mohsen Moosavi-Dezfooli. Allies teach better than enemies: Inverse adversaries for robust knowledge distilla- tion.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2026. 1

  17. [17]

    An image is worth 16x16 words: Transformers for image recognition at scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl- vain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. InICLR, 2021. 5

  18. [18]

    VOS: learning what you don’t know by virtual outlier synthesis

    Xuefeng Du, Zhaoning Wang, Mu Cai, and Yixuan Li. VOS: learning what you don’t know by virtual outlier synthesis. In ICLR, 2022. 8

  19. [19]

    Unsupervised open- vocabulary object localization in videos

    Ke Fan, Zechen Bai, Tianjun Xiao, Dominik Zietlow, Max Horn, Zixu Zhao, Carl-Johann Simon-Gabriel, Mike Zheng Shou, Francesco Locatello, Bernt Schiele, Thomas Brox, Zheng Zhang, Yanwei Fu, and Tong He. Unsupervised open- vocabulary object localization in videos. InICCV, pages 13701–13709. IEEE, 2023. 8

  20. [20]

    Rethinking amodal video segmentation from learning supervised signals with object-centric representation

    Ke Fan, Jingshi Lei, Xuelin Qian, Miaopeng Yu, Tianjun Xiao, Tong He, Zheng Zhang, and Yanwei Fu. Rethinking amodal video segmentation from learning supervised signals with object-centric representation. InICCV, pages 1272–

  21. [21]

    Flexible visual recognition by evidential modeling of confu- sion and ignorance

    Lei Fan, Bo Liu, Haoxiang Li, Ying Wu, and Gang Hua. Flexible visual recognition by evidential modeling of confu- sion and ignorance. InICCV, pages 1338–1347. IEEE, 2023. 4

  22. [22]

    Kernel PCA for out-of-distribution detection

    Kun Fang, Qinghua Tao, Kexin Lv, Mingzhen He, Xiaolin Huang, and Jie Yang. Kernel PCA for out-of-distribution detection. InNeurIPS, 2024. 5, 6, 7, 8

  23. [23]

    Is out-of-distribution detection learnable? In NeurIPS, pages 37199–37213, 2022

    Zhen Fang, Yixuan Li, Jie Lu, Jiahua Dong, Bo Han, and Feng Liu. Is out-of-distribution detection learnable? In NeurIPS, pages 37199–37213, 2022. 1

  24. [24]

    MIT press, 1998

    Christiane Fellbaum.WordNet: An electronic lexical database. MIT press, 1998. 8

  25. [25]

    Exploring the limits of out-of-distribution detection

    Stanislav Fort, Jie Ren, and Balaji Lakshminarayanan. Exploring the limits of out-of-distribution detection. In NeurIPS, pages 7068–7081, 2021. 1

  26. [26]

    Botvinick, and Alexander Lerchner

    Klaus Greff, Rapha ¨el Lopez Kaufman, Rishabh Kabra, Nick Watters, Chris Burgess, Daniel Zoran, Loic Matthey, Matthew M. Botvinick, and Alexander Lerchner. Multi- object representation learning with iterative variational in- ference. InICML, pages 2424–2433. PMLR, 2019. 8

  27. [27]

    Weinberger

    Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. On calibration of modern neural networks. InICML, pages 1321–1330, 2017. 4

  28. [28]

    Training independent subnetworks for robust prediction

    Marton Havasi, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu, Jasper Snoek, Balaji Lakshminarayanan, Andrew Mingbo Dai, and Dustin Tran. Training independent subnetworks for robust prediction. InICLR, 2021. 3

  29. [29]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InCVPR, pages 770–778, 2016. 1

  30. [30]

    Dietterich

    Dan Hendrycks and Thomas G. Dietterich. Benchmarking neural network robustness to common corruptions and per- turbations. InICLR, 2019. 5 4

  31. [31]

    A baseline for detect- ing misclassified and out-of-distribution examples in neural networks

    Dan Hendrycks and Kevin Gimpel. A baseline for detect- ing misclassified and out-of-distribution examples in neural networks. InICLR, 2017. 1, 8

  32. [32]

    Dietterich

    Dan Hendrycks, Mantas Mazeika, and Thomas G. Dietterich. Deep anomaly detection with outlier exposure. InICLR,

  33. [33]

    The many faces of robustness: A criti- cal analysis of out-of-distribution generalization

    Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kada- vath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt, and Justin Gilmer. The many faces of robustness: A criti- cal analysis of out-of-distribution generalization. InICCV. IEEE, 2021. 5

  34. [34]

    Scaling out-of-distribution detection for real-world settings

    Dan Hendrycks, Steven Basart, Mantas Mazeika, Moham- madreza Mostajabi, Jacob Steinhardt, and Dawn Xiaodong Song. Scaling out-of-distribution detection for real-world settings. InICML, pages 8759–8773, 2022. 5, 6, 7, 8

  35. [35]

    Fever-ood: Free energy vulnerability elimination for robust out-of-distribution detec- tion

    Brian KS Isaac-Medina, Mauricio Che, Yona Falinie A Gaus, Samet Akcay, and Toby P Breckon. Fever-ood: Free energy vulnerability elimination for robust out-of-distribution detec- tion. InICCV, pages 4529–4538, 2025. 1

  36. [36]

    Learning to compose: Improving object centric learn- ing by injecting compositionality

    Whie Jung, Jaehoon Yoo, Sungjin Ahn, and Seunghoon Hong. Learning to compose: Improving object centric learn- ing by injecting compositionality. InICLR, 2024. 2

  37. [37]

    A simple unified framework for detecting out-of-distribution samples and adversarial attacks

    Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. InNeurIPS, pages 7167– 7177, 2018. 1

  38. [38]

    Fast decision boundary based out- of-distribution detector

    Litian Liu and Yao Qin. Fast decision boundary based out- of-distribution detector. InICML, 2024. 5, 6, 7

  39. [39]

    Owens, and Yixuan Li

    Weitang Liu, Xiaoyun Wang, John D. Owens, and Yixuan Li. Energy-based out-of-distribution detection. InNeurIPS, pages 21464–21475, 2020. 1, 5, 6, 7, 8

  40. [40]

    Object- centric learning with slot attention

    Francesco Locatello, Dirk Weissenborn, Thomas Un- terthiner, Aravindh Mahendran, Georg Heigold, Jakob Uszkoreit, Alexey Dosovitskiy, and Thomas Kipf. Object- centric learning with slot attention. InNeurIPS, pages 11525–11538, 2020. 2, 8

  41. [41]

    Decoupled weight decay regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. InICLR, 2019. 5

  42. [42]

    Generalized out-of-distribution detection and be- yond in vision language model era: A survey, 2024

    Atsuyuki Miyai, Jingkang Yang, Jingyang Zhang, Yifei Ming, Yueqian Lin, Qing Yu, Go Irie, Shafiq Joty, Yixuan Li, Hai Li, Ziwei Liu, Toshihiko Yamasaki, and Kiyoharu Aizawa. Generalized out-of-distribution detection and be- yond in vision language model era: A survey, 2024. 1

  43. [43]

    Maxime Oquab, Timoth ´ee Darcet, Th´eo Moutakanni, Huy V . V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rab- bat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herv ´e J´egou, Julien Mairal, P...

  44. [44]

    The effects of contextual scenes on the identification of objects.Memory & cognition, 3(5):519– 526, 1975

    Stephen E Palmer. The effects of contextual scenes on the identification of objects.Memory & cognition, 3(5):519– 526, 1975. 2

  45. [45]

    Nearest neighbor guidance for out-of-distribution detection

    Jaewoo Park, Yoon Gyo Jung, and Andrew Beng Jin Teoh. Nearest neighbor guidance for out-of-distribution detection. InICCV, pages 1686–1695. IEEE, 2023. 1, 5, 6, 7

  46. [46]

    Do imagenet classifiers generalize to im- agenet? InICML, 2019

    Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do imagenet classifiers generalize to im- agenet? InICML, 2019. 5

  47. [47]

    Bridging the gap to real-world object-centric learning

    Maximilian Seitzer, Max Horn, Andrii Zadaianchuk, Do- minik Zietlow, Tianjun Xiao, Carl-Johann Simon-Gabriel, Tong He, Zheng Zhang, Bernhard Sch¨olkopf, Thomas Brox, and Francesco Locatello. Bridging the gap to real-world object-centric learning. InICLR, 2023. 5, 8

  48. [48]

    Princeton University Press, 1976

    G Shafer.A Mathematical Theory of Evidence. Princeton University Press, 1976. 4

  49. [49]

    The pitfalls of simplicity bias in neural networks

    Harshay Shah, Kaustav Tamuly, Aditi Raghunathan, Prateek Jain, and Praneeth Netrapalli. The pitfalls of simplicity bias in neural networks. InNeurIPS, 2020. 1

  50. [50]

    DICE: leveraging sparsification for out-of-distribution detection

    Yiyou Sun and Yixuan Li. DICE: leveraging sparsification for out-of-distribution detection. InECCV, pages 691–708. Springer, 2022. 8

  51. [51]

    React: Out-of- distribution detection with rectified activations

    Yiyou Sun, Chuan Guo, and Yixuan Li. React: Out-of- distribution detection with rectified activations. InNeurIPS, pages 144–157, 2021. 8

  52. [52]

    Out-of- distribution detection with deep nearest neighbors

    Yiyou Sun, Yifei Ming, Xiaojin Zhu, and Yixuan Li. Out-of- distribution detection with deep nearest neighbors. InICML, pages 20827–20840, 2022. 1

  53. [53]

    Non- parametric outlier synthesis

    Leitian Tao, Xuefeng Du, Jerry Zhu, and Yixuan Li. Non- parametric outlier synthesis. InICLR, 2023. 8

  54. [54]

    Traffic sign detection using a multi-scale re- current attention network.IEEE transactions on intelligent transportation systems, 20(12):4466–4475, 2019

    Yan Tian, Judith Gelernter, Xun Wang, Jianyuan Li, and Yizhou Yu. Traffic sign detection using a multi-scale re- current attention network.IEEE transactions on intelligent transportation systems, 20(12):4466–4475, 2019. 1

  55. [55]

    Overcoming simplicity bias in deep networks using a feature sieve

    Rishabh Tiwari and Pradeep Shenoy. Overcoming simplicity bias in deep networks using a feature sieve. InICML, pages 34330–34343. PMLR, 2023. 1

  56. [56]

    Unbiased look at dataset bias

    Antonio Torralba and Alexei A Efros. Unbiased look at dataset bias. InCVPR 2011, pages 1521–1528. IEEE, 2011. 1

  57. [57]

    The inaturalist species classification and de- tection dataset

    Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, and Serge Belongie. The inaturalist species classification and de- tection dataset. InCVPR, pages 8769–8778, 2018. 5

  58. [58]

    Gomez, Lukasz Kaiser, and Illia Polosukhin

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszko- reit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. InNIPS, pages 5998– 6008, 2017. 2

  59. [59]

    Open-set recognition: A good closed-set classifier is all you need

    Sagar Vaze, Kai Han, Andrea Vedaldi, and Andrew Zisser- man. Open-set recognition: A good closed-set classifier is all you need. InICLR, 2022. 1, 5

[60] Haoqi Wang, Zhizhong Li, Litong Feng, and Wayne Zhang. ViM: Out-of-distribution with virtual-logit matching. In CVPR, pages 4921–4930, 2022. 1, 5

[61] Haoqi Wang, Tong Zhang, and Mathieu Salzmann. SINDER: Repairing the singular defects of DINOv2. In ECCV, 2024. 5

[62] Hongxin Wei, Renchunzi Xie, Hao Cheng, Lei Feng, Bo An, and Yixuan Li. Mitigating neural network overconfidence with logit normalization. In ICML, pages 23631–23644, 2022.

[63] Thaddäus Wiedemer, Jack Brady, Alexander Panfilov, Attila Juhos, Matthias Bethge, and Wieland Brendel. Provable compositional generalization for object-centric learning. In ICLR, 2024. 2

[64] Kai Xu, Rongyu Chen, Gianni Franchi, and Angela Yao. Scaling for training time and post-hoc out-of-distribution detection enhancement. In ICLR, 2024. 5, 6, 7

[65] Jingkang Yang, Pengyun Wang, Dejian Zou, Zitang Zhou, Kunyuan Ding, Wenxuan Peng, Haoqi Wang, Guangyao Chen, Bo Li, Yiyou Sun, et al. OpenOOD: Benchmarking generalized out-of-distribution detection. In NeurIPS, pages 32598–32611, 2022. 5

[66] Jingkang Yang, Kaiyang Zhou, and Ziwei Liu. Full-spectrum out-of-distribution detection. Int. J. Comput. Vis., 131(10):2607–2622, 2023. 2, 5

[67] Yifeng Yang, Lin Zhu, Zewen Sun, Hengyu Liu, Qinying Gu, and Nanyang Ye. OODD: Test-time out-of-distribution detection with dynamic dictionary. In CVPR, pages 30630–30639, 2025. 1, 5, 6, 7

[68] Xu Yin, Fei Pan, Guoyuan An, Yuchi Huo, Zixuan Xie, and Sung-Eui Yoon. OpenSlot: Mixed open-set recognition with object-centric learning. arXiv preprint arXiv:2407.02386, 2024.

[69] Jinsong Zhang, Qiang Fu, Xu Chen, Lun Du, Zelin Li, Gang Wang, Xiaoguang Liu, Shi Han, and Dongmei Zhang. Out-of-distribution detection based on in-distribution data patterns memorization with modern Hopfield energy. In ICLR, 2023.

[70] Jingyang Zhang, Jingkang Yang, Pengyun Wang, Haoqi Wang, Yueqian Lin, Haoran Zhang, Yiyou Sun, Xuefeng Du, Kaiyang Zhou, Wayne Zhang, Yixuan Li, Ziwei Liu, Yiran Chen, and Hai Li. OpenOOD v1.5: Enhanced benchmark for out-of-distribution detection. arXiv preprint arXiv:2306.09301, 2023. 5

[71] Tianren Zhang, Chujie Zhao, Guanyu Chen, Yizhou Jiang, and Feng Chen. Feature contamination: Neural networks learn uncorrelated features and fail to generalize. In ICML, 2024.

[72] Yan Zhang, David W. Zhang, Simon Lacoste-Julien, Gertjan J. Burghouts, and Cees G. M. Snoek. Unlocking slot attention by changing optimal transport costs. In ICML, pages 41931–41951. PMLR, 2023. 8

[73] Yongkang Zhang, Dongyu She, and Zhong Zhou. Adaptive prompt learning via Gaussian outlier synthesis for out-of-distribution detection. In ICCV, pages 3235–3244, 2025. 1

  74. [74]

    Linfeng Zhao, Lingzhi Kong, Robin Walters, and Lawson L. S. Wong. Toward compositional generalization in object- oriented world modeling. InICML, pages 26841–26864. PMLR, 2022. 2 6