OASIC: Occlusion-Agnostic and Severity-Informed Classification
Recognition: 1 theorem link · Lean Theorem
Pith reviewed 2026-05-13 17:25 UTC · model grok-4.3
The pith
Masking occluders and routing each image to a severity-matched model lifts AUC on occluded images by 18.5 points.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
OASIC estimates the severity of occlusion in a test image, masks the occluding pattern in gray, and routes the result to a classifier trained specifically for that severity band. This adaptive choice after masking outperforms every fixed model trained on any narrower or broader range of occlusion severities, because a model tuned to a narrow severity band generalizes better within its band than a model exposed to all bands at once.
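A minimal sketch of the routing step, following the reading that severity is estimated from the already-masked image; `mask_occluder`, `estimate_severity`, and the `experts` band-to-classifier map are hypothetical stand-ins, not the paper's interfaces.

```python
def classify_with_routing(image, mask_occluder, estimate_severity, experts):
    """Severity-informed routing: mask the occluder, estimate severity,
    then pick the expert trained on the matching band.

    experts: dict mapping a severity band (lo, hi) to a classifier
    trained only on that band (all names here are assumptions).
    """
    masked = mask_occluder(image)          # gray out occluder pixels (occlusion-agnostic)
    severity = estimate_severity(masked)   # estimated occluded fraction in [0, 1]
    for (lo, hi), model in sorted(experts.items()):
        if lo <= severity < hi or (hi == 1.0 and severity == 1.0):
            return model(masked)
    raise ValueError(f"no expert covers severity {severity:.2f}")
```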
What carries the argument
Severity-informed model selection after gray masking of occluders, where severity is estimated from the fraction of the object that remains visible.
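Under this definition the estimate reduces to a visible-pixel ratio; a sketch assuming binary object and occluder masks, which the paper does not spell out in these terms.

```python
import numpy as np

def severity_from_masks(object_mask: np.ndarray, occluder_mask: np.ndarray) -> float:
    """Severity = 1 - visible fraction of the object. Both masks are boolean HxW arrays."""
    object_pixels = object_mask.sum()
    if object_pixels == 0:
        return 1.0  # fully occluded: nothing of the object is visible
    visible = (object_mask & ~occluder_mask).sum()
    return float(1.0 - visible / object_pixels)
```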
If this is right
- Gray masking removes distracting occluder patterns without requiring knowledge of the occluder type or shape.
- Models trained on narrow severity bands outperform broader models when test severity matches the training band.
- Severity estimation from the masked image enables correct dynamic routing among the specialized models.
- The largest gains occur precisely when test occlusion severity falls inside the band for which the selected model was trained.
Where Pith is reading between the lines
- The same severity-routing idea could be tested on other degradations such as blur or additive noise by training separate expert models for each degradation level.
- If severity estimation is noisy, the adaptive system risks selecting a worse model than a single general classifier, so the method depends on accurate severity prediction.
- Maintaining a handful of severity-specific models plus cheap routing may be preferable to one large universal model when occlusion levels vary widely across a deployment.
Load-bearing premise
Occlusion severity can be estimated reliably from the test image alone and the matching-severity model will outperform a single general model on images of similar severity.
What would settle it
A single model trained on the union of all severity levels performing as well as or better than the severity-selected specialists, on a held-out test set whose occlusion levels are known in advance, would refute the core claim.
original abstract
Severe occlusions of objects pose a major challenge for computer vision. We show that two root causes are (1) the loss of visible information and (2) the distracting patterns caused by the occluders. Our approach addresses both causes at the same time. First, the distracting patterns are removed at test-time, via masking of the occluding patterns. This masking is independent of the type of occlusion, by handling the occlusion through the lens of visual anomalies w.r.t. the object of interest. Second, to deal with less visual details, we follow standard practice by masking random parts of the object during training, for various degrees of occlusions. We discover that (a) it is possible to estimate the degree of the occlusion (i.e. severity) at test-time, and (b) that a model optimized for a specific degree of occlusion also performs best on a similar degree during test-time. Combining these two insights brings us to a severity-informed classification model called OASIC: Occlusion Agnostic Severity Informed Classification. We estimate the severity of occlusion for a test image, mask the occluder, and select the model that is optimized for the degree of occlusion. This strategy performs better than any single model optimized for any smaller or broader range of occlusion severities. Experiments show that combining gray masking with adaptive model selection improves $\text{AUC}_\text{occ}$ by +18.5 over standard training on occluded images and +23.7 over finetuning on unoccluded images.
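A rough illustration of the training recipe the abstract invokes (random gray masking at a chosen severity, in the spirit of Cutout [3] and Hide-and-Seek [13]); the patch grid and the gray value 0.5 are assumptions, not the paper's settings.

```python
import torch

def gray_mask_batch(images: torch.Tensor, severity: float, grid: int = 7,
                    gray: float = 0.5) -> torch.Tensor:
    """Gray out a random `severity` fraction of grid cells in each image.

    images: (B, C, H, W) with values in [0, 1]; grid size and gray value
    are assumed here, not taken from the paper.
    """
    b, c, h, w = images.shape
    ph, pw = h // grid, w // grid
    n_cells = grid * grid
    n_masked = round(severity * n_cells)
    out = images.clone()
    for i in range(b):
        for cell in torch.randperm(n_cells)[:n_masked].tolist():
            r, col = divmod(cell, grid)
            out[i, :, r * ph:(r + 1) * ph, col * pw:(col + 1) * pw] = gray
    return out
```

Training one specialist per band then amounts to drawing `severity` from that band for every batch.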
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes OASIC for classifying severely occluded objects in computer vision. It removes distracting occluder patterns at test time via gray masking treated as visual anomalies relative to the object, trains models on random masking at varying occlusion severities, and claims two discoveries: occlusion severity can be estimated from a test image, and severity-matched specialist models outperform general ones. The method estimates severity, applies masking, and selects the matching model, reporting +18.5 AUC_occ gains over standard occluded training and +23.7 over unoccluded finetuning.
Significance. If validated, the combination of occlusion-agnostic test-time masking and severity-adaptive model selection offers a practical route to robust classification under partial visibility, with potential impact on applications like autonomous driving or medical imaging. The empirical margins are notable, and the parameter-light masking strategy (no occlusion-type specifics) is a clear strength if the severity estimation step proves reliable across datasets.
major comments (3)
- [Abstract / Experiments] Abstract and experimental results section: The headline +18.5 AUC_occ claim depends on reliable test-time severity estimation followed by specialist-model selection, yet no quantitative validation (MAE, rank correlation, or accuracy against ground-truth severity) or ablation injecting realistic estimation noise is reported; without these, the adaptive gain could collapse to random selection.
- [Method / Experiments] Method description (severity-informed selection): The assertion that a model optimized for a specific occlusion degree performs best on matching test cases requires an explicit ablation against a single model trained on the full severity range; the current comparison to 'standard training' and 'finetuning on unoccluded images' does not isolate whether adaptive selection itself drives the reported margin.
- [Experiments] Experimental setup: Full dataset descriptions, number of severity levels, implementation details of gray masking, and statistical reporting (error bars, number of runs) are absent, making it impossible to assess whether the AUC_occ improvements are robust or dataset-specific.
minor comments (3)
- [Method] Clarify the exact procedure for anomaly-based masking (e.g., how the 'object of interest' reference is obtained at test time) and provide pseudocode or a diagram.
- [Abstract / Related Work] Define AUC_occ explicitly and distinguish it from standard AUC; add citations to prior work on occlusion simulation and anomaly detection for masking.
- [Figures / Tables] Figure captions and table headers should include more detail on occlusion severity ranges and baseline configurations.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to strengthen the presentation of severity estimation validation, add the requested ablations, and include missing experimental details.
point-by-point responses
- Referee: [Abstract / Experiments] Abstract and experimental results section: The headline +18.5 AUC_occ claim depends on reliable test-time severity estimation followed by specialist-model selection, yet no quantitative validation (MAE, rank correlation, or accuracy against ground-truth severity) or ablation injecting realistic estimation noise is reported; without these, the adaptive gain could collapse to random selection.
Authors: We agree that quantitative validation of severity estimation is essential to support the adaptive selection claim. The current manuscript states that severity can be estimated but does not report MAE, correlation, or noise-injection ablations. We will add a dedicated subsection in Experiments reporting these metrics (MAE and Spearman rank correlation against ground-truth severity labels) plus an ablation that perturbs the estimated severity with realistic noise levels and measures the resulting drop in AUC_occ. This will demonstrate that the reported gains are not attributable to random model selection. (revision: yes)
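A minimal sketch of the promised validation, assuming paired arrays of ground-truth and estimated severities; the Gaussian noise model for the ablation is one plausible choice, not taken from the rebuttal.

```python
import numpy as np
from scipy.stats import spearmanr

def severity_validation(true_sev: np.ndarray, est_sev: np.ndarray, sigma: float = 0.1):
    """MAE and Spearman rank correlation of the estimates, plus a noise-injected
    copy for the ablation (Gaussian perturbation clipped to [0, 1] is assumed)."""
    mae = float(np.abs(true_sev - est_sev).mean())
    rho, _ = spearmanr(true_sev, est_sev)
    rng = np.random.default_rng(0)
    noisy = np.clip(est_sev + rng.normal(0.0, sigma, est_sev.shape), 0.0, 1.0)
    return mae, rho, noisy  # re-run routing with `noisy` and measure the AUC_occ drop
```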
- Referee: [Method / Experiments] Method description (severity-informed selection): The assertion that a model optimized for a specific occlusion degree performs best on matching test cases requires an explicit ablation against a single model trained on the full severity range; the current comparison to 'standard training' and 'finetuning on unoccluded images' does not isolate whether adaptive selection itself drives the reported margin.
Authors: We acknowledge that the existing baselines do not fully isolate the benefit of severity-matched selection. We will add an explicit ablation training a single model on the union of all severity levels and directly comparing its performance to the specialist models selected by estimated severity on held-out test images. Results will be reported per severity bin to show that matched specialists outperform the unified model, thereby confirming that adaptive selection contributes to the observed margins. (revision: yes)
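The requested ablation reduces to a per-bin score table; a sketch in which `evaluate`, the band-keyed dicts, and the unified model are hypothetical placeholders.

```python
def per_bin_ablation(test_bins, specialists, unified, evaluate):
    """Specialist-vs-unified comparison per severity bin.

    test_bins and specialists are dicts keyed by band (lo, hi); `evaluate`
    is assumed to return AUC_occ for a model on a test subset.
    """
    for band in sorted(test_bins):
        spec_auc = evaluate(specialists[band], test_bins[band])
        unif_auc = evaluate(unified, test_bins[band])
        print(f"band {band}: specialist {spec_auc:.1f}  unified {unif_auc:.1f}  "
              f"margin {spec_auc - unif_auc:+.1f}")
```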
- Referee: [Experiments] Experimental setup: Full dataset descriptions, number of severity levels, implementation details of gray masking, and statistical reporting (error bars, number of runs) are absent, making it impossible to assess whether the AUC_occ improvements are robust or dataset-specific.
Authors: We apologize for these omissions. The revised manuscript will expand the Experimental Setup section with: complete dataset descriptions and splits, the precise number of severity levels and their ranges, implementation details of the gray-masking procedure (anomaly threshold, masking strategy), and statistical reporting including error bars computed over multiple independent runs (e.g., 5 runs) with standard deviations. (revision: yes)
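A minimal sketch of the promised statistical reporting, assuming one AUC_occ value per independent run.

```python
import numpy as np

def report(auc_runs):
    """Mean ± sample standard deviation over independent runs (e.g., 5 seeds)."""
    a = np.asarray(auc_runs, dtype=float)
    return f"{a.mean():.1f} ± {a.std(ddof=1):.1f} over {len(a)} runs"
```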
Circularity Check
No circularity: empirical gains measured against external baselines
full rationale
The paper presents OASIC as an empirical strategy: gray masking at test time to remove occluders, severity estimation from the masked image, and selection among models trained on different occlusion severity ranges. All reported improvements (+18.5 AUC_occ over standard occluded training, +23.7 over unoccluded finetuning) are obtained by direct comparison to independently trained baselines on held-out test sets. No equations define severity as a function of the selected model's performance, no fitted parameter is relabeled as a prediction, and no self-citation is used to establish uniqueness or an ansatz. The derivation chain consists of standard training procedures plus an inference-time selection rule whose benefit is externally validated by the measured AUC margins rather than by construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- occlusion severity levels
axioms (1)
- domain assumption: Occluders can be detected as visual anomalies relative to the object of interest (one possible instantiation is sketched below).
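One way to cash out this assumption, in the spirit of patch-based anomaly detection [2, 22] with an Otsu threshold [18]; the memory-bank scoring below is an assumed instantiation, not the paper's documented procedure.

```python
import numpy as np
from skimage.filters import threshold_otsu

def occluder_patches(patch_feats: np.ndarray, memory_bank: np.ndarray) -> np.ndarray:
    """Flag occluder patches as anomalies w.r.t. features of unoccluded object patches.

    patch_feats: (N, D) test-image patch features; memory_bank: (M, D) features
    of normal (unoccluded) object patches. Returns a boolean mask over the N patches.
    """
    dists = np.linalg.norm(patch_feats[:, None, :] - memory_bank[None, :, :], axis=-1)
    scores = dists.min(axis=1)                  # distance to nearest normal patch
    return scores > threshold_otsu(scores)      # True = anomalous, i.e. occluder
```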
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
The relation between the paper passage and the cited Recognition theorem is unclear. Linked passage:
We estimate the severity of occlusion for a test image, mask the occluder, and select the model that is optimized for the degree of occlusion.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Chen, J.N., Sun, S., He, J., Torr, P.H., Yuille, A., Bai, S.: Transmix: Attend to mix for vision transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12135–12144 (2022)
- [2] Damm, S., Laszkiewicz, M., Lederer, J., Fischer, A.: Anomalydino: Boosting patch-based few-shot anomaly detection with DINOv2. In: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). pp. 1319–1329. IEEE (2025)
- [3] DeVries, T., Taylor, G.W.: Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552 (2017)
- [4]
- [5] French, R.M.: Catastrophic forgetting in connectionist networks. Trends in Cognitive Sciences 3(4), 128–135 (1999)
- [6] Gildenblat, J., contributors: PyTorch library for CAM methods. https://github.com/jacobgil/pytorch-grad-cam (2021)
- [7] He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.: Masked autoencoders are scalable vision learners. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 16000–16009 (2022)
- [8] Kassaw, K., Luzi, F., Collins, L.M., Malof, J.M.: Are deep learning models robust to partial object occlusion in visual recognition tasks? Pattern Recognition p. 112215 (2025)
- [9] Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W.Y., et al.: Segment anything. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 4015–4026 (2023)
- [10] Kong, X., Zhang, X.: Understanding masked image modeling via learning occlusion invariant feature. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 6241–6251 (2023)
- [11] Kortylewski, A., He, J., Liu, Q., Yuille, A.L.: Compositional convolutional neural networks: A deep architecture with innate robustness to partial occlusion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 8940–8949 (2020)
- [12] Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3D object representations for fine-grained categorization. In: Proceedings of the IEEE International Conference on Computer Vision Workshops. pp. 554–561 (2013)
- [13] Kumar Singh, K., Jae Lee, Y.: Hide-and-seek: Forcing a network to be meticulous for weakly-supervised object and action localization. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 3524–3533 (2017)
- [14] Liang, F., Wu, B., Dai, X., Li, K., Zhao, Y., Zhang, H., Zhang, P., Vajda, P., Marculescu, D.: Open-vocabulary semantic segmentation with mask-adapted CLIP. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 7061–7070 (2023)
- [15] Muhammad, M.B., Yeasin, M.: Eigen-CAM: Class activation map using principal components. In: 2020 International Joint Conference on Neural Networks (IJCNN). pp. 1–7. IEEE (2020)
- [16] Naseer, M.M., Ranasinghe, K., Khan, S.H., Hayat, M., Shahbaz Khan, F., Yang, M.H.: Intriguing properties of vision transformers. Advances in Neural Information Processing Systems 34, 23296–23308 (2021)
- [17] Oquab, M., Darcet, T., Moutakanni, T., Vo, H.V., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., Assran, M., Ballas, N., Galuba, W., Howes, R., Huang, P.Y., Li, S.W., Misra, I., Rabbat, M., Sharma, V., Synnaeve, G., Xu, H., Jegou, H., Mairal, J., Labatut, P., Joulin, A., Bojanowski, P.: DINOv2: Learning robust visual features without supervision (2024)
- [18] Otsu, N.: A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics 9(1), 62–66 (1979)
- [19] Perlin, K.: An image synthesizer. ACM SIGGRAPH Computer Graphics 19(3), 287–296 (1985)
- [20] Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning. pp. 8748–8763. PMLR (2021)
- [21] Ravi, N., Gabeur, V., Hu, Y.T., Hu, R., Ryali, C., Ma, T., Khedr, H., Rädle, R., Rolland, C., Gustafson, L., et al.: SAM 2: Segment anything in images and videos. arXiv preprint arXiv:2408.00714 (2024)
- [22] Roth, K., Pemula, L., Zepeda, J., Schölkopf, B., Brox, T., Gehler, P.: Towards total recall in industrial anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 14318–14328 (2022)
- [23] Shen, Y., Ding, H., Shao, X., Unberath, M.: Performance and nonadversarial robustness of the Segment Anything Model 2 in surgical video segmentation. In: Medical Imaging 2025: Image-Guided Procedures, Robotic Interventions, and Modeling. vol. 13408, pp. 93–98. SPIE (2025)
- [24] Shorten, C., Khoshgoftaar, T.M.: A survey on image data augmentation for deep learning. Journal of Big Data 6(1), 1–48 (2019)
- [25] Torralba, A., Efros, A.A.: Unbiased look at dataset bias. In: CVPR 2011. pp. 1521–1528. IEEE (2011)
- [26] Vashisht, A., Tekade, I., Shah, J., Sawarn, A., Yadav, D.S., Sontakke, P., Patil, R.: Effective segmentation of grape leaves using Segment Anything Model 2. Smart Trends in Computing and Communications: Proceedings of SmartCom 2025, Volume 1010, 375 (2025)
- [27] Xiao, M., Kortylewski, A., Wu, R., Qiao, S., Shen, W., Yuille, A.: TDMPNet: Prototype network with recurrent top-down modulation for robust object classification under partial occlusion. In: European Conference on Computer Vision. pp. 447–
- [28] Yun, S., Han, D., Oh, S.J., Chun, S., Choe, J., Yoo, Y.: CutMix: Regularization strategy to train strong classifiers with localizable features. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 6023–6032 (2019)
- [29] Zavrtanik, V., Kristan, M., Skočaj, D.: DRAEM: A discriminatively trained reconstruction embedding for surface anomaly detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 8330–8339 (2021)
- [30] Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D.: mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412 (2017)
- [31] Zhang, Z., Xie, C., Wang, J., Xie, L., Yuille, A.L.: DeepVoting: A robust and explainable deep network for semantic part detection under partial occlusion. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1372–1380 (2018)
- [32] Zhou, J., Wei, C., Wang, H., Shen, W., Xie, C., Yuille, A., Kong, T.: iBOT: Image BERT pre-training with online tokenizer. In: ICLR (2022)