Recognition: 3 theorem links · Lean theorems
Ranked Activation Shift for Post-Hoc Out-of-Distribution Detection
Pith reviewed 2026-05-14 23:50 UTC · model grok-4.3
The pith
Replacing sorted penultimate activations with a fixed in-distribution profile produces consistent out-of-distribution detection without tuning or accuracy loss.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that Ranked Activation Shift achieves strong and consistent out-of-distribution detection by replacing the sorted magnitudes of penultimate-layer activations with a fixed reference profile taken from in-distribution samples. The substitution is performed once, requires no tuning, works for any activation function, and leaves the network's in-distribution predictions unchanged by construction. Separate analysis shows that the resulting inhibition and excitation of activations each contribute independently to improved separation between in-distribution and out-of-distribution inputs.
What carries the argument
Ranked Activation Shift, which sorts the magnitudes of penultimate activations and replaces them with corresponding entries from a fixed in-distribution reference profile to produce a shifted activation vector used for OOD scoring.
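A minimal NumPy sketch of this replacement step, under assumptions the summary leaves open: the descending sort order, the sign-preserving replacement, and the energy-style score on the shifted logits are illustrative choices here, not details confirmed by the paper. Note that predictions would still be computed from the unmodified activations, which is how in-distribution accuracy can be preserved by construction.

```python
import numpy as np

def reference_profile(id_activations):
    """mu_j = mean over ID samples of the j-th largest activation magnitude."""
    mags = np.sort(np.abs(id_activations), axis=1)[:, ::-1]  # each row sorted descending
    return mags.mean(axis=0)

def ranked_activation_shift(a, mu):
    """Replace the sorted magnitudes of one activation vector with the profile."""
    order = np.argsort(np.abs(a))[::-1]      # indices, largest magnitude first
    shifted = np.empty_like(a)
    shifted[order] = np.sign(a[order]) * mu  # rank-j entry receives mu_j, sign kept
    return shifted

def ood_score(a, mu, W, b):
    """Energy-style score on logits from the shifted vector (illustrative choice)."""
    logits = ranked_activation_shift(a, mu) @ W.T + b
    m = logits.max()
    return -(m + np.log(np.exp(logits - m).sum()))  # higher = more OOD-like
```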
If this is right
- The method yields stable out-of-distribution performance across multiple datasets and network architectures without per-dataset adjustments.
- No hyperparameter search or assumptions about the penultimate activation function are required.
- Both the inhibitory and excitatory effects of the shift contribute separately to better discrimination.
- In-distribution classification accuracy remains exactly the same as the unmodified model.
Where Pith is reading between the lines
- Similar ranking-based replacement steps could be tested in other post-hoc correction techniques that currently rely on raw activation statistics.
- The observation that activation-distribution mismatch drives instability suggests that training procedures could be modified to encourage more uniform activation profiles.
- Reference profiles might be made slightly adaptive per class or per domain to handle structured distribution shifts while retaining the hyperparameter-free property.
Load-bearing premise
A single fixed reference profile computed from in-distribution activations will generalize to produce reliable shifts that discriminate out-of-distribution inputs across different datasets and models.
What would settle it
A controlled experiment on a new model or dataset in which the method's AUROC for out-of-distribution detection falls below that of simple baselines when the penultimate activation distribution deviates strongly from the training reference profile.
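A hedged sketch of how such a test would be scored, with OOD as the positive class; the score arrays and the maximum-softmax-probability baseline are assumed ingredients for illustration, not artifacts of the paper.

```python
# Sketch only: `ras_id`/`ras_ood` and `msp_id`/`msp_ood` are hypothetical
# score arrays (higher = more OOD-like) for RAS and a max-softmax baseline.
import numpy as np
from sklearn.metrics import roc_auc_score

def auroc(id_scores, ood_scores):
    """AUROC with OOD as the positive class."""
    labels = np.concatenate([np.zeros(len(id_scores)), np.ones(len(ood_scores))])
    scores = np.concatenate([id_scores, ood_scores])
    return roc_auc_score(labels, scores)

# The claim would be falsified on a benchmark where
#   auroc(ras_id, ras_ood) < auroc(msp_id, msp_ood)
# while the ID activation distribution deviates strongly from the profile.
```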
Original abstract
State-of-the-art post-hoc out-of-distribution detection methods rely on intermediate layer activation editing. However, they exhibit inconsistent performance across datasets and models. We show that this instability is driven by differences in the activation distributions, and identify a failure mode of scaling-based methods that arises when penultimate layer activations are not rectified. Motivated by this analysis, we propose RAS, a hyperparameter-free post-hoc method that replaces sorted activation magnitudes with a fixed in-distribution reference profile. Our simple plug-and-play method shows strong and consistent performance across datasets and architectures without assumptions on the penultimate layer activation function, and without requiring any hyperparameter tuning, while preserving in-distribution classification accuracy by construction. We further analyze what drives the improvement, showing that both inhibiting and exciting activation shifts independently contribute to better out-of-distribution discrimination.
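One plausible reading of the rectification failure mode named in the abstract, sketched with an ASH-S/SCALE-style scaling statistic exp(s1/s2); the exact statistic the paper analyzes is not reproduced here, so treat this as an illustration under that assumption.

```python
import numpy as np

def scaling_factor(a, k=2):
    """ASH-S/SCALE-style statistic: exp(s1 / s2), where s1 sums all activations
    and s2 sums the surviving top-k. Implicitly assumes activations are >= 0."""
    s1 = a.sum()
    s2 = np.sort(a)[-k:].sum()
    return np.exp(s1 / s2)

rectified = np.array([0.0, 1.2, 0.0, 3.4])      # ReLU-style penultimate layer
unrectified = np.array([-3.5, 1.2, -1.9, 3.4])  # GELU-style, signed activations

print(scaling_factor(rectified))    # > 1: boosts the retained activations
print(scaling_factor(unrectified))  # < 1: negative mass flips the effect
```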
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper analyzes failure modes in state-of-the-art post-hoc OOD detection methods that rely on intermediate-layer activation editing, attributing instability to differences in activation distributions and identifying a specific failure mode in scaling-based methods when penultimate-layer activations are not rectified. It proposes Ranked Activation Shift (RAS), a hyperparameter-free method that replaces sorted activation magnitudes with a fixed in-distribution reference profile. The central claims are strong and consistent performance across datasets and architectures without assumptions on the activation function or hyperparameter tuning, preservation of ID classification accuracy by construction, and that both inhibiting and exciting shifts contribute to improved OOD discrimination.
Significance. If the empirical claims hold, the work provides a simple, plug-and-play post-hoc OOD detector that directly addresses documented inconsistencies in prior activation-editing methods. The parameter-free nature and lack of activation-function assumptions would make it immediately usable across models, while the analysis of shift directions offers mechanistic insight into what drives effective discrimination.
Major comments (2)
- [Abstract] Abstract and method description: the central claim of 'strong and consistent performance across datasets and architectures without ... hyperparameter tuning' rests on the fixed reference profile generalizing to new ID data and architectures. If penultimate-layer statistics (means, tails, rectification status) differ between the profile-construction set and a later ID test set, the ranked shift can misalign and weaken OOD discrimination, even while preserving nominal ID accuracy. No explicit test of this generalization risk is described.
- [Abstract] The manuscript provides no access to full experimental results, tables, or implementation details, so the quantitative support for the performance claims cannot be verified (soundness rated 4.0).
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive feedback on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the paper.
Point-by-point responses
- Referee: [Abstract] Abstract and method description: the central claim of 'strong and consistent performance across datasets and architectures without ... hyperparameter tuning' rests on the fixed reference profile generalizing to new ID data and architectures. If penultimate-layer statistics (means, tails, rectification status) differ between the profile-construction set and a later ID test set, the ranked shift can misalign and weaken OOD discrimination, even while preserving nominal ID accuracy. No explicit test of this generalization risk is described.
  Authors: We appreciate the referee's point regarding potential misalignment if the reference profile is applied to ID data with differing activation statistics. In the current experiments, the reference profile is always constructed from the training split of the same ID dataset used for evaluation, ensuring distributional alignment by design. To directly address the generalization concern, we will add a new subsection in the revised manuscript with experiments that vary the reference set construction (e.g., using random subsets or cross-validation folds of the ID training data) and report the resulting OOD detection performance. These results will quantify sensitivity to within-ID variations and support the claim of robustness without hyperparameter tuning (see the sketch after these responses). [Revision: partial]
- Referee: [Abstract] The manuscript provides no access to full experimental results, tables, or implementation details, so the quantitative support for the performance claims cannot be verified (soundness rated 4.0).
  Authors: We agree that the current manuscript version lacks complete tables and implementation details in the main body, which limits immediate verification. In the revised submission, we will include expanded experimental tables reporting all metrics across every dataset-architecture pair, along with a dedicated reproducibility section that provides a link to the full open-source code repository containing the implementation, data preprocessing scripts, and exact experimental configurations. [Revision: yes]
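A minimal sketch of the sensitivity check proposed in the first response, assuming the reference profile is the per-rank mean of sorted activation magnitudes; the subset fraction and trial count are arbitrary illustrative values.

```python
import numpy as np

def reference_profile(id_activations):
    # mean j-th largest activation magnitude over the given ID samples
    return np.sort(np.abs(id_activations), axis=1)[:, ::-1].mean(axis=0)

def profile_sensitivity(id_activations, n_trials=10, frac=0.5, seed=0):
    """Rebuild the profile from random ID subsets and report the max
    per-rank standard deviation; a small value supports robustness."""
    rng = np.random.default_rng(seed)
    n = len(id_activations)
    profiles = []
    for _ in range(n_trials):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        profiles.append(reference_profile(id_activations[idx]))
    return np.stack(profiles).std(axis=0).max()
```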
Circularity Check
Derivation chain is self-contained and does not reduce circularly to its inputs
Full rationale
The proposed Ranked Activation Shift method defines a fixed in-distribution reference profile from training data and uses it to adjust activation ranks for OOD scoring. This is a direct algorithmic construction, not a fitted model whose outputs are then presented as predictions. The paper's analysis of activation distributions and failure modes provides independent motivation, and performance is validated empirically rather than following tautologically from the definition. No load-bearing self-citations or smuggled-in ansatzes appear in the provided text.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: penultimate-layer activations can be sorted by magnitude and replaced with a fixed reference profile without changing in-distribution classification accuracy.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (J-cost uniqueness) · tag: unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Paper passage: "RAS … replaces sorted activation magnitudes with a fixed in-distribution reference profile … $\bar{a}_{\pi(j)} \leftarrow \mu_j$, with $\mu = \frac{1}{N}\sum_{i} r(a_i)$"
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration · tag: unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Paper passage: "hyperparameter-free … without assumptions on the penultimate layer activation function"
- IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection (coupling-combiner forces bilinear branch) · tag: unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Paper passage: "both inhibiting and exciting activation shifts independently contribute"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., Mané, D.: Concrete problems in AI safety. arXiv e-prints, arXiv–1606 (2016)
- [2] Bekhouche, M.: ImageNet-1k Leaderboard. https://huggingface.co/spaces/Bekhouche/ImageNet-1k_leaderboard (2025), accessed 2026-03-02
- [3] Bendale, A., Boult, T.E.: Towards open set deep networks. In: Computer Vision and Pattern Recognition Conference, pp. 1563–1572 (2016)
- [4] Bitterwolf, J., Müller, M., Hein, M.: In or out? Fixing ImageNet out-of-distribution detection evaluation. In: International Conference on Machine Learning, pp. 2471–2506 (2023)
- [5] Borlino, F.C., Lu, L., Tommasi, T.: Foundation models and fine-tuning: A benchmark for out-of-distribution detection. IEEE Access 12, 79401–79414 (2024)
- [6] Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., Vedaldi, A.: Describing textures in the wild. In: Computer Vision and Pattern Recognition Conference, pp. 3606–3613 (2014)
- [7] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A large-scale hierarchical image database. In: Computer Vision and Pattern Recognition Conference, pp. 248–255. IEEE (2009)
- [8] Deng, L.: The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine 29(6), 141–142 (2012)
- [9] Djurisic, A., Bozanic, N., Ashok, A., Liu, R.: Extremely simple activation shaping for out-of-distribution detection. In: International Conference on Learning Representations (2023)
- [10] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021)
- [11] Guglielmo, G., Masana, M.: Leveraging intermediate representations for better out-of-distribution detection. In: Computer Vision Winter Workshop (2025)
- [12] Harun, M.Y., Lee, K., Gallardo, J., Krishnan, G., Kanan, C.: What variables affect out-of-distribution generalization in pretrained models? Advances in Neural Information Processing Systems (2024)
- [13] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Computer Vision and Pattern Recognition Conference, pp. 770–778 (2016)
- [14] Hendrycks, D., Basart, S., Mazeika, M., Zou, A., Kwon, J., Mostajabi, M., Steinhardt, J., Song, D.: Scaling out-of-distribution detection for real-world settings. In: International Conference on Machine Learning, pp. 8759–8773. PMLR (2022)
- [15] Hendrycks, D., Dietterich, T.: Benchmarking neural network robustness to common corruptions and perturbations. In: International Conference on Learning Representations (2019)
- [16] Hendrycks, D., Gimpel, K.: A baseline for detecting misclassified and out-of-distribution examples in neural networks. In: International Conference on Learning Representations (2017)
- [17] Hong, Z., Yue, Y., Chen, Y., Cong, L., Lin, H., Luo, Y., Wang, M.H., Wang, W., Xu, J., Yang, X., et al.: Out-of-distribution detection in medical image analysis: A survey. arXiv e-prints, arXiv–2404 (2024)
- [18] Huang, R., Geng, A., Li, Y.: On the importance of gradients for detecting distributional shifts in the wild. Advances in Neural Information Processing Systems 34, 677–689 (2021)
- [19] Meza De la Jara, I., Rodriguez-Opazo, C., Teney, D., Ranasinghe, D., Abbasnejad, E.: Mysteries of the deep: Role of intermediate representations in out-of-distribution detection. In: Advances in Neural Information Processing Systems (2025)
- [20] Krizhevsky, A., Hinton, G., et al.: Learning Multiple Layers of Features from Tiny Images. Master's thesis, Department of Computer Science, University of Toronto (2009)
- [21] Lee, K., Lee, K., Lee, H., Shin, J.: A simple unified framework for detecting out-of-distribution samples and adversarial attacks. Advances in Neural Information Processing Systems 31 (2018)
- [22] Liang, S., Li, Y., Srikant, R.: Enhancing the reliability of out-of-distribution image detection in neural networks. In: International Conference on Learning Representations (2018)
- [23] Liu, W., Wang, X., Owens, J., Li, Y.: Energy-based out-of-distribution detection. Advances in Neural Information Processing Systems 33, 21464–21475 (2020)
- [24] Liu, X., Lochman, Y., Zach, C.: GEN: Pushing the limits of softmax-based out-of-distribution detection. In: Computer Vision and Pattern Recognition Conference, pp. 23946–23955 (2023)
- [25] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision, pp. 10012–10022 (2021)
- [26] Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s. In: Computer Vision and Pattern Recognition Conference, pp. 11976–11986 (2022)
- [27] Miyai, A., Yang, J., Zhang, J., Ming, Y., Lin, Y., Yu, Q., Irie, G., Joty, S., Li, Y., Li, H.H., et al.: Generalized out-of-distribution detection and beyond in vision language model era: A survey. Transactions on Machine Learning Research (2024)
- [28] Müller, M., Hein, M.: Mahalanobis++: Improving OOD detection via feature normalization. In: International Conference on Machine Learning (2025)
- [29] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y., et al.: Reading digits in natural images with unsupervised feature learning. In: NIPS Workshop on Deep Learning and Unsupervised Feature Learning, Granada (2011)
- [30] Radosavovic, I., Kosaraju, R.P., Girshick, R., He, K., Dollár, P.: Designing network design spaces. In: Computer Vision and Pattern Recognition Conference, pp. 10428–10436 (2020)
- [31] Ren, J., Fort, S., Liu, J., Guha Roy, A., Padhy, S., Lakshminarayanan, B.: A simple fix to Mahalanobis distance for improving near-OOD detection. arXiv e-prints, arXiv–2106 (2021)
- [32] Sastry, C.S., Oore, S.: Detecting out-of-distribution examples with Gram matrices. In: International Conference on Machine Learning, pp. 8491–8501. PMLR (2020)
- [33] Song, Y., Sebe, N., Wang, W.: RankFeat: Rank-1 feature removal for out-of-distribution detection. In: Advances in Neural Information Processing Systems (2022)
- [34] Sun, Y., Guo, C., Li, Y.: ReAct: Out-of-distribution detection with rectified activations. Advances in Neural Information Processing Systems 34, 144–157 (2021)
- [35] Sun, Y., Li, Y.: DICE: Leveraging sparsification for out-of-distribution detection. In: European Conference on Computer Vision, pp. 691–708. Springer (2022)
- [36] Sun, Y., Ming, Y., Zhu, X., Li, Y.: Out-of-distribution detection with deep nearest neighbors. In: International Conference on Machine Learning, pp. 20827–20840. PMLR (2022)
- [37] Tan, M., Le, Q.: EfficientNet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, pp. 6105–6114. PMLR (2019)
- [38] Thimonier, H., Popineau, F., Rimmel, A., Doan, B.L., Daniel, F.: Comparative evaluation of anomaly detection methods for fraud detection in online credit card payments. In: International Congress on Information and Communication Technology, pp. 37–50. Springer (2024)
- [39] TorchVision maintainers and contributors: TorchVision: PyTorch's computer vision library. https://github.com/pytorch/vision (2016)
- [40] Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jégou, H.: Training data-efficient image transformers & distillation through attention. In: International Conference on Machine Learning, pp. 10347–10357. PMLR (2021)
- [41] Van Horn, G., Mac Aodha, O., Song, Y., Cui, Y., Sun, C., Shepard, A., Adam, H., Perona, P., Belongie, S.: The iNaturalist species classification and detection dataset. In: Computer Vision and Pattern Recognition Conference, pp. 8769–8778 (2018)
- [42] Venkataramanan, A., Benbihi, A., Laviale, M., Pradalier, C.: Gaussian latent representations for uncertainty estimation using Mahalanobis distance in deep classifiers. In: International Conference on Computer Vision, pp. 4488–4497 (2023)
- [43] Wang, H., Li, Z., Feng, L., Zhang, W.: ViM: Out-of-distribution with virtual-logit matching. In: Computer Vision and Pattern Recognition Conference, pp. 4921–4930 (2022)
- [44] Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., Davison, J., Shleifer, S., von Platen, P., Ma, C., Jernite, Y., Plu, J., Xu, C., Le Scao, T., Gugger, S., Drame, M., Lhoest, Q., Rush, A.: Transformers: State-of-the-art natural language processing. In: Conference on Empirical Methods in Natural Language Processing (2020)
- [45] Wu, H., Xiao, B., Codella, N., Liu, M., Dai, X., Yuan, L., Zhang, L.: CvT: Introducing convolutions to vision transformers. In: International Conference on Computer Vision, pp. 22–31 (2021)
- [46] Xu, K., Chen, R., Franchi, G., Yao, A.: Scaling for training time and post-hoc out-of-distribution detection enhancement. In: International Conference on Learning Representations (2024)
- [47] Yang, J., Zhou, K., Li, Y., Liu, Z.: Generalized out-of-distribution detection: A survey. International Journal of Computer Vision 132(12), 5635–5662 (2024)
- [48] Yang, J., Zhou, K., Liu, Z.: Full-spectrum out-of-distribution detection. International Journal of Computer Vision 131(10), 2607–2622 (2023)
- [49] Zhang, J., Yang, J., Wang, P., Wang, H., Lin, Y., Zhang, H., Sun, Y., Du, X., Zhou, K., Zhang, W., et al.: OpenOOD v1.5: Enhanced benchmark for out-of-distribution detection. Journal of Data-centric Machine Learning Research (2024)
- [50] Zhang, J., Fu, Q., Chen, X., Du, L., Li, Z., Wang, G., Han, S., Zhang, D., et al.: Out-of-distribution detection based on in-distribution data patterns memorization with modern Hopfield energy. In: International Conference on Learning Representations (2022)
- [51] Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., Torralba, A.: Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 40(6), 1452–1464 (2017)