pith. machine review for the scientific record.

arxiv: 2605.04231 · v1 · submitted 2026-05-05 · 💻 cs.CV


Anatomy of a failure: When, how, and why deep vision fails in scientific domains


Pith reviewed 2026-05-08 17:32 UTC · model grok-4.3

classification 💻 cs.CV
keywords deep learning failure · scientific imaging · infrared imaging · simplicity bias · one-dimensional collapse · pathology · AI safety · modality mismatch

The pith

Deep learning on information-rich infrared images collapses to one-dimensional predictions and underperforms, despite the data's informational advantage over standard RGB photos.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines why deep learning succeeds on everyday RGB images but fails when applied to scientific imaging that encodes precise physical and chemical properties across many channels. Comparing models trained on stained tissue RGB images against infrared images of the same samples shows that the more informative infrared data leads to worse performance. The cause is an interaction between the structured priors of infrared data and deep learning's simplicity bias, which drives models to base all predictions on a single dimension while leaving most of their internal capacity unused. This collapse persists even with state-of-the-art robustness techniques designed for RGB data. The finding indicates that generic deep learning frameworks can waste the quantitative strengths of scientific modalities in domains such as pathology.

Core claim

Naive application of deep learning to quantitative scientific images such as infrared data produces underperformance because the data priors interact poorly with the simplicity bias of deep networks, causing models to collapse to one-dimensional predictions. This leaves the model's representational capacity largely unused and undermines the advantages of information-rich scientific modalities, even when robustness strategies validated on RGB imagery are applied.

What carries the argument

The interaction between infrared data priors and deep learning simplicity bias that produces collapse to one-dimensional predictions.

Load-bearing premise

The observed underperformance and collapse to one dimension on infrared data is caused by the interaction between those data priors and deep learning simplicity bias rather than by dataset size, label quality, or specific architecture choices.

What would settle it

Train models on the same infrared dataset but with an added regularization term that penalizes low-dimensional internal representations, then measure whether prediction accuracy rises to match or exceed the RGB baseline.
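The proposed probe is concrete enough to sketch. One way to penalize low-dimensional internal representations, assuming access to a batch of intermediate activations, is an effective-rank penalty; the function name and the exact penalty form below are illustrative choices on our part, not the paper's method:

```python
import numpy as np

def effective_rank_penalty(features: np.ndarray) -> float:
    """Penalize a batch of internal activations for collapsing
    toward a low-dimensional subspace.

    features: (batch, dim) array. Returns dim - effective_rank,
    so an isotropic batch is penalized least and a rank-1
    (fully collapsed) batch most.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    # Singular values of the centered batch give its spectrum of variation.
    s = np.linalg.svd(centered, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    # Effective rank (Roy & Vetterli): exp of the spectral entropy.
    erank = float(np.exp(-(p * np.log(p)).sum()))
    return features.shape[1] - erank
```

In an actual training loop this term would be added to the task loss inside a differentiable framework so the SVD sits on the autograd graph; here it only illustrates the quantity being penalized.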

Figures

Figures reproduced from arXiv: 2605.04231 by Dou Hoon Kwark, Ji-Hun Oh, John Cheville, Kevin Yeh, Kianoush Falahkheirkhah, Rohit Bhargava, Volodymyr Kindratenko.

Figure 1. High-level comparison. a. Example tile-level classification across input domains. b. Histogram comparing log(MSE) between virtual and real test images in IR and H&E domains. Note: data are standardized before MSE computation; lower MSE indicates higher translation accuracy. The right panel shows cases where (I) IR and H&E are mutually translatable, and (II) IR-to-H&E translation is feasible, but the rever…
Figure 2. Cue analysis and network dissection. a. Sensitivity (DJS) to spatial frequency and HVS cues, with excess sensitivity (IR − H&E) shown above bars. b. Histogram of first-layer kernel total variation. c. Test accuracy across spatial downscaling factors; 1 denotes original resolution, 256 denotes full collapse. d. Intra-CKA computed on a test subset across all 54² ResNet50 layer pairs. e. Test accuracy drop…
Figure 3. Overfitting modes and SB dynamics. a. Accuracy measured before vs. after pruning the hard training subset, evaluated on both the test set and pruned subset. b. Inter-CKA computed on a subset of the combined train-test sets, using the last-layer activations across all 15² model pairs trained on different train/test splits. c. Sensitivity (DJS) responses to spatial frequency (left) and HVS cues (right) ac…
Figure 4. Failure repercussions. a. Grad-CAM++ saliency maps overlaid with ground-truth tumor regions (left), with CS curves (right) computed by thresholding the top 1–10% of saliency as attended regions. b. Test accuracy across tumor-ratio percentiles and spatial downscaling factors. c. For 11 EU estimators, normalized ECE after progressively rejecting the top 1–90% highest-EU test samples. Lower is better; 1 denot…
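Figures 2 and 3 lean heavily on CKA similarity between layer pairs. For readers unfamiliar with the metric, linear CKA (in the style of Kornblith et al.) can be sketched in a few lines; this is a generic implementation, not the paper's code:

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear centered kernel alignment between two activation
    matrices X (n, d1) and Y (n, d2) over the same n examples.
    Returns a similarity in [0, 1]; 1 means the representations
    agree up to rotation and isotropic scaling.
    """
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return float(num / den)
```

The rotation invariance is what makes it usable across layers of different width, as in the intra-CKA panels above.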
Original abstract

Mirroring its ubiquity in popular media and all human activities, the use of deep learning (DL) is rapidly growing in scientific imaging modalities. However, unlike everyday RGB pictures, pixels encode precise physicochemical properties in scientific imaging across potentially thousands of channels. While DL is well validated on human-centric RGB perceptual tasks, its effectiveness for scientific imaging remains uncertain. Here, we show that the naive application of DL frameworks to scientific images can lead to critical failures. We evaluate the use of DL for pathology, comparing RGB images of stained tissue with the quantitative and information-rich biochemical signatures of infrared (IR) imaging. Despite this informational advantage, DL models trained on IR data paradoxically underperform. We investigate this discrepancy to find that IR data priors interact poorly with the simplicity bias of DL, causing models to collapse to one-dimensional predictions. This constitutes a catastrophic DL failure because the model's representational capacity remains largely unused, while furthermore raising AI safety concerns and undermining the advantages of such scientific modalities. Notably, this problem persists even with state-of-the-art DL robustification strategies, which are primarily designed and validated for RGB imagery and thus inherit the same prior-bias mismatch. This work establishes a framework for understanding the limitations of generic DL in science and advocates for the study of modality-specific failure modes to guide the development of specialized, safe AI algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, circularity audit, and an axiom ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that deep learning models applied to scientific imaging modalities such as infrared (IR) pathology data underperform relative to RGB images despite the former's richer physicochemical information content across thousands of channels. It attributes this to an interaction between IR data priors and DL simplicity bias, which causes models to collapse to one-dimensional predictions, rendering their representational capacity unused. The failure is reported to persist under state-of-the-art robustification strategies, establishing a framework for modality-specific DL limitations in science and raising AI safety concerns.

Significance. If the proposed mechanism is substantiated through controlled experiments, the work would be significant for identifying when generic DL frameworks fail in high-dimensional scientific domains rather than perceptual RGB tasks. It provides an empirical comparison grounded in real pathology data and highlights the need for specialized algorithms, which could influence development of safe AI for quantitative imaging modalities.

major comments (2)
  1. [Empirical evaluation and results] The central causal claim—that underperformance and one-dimensional collapse on IR data result specifically from the interaction of high-dimensional IR priors with DL simplicity bias rather than confounds—lacks isolating controls. No ablations are described that subsample the RGB dataset to match the effective sample size N of the IR experiments or inject comparable label noise, leaving the attribution unsecured even if the performance gap is replicated.
  2. [Robustification experiments] The persistence of the failure under robustification strategies is presented as evidence of a fundamental prior-bias mismatch, but these strategies were not tested under equalized data conditions (e.g., identical N or label quality between RGB and IR). This weakens the conclusion that the issue is modality-specific rather than architecture- or data-scale-dependent.
minor comments (2)
  1. [Abstract] The abstract and main text would benefit from explicit quantitative metrics (e.g., accuracy deltas, dimensionality measures of predictions, or statistical significance tests) to support the core observation of underperformance and collapse, as the current presentation remains largely qualitative.
  2. [Methods] Clarify the exact definition and measurement of 'one-dimensional predictions' and 'simplicity bias' in the context of the IR experiments to avoid interpretive ambiguity.
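The second minor comment could be met with a standard spectral statistic. One common choice is the participation ratio of the activation (or logit) covariance, which reads 1 exactly when predictions collapse to a single direction; the choice of statistic here is ours, offered only to make the request concrete:

```python
import numpy as np

def participation_ratio(acts: np.ndarray) -> float:
    """Participation ratio of the covariance spectrum of a set of
    activation/logit vectors acts with shape (n, d):
    (sum of eigenvalues)^2 / (sum of squared eigenvalues).
    Ranges from 1 (all variance in one direction, i.e. the
    one-dimensional collapse discussed above) to d (isotropic).
    """
    centered = acts - acts.mean(axis=0, keepdims=True)
    lam = np.linalg.eigvalsh(np.cov(centered, rowvar=False))
    lam = np.clip(lam, 0.0, None)  # guard tiny negative eigenvalues
    return float(lam.sum() ** 2 / (lam ** 2).sum())
```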

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help us strengthen the causal claims in our work. We address each major comment below and will update the manuscript with additional experiments and clarifications.

Point-by-point responses
  1. Referee: [Empirical evaluation and results] The central causal claim—that underperformance and one-dimensional collapse on IR data result specifically from the interaction of high-dimensional IR priors with DL simplicity bias rather than confounds—lacks isolating controls. No ablations are described that subsample the RGB dataset to match the effective sample size N of the IR experiments or inject comparable label noise, leaving the attribution unsecured even if the performance gap is replicated.

    Authors: We agree that isolating controls are important for securing the attribution to data priors rather than confounds such as sample size or label noise. In the revised manuscript, we will add ablations that subsample the larger RGB dataset to match the effective sample size N of the IR experiments and report performance under these matched conditions. For label noise, both modalities are annotated by the same experts on corresponding tissue sections, so label quality is comparable; nevertheless, we will include an additional experiment injecting synthetic label noise at matched levels to further isolate the effect of the priors. revision: yes

  2. Referee: [Robustification experiments] The persistence of the failure under robustification strategies is presented as evidence of a fundamental prior-bias mismatch, but these strategies were not tested under equalized data conditions (e.g., identical N or label quality between RGB and IR). This weakens the conclusion that the issue is modality-specific rather than architecture- or data-scale-dependent.

    Authors: We concur that testing under equalized conditions would better support the modality-specific nature of the failure. We will revise the robustification experiments section to include evaluations on subsampled RGB data with N matched to IR, as well as under controlled label quality. This will show that the one-dimensional collapse and underperformance persist even when data scale and quality are equalized, thereby reinforcing that the root cause is the mismatch between high-dimensional scientific priors and the simplicity bias of standard DL architectures. revision: yes
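The controls promised in both responses, subsampling to a matched N and injecting label noise at matched rates, are simple to operationalize. A hedged sketch of such a protocol, with hypothetical names rather than the authors' code, might look like:

```python
import numpy as np

def match_and_corrupt(labels: np.ndarray, target_n: int,
                      noise_rate: float, n_classes: int,
                      seed: int = 0) -> tuple[np.ndarray, np.ndarray]:
    """Draw a size-matched random subset and flip a fraction of its
    labels uniformly to *other* classes. Returns (indices, noisy_labels),
    to be used for training the RGB model under IR-matched conditions.
    """
    rng = np.random.default_rng(seed)
    # Size-matched subsample without replacement.
    idx = rng.choice(len(labels), size=target_n, replace=False)
    noisy = labels[idx].copy()
    flip = rng.random(target_n) < noise_rate
    # A nonzero offset mod n_classes guarantees a flipped label changes.
    offsets = rng.integers(1, n_classes, size=int(flip.sum()))
    noisy[flip] = (noisy[flip] + offsets) % n_classes
    return idx, noisy
```

Running the same training pipeline on the subsampled, corrupted RGB data would then isolate whether the RGB/IR gap survives when N and label quality are equalized.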

Circularity Check

0 steps flagged

No circularity: empirical claims rest on direct experimental comparisons without self-referential derivations.

Full rationale

The paper conducts an empirical study comparing deep learning performance on RGB versus IR imaging in pathology, reporting underperformance and one-dimensional collapse on IR data. Central claims are supported by observed results and attribution to data priors interacting with simplicity bias, without any mathematical derivations, equations, fitted parameters renamed as predictions, or load-bearing self-citations that reduce the findings to inputs by construction. The analysis is self-contained through direct data comparisons and experimental observations, with no reduction of outputs to prior definitions or author-specific uniqueness theorems.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that simplicity bias is the dominant cause of the observed collapse and that standard robustification techniques are representative of best practices for scientific data.

axioms (1)
  • domain assumption Deep networks exhibit a simplicity bias that favors low-dimensional solutions when data priors allow it.
    Invoked to explain why IR data leads to underperformance despite higher information content.
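The axiom has a textbook mechanism behind it: gradient descent fits large-scale ("simple") directions of the data far faster than small-scale ones, so when one direction suffices to reduce the loss, the rest may never be learned. A toy, self-contained illustration of those dynamics (our construction, not the paper's experiment):

```python
import numpy as np

# Two exactly orthogonal input directions: a "strong" (simple,
# large-scale) feature and a "weak" one, both equally predictive of y.
n = 64
a = np.ones(n) / np.sqrt(n)
b = np.tile([1.0, -1.0], n // 2) / np.sqrt(n)
X = np.stack([10.0 * a, b], axis=1)   # feature 0 has 10x the scale
w_true = np.array([1.0, 1.0])
y = X @ w_true

# Plain gradient descent on squared error, starting from w = 0.
# Error along each direction decays as (1 - lr * eigenvalue)^t,
# so the strong direction (eigenvalue 100) converges ~200x faster.
w = np.zeros(2)
lr = 0.005
for _ in range(10):
    w -= lr * X.T @ (X @ w - y)
```

After only ten steps the strong feature is essentially fit while the weak one is barely moved from zero; if early stopping or a loss plateau intervenes here, the weak feature is simply never used, which is the collapse pattern the axiom is invoked to explain.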



Reference graph

Works this paper leans on

152 extracted references · 6 canonical work pages · 3 internal anchors

  1. [1]

    Underspecification presents challenges for credibility in modern machine learning,

    A. D’Amour, K. Heller, D. Moldovan, B. Adlam, B. Alipanahi, A. Beutel, C. Chen, J. Deaton, J. Eisenstein, M. D. Hoffman,et al., “Underspecification presents challenges for credibility in modern machine learning,”J. Mach. Learn. Res., vol. 23, no. 226, pp. 1–61, 2022

  2. [2]

    Shortcut learning in deep neural networks,

    R. Geirhos, J.-H. Jacobsen, C. Michaelis, R. Zemel, W. Brendel, M. Bethge, and F. A. Wich- mann, “Shortcut learning in deep neural networks,”Nat. Mach. Intell., vol. 2, no. 11, pp. 665– 673, 2020

  3. [3]

    Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead,

    C. Rudin, “Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead,”Nat. Mach. Intell., vol. 1, no. 5, pp. 206–215, 2019. 19 A PREPRINT

  4. [4]

    Avoiding a replication crisis in deep-learning-based bioimage analysis,

    R. F. Laine, I. Arganda-Carreras, R. Henriques, and G. Jacquemet, “Avoiding a replication crisis in deep-learning-based bioimage analysis,”Nat. Methods, vol. 18, no. 10, pp. 1136– 1144, 2021

  5. [5]

    Leakage and the reproducibility crisis in machine-learning- based science,

    S. Kapoor and A. Narayanan, “Leakage and the reproducibility crisis in machine-learning- based science,”Patterns, vol. 4, no. 9, 2023

  6. [6]

    Imagenet large scale visual recognition challenge,

    O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein,et al., “Imagenet large scale visual recognition challenge,”Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, 2015

  7. [7]

    Pre-training on grayscale imagenet improves medical image clas- sification,

    Y . Xie and D. Richmond, “Pre-training on grayscale imagenet improves medical image clas- sification,” inEur. Conf. Comput. Vis. Worksh. (ECCVW), 2018

  8. [8]

    Unintended bias in 2d+ image segmentation and its effect on attention asymmetry,

    Z. Molnár, G. Szabó, and A. Horváth, “Unintended bias in 2d+ image segmentation and its effect on attention asymmetry,”arXiv preprint arXiv:2505.14105, 2025

  9. [9]

    Transfusion: Understanding transfer learning for medical imaging,

    M. Raghu, C. Zhang, J. Kleinberg, and S. Bengio, “Transfusion: Understanding transfer learning for medical imaging,” inNeural Inf. Process. Syst. (NeurIPS), 2019

  10. [10]

    Alleviating modality bias training for infrared-visible person re-identification,

    Y . Huang, Q. Wu, J. Xu, Y . Zhong, P. Zhang, and Z. Zhang, “Alleviating modality bias training for infrared-visible person re-identification,”IEEE Trans. Multimedia., vol. 24, pp. 1570– 1582, 2021

  11. [11]

    M-specgene: Generalized foundation model for rgbt multispectral vision,

    K. Zhou, F. Yang, S. Wang, B. Wen, C. Zi, L. Chen, Q. Shen, and X. Cao, “M-specgene: Generalized foundation model for rgbt multispectral vision,” inIEEE Conf. Comput. Vis. Pattern Recog. (CVPR), 2025

  12. [12]

    Scientific machine learning through physics–informed neural networks: Where we are and what’s next,

    S. Cuomo, V . S. Di Cola, F. Giampaolo, G. Rozza, M. Raissi, and F. Piccialli, “Scientific machine learning through physics–informed neural networks: Where we are and what’s next,” J. Sci. Comput., vol. 92, no. 3, p. 88, 2022

  13. [13]

    A versatile deep learning architecture for classi- fication and label-free prediction of hyperspectral images,

    B. Manifold, S. Men, R. Hu, and D. Fu, “A versatile deep learning architecture for classi- fication and label-free prediction of hyperspectral images,”Nat. Mach. Intell., vol. 3, no. 4, pp. 306–315, 2021

  14. [14]

    Attention is all you need,

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” inNeural Inf. Process. Syst. (NeurIPS), 2017

  15. [15]

    Convolutional neural networks analyzed via convolu- tional sparse coding,

    V . Papyan, Y . Romano, and M. Elad, “Convolutional neural networks analyzed via convolu- tional sparse coding,”J. Mach. Learn. Res., vol. 18, no. 83, pp. 1–52, 2017

  16. [16]

    Weak baselines and reporting biases lead to overoptimism in machine learning for fluid-related partial differential equations,

    N. McGreivy and A. Hakim, “Weak baselines and reporting biases lead to overoptimism in machine learning for fluid-related partial differential equations,”Nat. Mach. Intell., vol. 6, no. 10, pp. 1256–1269, 2024

  17. [17]

    Unraveling overoptimism and publication bias in ml-driven science,

    P. Saidi, G. Dasarathy, and V . Berisha, “Unraveling overoptimism and publication bias in ml-driven science,”Patterns, vol. 6, no. 4, 2025

  18. [18]

    Infrared spectroscopic imaging for histopathologic recognition,

    D. C. Fernandez, R. Bhargava, S. M. Hewitt, and I. W. Levin, “Infrared spectroscopic imaging for histopathologic recognition,”Nat. Biotechnol., vol. 23, no. 4, pp. 469–474, 2005

  19. [19]

    Digital histopathology by infrared spectroscopic imaging,

    R. Bhargava, “Digital histopathology by infrared spectroscopic imaging,”Annu. Rev. Anal. Chem., vol. 16, no. 1, pp. 205–230, 2023

  20. [20]

    Advances in mid-infrared spectroscopy for chemical analysis,

    J. Haas and B. Mizaikoff, “Advances in mid-infrared spectroscopy for chemical analysis,” Annu. Rev. Anal. Chem., vol. 9, no. 1, pp. 45–68, 2016

  21. [21]

    Infrared spectroscopic laser scanning confocal microscopy for whole-slide chemical imaging,

    K. Yeh, I. Sharma, K. Falahkheirkhah, M. P. Confer, A. C. Orr, Y .-T. Liu, Y . Phal, R.-J. Ho, M. Mehta, A. Bhargava,et al., “Infrared spectroscopic laser scanning confocal microscopy for whole-slide chemical imaging,”Nat. Commun., vol. 14, no. 1, p. 5215, 2023

  22. [22]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inIEEE Conf. Comput. Vis. Pattern Recog. (CVPR), pp. 770–778, 2016. 20 A PREPRINT

  23. [23]

    Understanding convolutional neural networks with information theory: An initial exploration,

    S. Yu, K. Wickstrøm, R. Jenssen, and J. C. Principe, “Understanding convolutional neural networks with information theory: An initial exploration,”Trans. Neural Netw. Learn. Syst., vol. 32, no. 1, pp. 435–442, 2020

  24. [24]

    Explaining deep and resnet architecture choices with informa- tion flow,

    S. Chang and J. C. Principe, “Explaining deep and resnet architecture choices with informa- tion flow,” inInt. Jt. Conf. Neural Netw. (IJCNN), pp. 1–6, 2022

  25. [25]

    Information-theoretic analysis of multimodal image translation,

    R. Liu, Y . Li, Y . Li, Y . P. Du, and Z.-P. Liang, “Information-theoretic analysis of multimodal image translation,”IEEE Trans. Med. Imaging., 2025

  26. [26]

    The visual filter mediating letter identification,

    J. A. Solomon and D. G. Pelli, “The visual filter mediating letter identification,”Nature, vol. 369, no. 6479, pp. 395–397, 1994

  27. [27]

    A fourier perspective on model robustness in computer vision,

    D. Yin, R. Gontijo Lopes, J. Shlens, E. D. Cubuk, and J. Gilmer, “A fourier perspective on model robustness in computer vision,” inNeural Inf. Process. Syst. (NeurIPS), vol. 32, 2019

  28. [28]

    Robust deep learning object recognition models rely on low frequency information in natural images,

    Z. Li, J. Ortega Caro, E. Rusak, W. Brendel, M. Bethge, F. Anselmi, A. B. Patel, A. S. Tolias, and X. Pitkow, “Robust deep learning object recognition models rely on low frequency information in natural images,”PLoS Comput. Biol., vol. 19, no. 3, p. e1010932, 2023

  29. [29]

    Spatial-frequency channels, shape bias, and adversarial robustness,

    A. Subramanian, E. Sizikova, N. Majaj, and D. Pelli, “Spatial-frequency channels, shape bias, and adversarial robustness,” inNeural Inf. Process. Syst. (NeurIPS), vol. 36, 2024

  30. [30]

    Gen- eralisation in humans and deep neural networks,

    R. Geirhos, C. R. Temme, J. Rauber, H. H. Schütt, M. Bethge, and F. A. Wichmann, “Gen- eralisation in humans and deep neural networks,” inNeural Inf. Process. Syst. (NeurIPS), vol. 31, 2018

  31. [31]

    Contributions of shape, texture, and color in visual recognition,

    Y . Ge, Y . Xiao, Z. Xu, X. Wang, and L. Itti, “Contributions of shape, texture, and color in visual recognition,” inEur. Conf. Comput. Vis. (ECCV), pp. 369–386, 2022

  32. [32]

    The origins and prevalence of texture bias in convo- lutional neural networks,

    K. Hermann, T. Chen, and S. Kornblith, “The origins and prevalence of texture bias in convo- lutional neural networks,” inNeural Inf. Process. Syst. (NeurIPS), vol. 33, pp. 19000–19015, 2020

  33. [33]

    Does enhanced shape bias improve neural network robustness to common corruptions?,

    C. K. Mummadi, R. Subramaniam, R. Hutmacher, J. Vitay, V . Fischer, and J. H. Metzen, “Does enhanced shape bias improve neural network robustness to common corruptions?,” in Int. Conf. Learn. Represent. (ICLR), 2021

  34. [34]

    Are convolutional neural networks or transformers more like human vision?,

    S. Tuli, I. Dasgupta, E. Grant, and T. L. Griffiths, “Are convolutional neural networks or transformers more like human vision?,” inCogSci, 2021

  35. [35]

    Beyond accuracy: quantifying trial-by-trial behaviour of cnns and humans by measuring error consistency,

    R. Geirhos, K. Meding, and F. A. Wichmann, “Beyond accuracy: quantifying trial-by-trial behaviour of cnns and humans by measuring error consistency,” inNeural Inf. Process. Syst. (NeurIPS), vol. 33, pp. 13890–13902, 2020

  36. [36]

    Similarity of neural network representa- tions revisited,

    S. Kornblith, M. Norouzi, H. Lee, and G. Hinton, “Similarity of neural network representa- tions revisited,” inInt. Conf. Mach. Learn. (ICML), pp. 3519–3529, 2019

  37. [37]

    The lottery ticket hypothesis: Finding sparse, trainable neural networks,

    J. Frankle and M. Carbin, “The lottery ticket hypothesis: Finding sparse, trainable neural networks,” inInt. Conf. Learn. Represent. (ICLR), 2018

  38. [38]

    Complexity matters: feature learning in the presence of spurious correlations,

    G. Qiu, D. Kuang, and S. Goel, “Complexity matters: feature learning in the presence of spurious correlations,” inInt. Conf. Mach. Learn. (ICML), 2024

  39. [39]

    Imagenet-trained cnns are biased towards texture; increasing shape bias improves accuracy and robustness,

    R. Geirhos, P. Rubisch, C. Michaelis, M. Bethge, F. A. Wichmann, and W. Brendel, “Imagenet-trained cnns are biased towards texture; increasing shape bias improves accuracy and robustness,” inInt. Conf. Learn. Represent. (ICLR), 2019

  40. [40]

    High-frequency component helps explain the generalization of convolutional neural networks,

    H. Wang, X. Wu, Z. Huang, and E. P. Xing, “High-frequency component helps explain the generalization of convolutional neural networks,” inIEEE Conf. Comput. Vis. Pattern Recog. (CVPR), pp. 8684–8694, 2020

  41. [41]

    Rethinking the image feature biases exhib- ited by deep convolutional neural network models in image recognition,

    D. Dai, Y . Li, Y . Wang, H. Bao, and G. Wang, “Rethinking the image feature biases exhib- ited by deep convolutional neural network models in image recognition,”CAAI Trans. Intell. Technol., vol. 7, no. 4, pp. 721–731, 2022. 21 A PREPRINT

  42. [42]

    Shape-biased cnns are not always superior in out-of-distribution robustness,

    X. Qiu, M. Kan, Y . Zhou, Y . Bi, and S. Shan, “Shape-biased cnns are not always superior in out-of-distribution robustness,” inIEEE Winter Conf. Appl. Comput. Vis. (WACV), pp. 2326– 2335, 2024

  43. [43]

    Computer-aided prostate cancer diagnosis from digitized histopathology: a review on texture-based systems,

    C. Mosquera-Lopez, S. Agaian, A. Velez-Hoyos, and I. Thompson, “Computer-aided prostate cancer diagnosis from digitized histopathology: a review on texture-based systems,”IEEE Rev. Biomed. Eng., vol. 8, pp. 98–113, 2014

  44. [44]

    Breast cancer histopathology image analysis: A review,

    M. Veta, J. P. Pluim, P. J. Van Diest, and M. A. Viergever, “Breast cancer histopathology image analysis: A review,”IEEE Trans. Biomed. Eng., vol. 61, no. 5, pp. 1400–1411, 2014

  45. [45]

    Feature contamination: Neural net- works learn uncorrelated features and fail to generalize,

    T. Zhang, C. Zhao, G. Chen, Y . Jiang, and F. Chen, “Feature contamination: Neural net- works learn uncorrelated features and fail to generalize,” inInt. Conf. Mach. Learn. (ICML), pp. 60446–60495, 2024

  46. [46]

    Reconciling modern machine-learning practice and the classical bias–variance trade-off,

    M. Belkin, D. Hsu, S. Ma, and S. Mandal, “Reconciling modern machine-learning practice and the classical bias–variance trade-off,”PNAS, vol. 116, no. 32, pp. 15849–15854, 2019

  47. [47]

    Triple descent and the two kinds of overfitting: Where & why do they appear?,

    S. d’Ascoli, L. Sagun, and G. Biroli, “Triple descent and the two kinds of overfitting: Where & why do they appear?,” inNeural Inf. Process. Syst. (NeurIPS), vol. 33, pp. 3058–3069, 2020

  48. [48]

    Benign, tempered, or catastrophic: a taxonomy of overfitting,

    N. Mallinar, J. B. Simon, A. Abedsoltan, P. Pandit, M. Belkin, and P. Nakkiran, “Benign, tempered, or catastrophic: a taxonomy of overfitting,” inNeural Inf. Process. Syst. (NeurIPS), pp. 1182–1195, 2022

  49. [49]

    The pitfalls of mem- orization: When memorization hurts generalization,

    R. Bayat, M. Pezeshki, E. Dohmatob, D. Lopez-Paz, and P. Vincent, “The pitfalls of mem- orization: When memorization hurts generalization,” inInt. Conf. Learn. Represent. (ICLR), 2025

  50. [50]

    What neural networks memorize and why: Discovering the long tail via influence estimation,

    V . Feldman and C. Zhang, “What neural networks memorize and why: Discovering the long tail via influence estimation,” inNeural Inf. Process. Syst. (NeurIPS), vol. 33, pp. 2881–2891, 2020

  51. [51]

    Characterizing structural regularities of labeled data in overparameterized models,

    Z. Jiang, C. Zhang, K. Talwar, and M. C. Mozer, “Characterizing structural regularities of labeled data in overparameterized models,” inInt. Conf. Mach. Learn. (ICML), pp. 5034– 5044, 2021

  52. [52]

    What do larger image classifiers memorise?,

    M. Lukasik, V . Nagarajan, A. S. Rawat, A. K. Menon, and S. Kumar, “What do larger image classifiers memorise?,”Trans. Mach. Learn. Res., 2024

  53. [53]

    Identifying mislabeled data using the area under the margin ranking,

    G. Pleiss, T. Zhang, E. Elenberg, and K. Q. Weinberger, “Identifying mislabeled data using the area under the margin ranking,” inNeural Inf. Process. Syst. (NeurIPS), vol. 33, pp. 17044– 17056, 2020

  54. [54]

    Deep learning on a data diet: Finding important examples early in training,

    M. Paul, S. Ganguli, and G. K. Dziugaite, “Deep learning on a data diet: Finding important examples early in training,” inNeural Inf. Process. Syst. (NeurIPS), vol. 34, pp. 20596–20607, 2021

  55. [55]

    An empirical study of example forgetting during deep neural network learning,

    M. Toneva, A. Sordoni, R. T. des Combes, A. Trischler, Y . Bengio, and G. J. Gordon, “An empirical study of example forgetting during deep neural network learning,” inInt. Conf. Learn. Represent. (ICLR), 2019

  56. [56]

    Characterizing datapoints via second-split forgetting,

    P. Maini, S. Garg, Z. Lipton, and J. Z. Kolter, “Characterizing datapoints via second-split forgetting,” inNeural Inf. Process. Syst. (NeurIPS), vol. 35, pp. 30044–30057, 2022

  57. [57]

    Estimating example difficulty using variance of gradients,

    C. Agarwal, D. D’souza, and S. Hooker, “Estimating example difficulty using variance of gradients,” inIEEE Conf. Comput. Vis. Pattern Recog. (CVPR), pp. 10368–10378, 2022

  58. [58]

    Beyond neural scaling laws: beating power law scaling via data pruning,

    B. Sorscher, R. Geirhos, S. Shekhar, S. Ganguli, and A. Morcos, “Beyond neural scaling laws: beating power law scaling via data pruning,” inNeural Inf. Process. Syst. (NeurIPS), vol. 35, pp. 19523–19536, 2022. 22 A PREPRINT

  59. [59]

    A simple unified framework for detecting out-of- distribution samples and adversarial attacks,

    K. Lee, K. Lee, H. Lee, and J. Shin, “A simple unified framework for detecting out-of- distribution samples and adversarial attacks,” inNeural Inf. Process. Syst. (NeurIPS), vol. 31, 2018

  60. [60]

    Deep learning through the lens of example difficulty,

    R. Baldock, H. Maennel, and B. Neyshabur, “Deep learning through the lens of example difficulty,” inNeural Inf. Process. Syst. (NeurIPS), vol. 34, pp. 10876–10889, 2021

  61. [61]

    Stability and generalization,

    O. Bousquet and A. Elisseeff, “Stability and generalization,”J. Mach. Learn. Res., vol. 2, pp. 499–526, 2002

  62. [62]

    A closer look at memorization in deep networks,

    D. Arpit, S. Jastrz˛ ebski, N. Ballas, D. Krueger, E. Bengio, M. S. Kanwal, T. Maharaj, A. Fis- cher, A. Courville, Y . Bengio,et al., “A closer look at memorization in deep networks,” in Int. Conf. Mach. Learn. (ICML), pp. 233–242, 2017

  63. [63]

    Random deep neural networks are biased towards simple functions,

    G. De Palma, B. Kiani, and S. Lloyd, “Random deep neural networks are biased towards simple functions,” inNeural Inf. Process. Syst. (NeurIPS), vol. 32, 2019

  64. [64]

    Deep learning generalizes because the parameter-function map is biased towards simple functions,

    G. Valle-Perez, C. Q. Camargo, and A. A. Louis, “Deep learning generalizes because the parameter-function map is biased towards simple functions,” inInt. Conf. Learn. Represent. (ICLR), 2019

  65. [65]

    On the spectral bias of neural networks,

    N. Rahaman, A. Baratin, D. Arpit, F. Draxler, M. Lin, F. Hamprecht, Y . Bengio, and A. Courville, “On the spectral bias of neural networks,” inInt. Conf. Mach. Learn. (ICML), pp. 5301–5310, 2019

  66. [66]

    The surprising simplicity of the early-time learning dynamics of neural networks,

    W. Hu, L. Xiao, B. Adlam, and J. Pennington, “The surprising simplicity of the early-time learning dynamics of neural networks,” inNeural Inf. Process. Syst. (NeurIPS), vol. 33, pp. 17116–17128, 2020

  67.

    The pitfalls of simplicity bias in neural networks

    H. Shah, K. Tamuly, A. Raghunathan, P. Jain, and P. Netrapalli, “The pitfalls of simplicity bias in neural networks,” in Neural Inf. Process. Syst. (NeurIPS), vol. 33, pp. 9573–9585, 2020.

  68.

    Critical learning periods in deep networks

    A. Achille, M. Rovere, and S. Soatto, “Critical learning periods in deep networks,” in Int. Conf. Learn. Represent. (ICLR), 2018.

  69.

    An investigation of why overparameterization exacerbates spurious correlations

    S. Sagawa, A. Raghunathan, P. W. Koh, and P. Liang, “An investigation of why overparameterization exacerbates spurious correlations,” in Int. Conf. Mach. Learn. (ICML), pp. 8346–8356, 2020.

  70.

    Identifying spurious biases early in training through the lens of simplicity bias

    Y. Yang, E. Gan, G. K. Dziugaite, and B. Mirzasoleiman, “Identifying spurious biases early in training through the lens of simplicity bias,” in Int. Conf. Artif. Intell. Stat. (AISTATS), pp. 2953–2961, 2024.

  71.

    Intrinsic dimension of data representations in deep neural networks

    A. Ansuini, A. Laio, J. H. Macke, and D. Zoccolan, “Intrinsic dimension of data representations in deep neural networks,” in Neural Inf. Process. Syst. (NeurIPS), vol. 32, 2019.

  72.

    Prevalence of neural collapse during the terminal phase of deep learning training

    V. Papyan, X. Han, and D. L. Donoho, “Prevalence of neural collapse during the terminal phase of deep learning training,” PNAS, vol. 117, no. 40, pp. 24652–24663, 2020.

  73.

    An algorithm for finding intrinsic dimensionality of data

    K. Fukunaga and D. R. Olsen, “An algorithm for finding intrinsic dimensionality of data,” IEEE Trans. Comput., vol. C-20, no. 2, pp. 176–183, 1971.

  74.

    Maximum likelihood estimation of intrinsic dimension

    E. Levina and P. Bickel, “Maximum likelihood estimation of intrinsic dimension,” in Neural Inf. Process. Syst. (NeurIPS), vol. 17, 2004.

  75.

    Estimating the intrinsic dimension of datasets by a minimal neighborhood information

    E. Facco, M. d’Errico, A. Rodriguez, and A. Laio, “Estimating the intrinsic dimension of datasets by a minimal neighborhood information,” Sci. Rep., vol. 7, no. 1, p. 12140, 2017.

  76.

    Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks

    A. Chattopadhay, A. Sarkar, P. Howlader, and V. N. Balasubramanian, “Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks,” in IEEE Winter Conf. Appl. Comput. Vis. (WACV), pp. 839–847, 2018.

  77.

    Frequency-tuned salient region detection

    R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk, “Frequency-tuned salient region detection,” in IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), pp. 1597–1604, 2009.

  78.

    HS-FPN: High frequency and spatial perception FPN for tiny object detection

    Z. Shi, J. Hu, J. Ren, H. Ye, X. Yuan, Y. Ouyang, J. He, B. Ji, and J. Guo, “HS-FPN: High frequency and spatial perception FPN for tiny object detection,” in AAAI Conf. Artif. Intell., pp. 6896–6904, 2025.

  79.

    Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer

    B. E. Bejnordi, M. Veta, P. J. Van Diest, B. Van Ginneken, N. Karssemeijer, G. Litjens, J. A. Van Der Laak, M. Hermsen, Q. F. Manson, M. Balkenhol, et al., “Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer,” JAMA, vol. 318, no. 22, pp. 2199–2210, 2017.

  80.

    Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer

    J. N. Kather, A. T. Pearson, N. Halama, D. Jäger, J. Krause, S. H. Loosen, A. Marx, P. Boor, F. Tacke, U. P. Neumann, et al., “Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer,” Nat. Med., vol. 25, no. 7, pp. 1054–1056, 2019.
