
arxiv: 2606.03493 · v1 · pith:P56VHMEMnew · submitted 2026-06-02 · 💻 cs.CV · cs.LG

Low-Frequency Shortcuts in Texture-Driven Visual Learning

Pith reviewed 2026-06-28 10:28 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords shortcut learning · texture-driven domains · low-frequency components · spectral analysis · visual classification · out-of-distribution robustness

The pith

Texture-driven visual models base most decisions on a few low-frequency components even though classification information lies in higher-frequency details.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes shortcut learning in neural networks but shifts focus from standard shape-driven benchmarks to texture-driven domains. It establishes that these models exhibit low-frequency shortcuts, relying on skewed spectral behavior from a small set of low-frequency components. Removing those components from both training and test data produces more balanced frequency use and raises in-distribution accuracy by as much as 8 percent. The same shortcuts cause large drops in accuracy under out-of-distribution corruptions, while their removal improves robustness to low-frequency corruptions at the expense of performance on high-frequency ones.

Core claim

Texture-driven domains suffer from low-frequency shortcuts. Models make the majority of their decisions based on a few low-frequency components with skewed spectral behavior, despite classification information residing in higher-frequency fine-grained details. Pruning the low-frequency components from training and test sets eliminates the shortcut, yields balanced spectral behavior, and improves in-distribution accuracy by up to 8 percent. The shortcuts also render models vulnerable to out-of-distribution corruptions, with accuracy drops reaching 70 percent, while pruning improves robustness to low-frequency corruptions by up to 40 percent and creates a trade-off on high-frequency corruptions.

What carries the argument

Low-frequency components (LFCs) identified by spectral analysis of model decisions; pruning them from images forces a shift from skewed to balanced spectral reliance.
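
A minimal sketch of what pruning low-frequency diagonals could look like, assuming a 2D discrete cosine transform (DCT) and a single-channel image. The summary does not name the transform, so the DCT choice, the helper names `dct_matrix` and `prune_low_diagonals`, and the `u + v` diagonal indexing are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis as an (n, n) matrix; row k is frequency k."""
    k = np.arange(n)[:, None]   # frequency index
    x = np.arange(n)[None, :]   # sample index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # the DC row has its own normalization
    return m


def prune_low_diagonals(image: np.ndarray, num_pruned: int) -> np.ndarray:
    """Zero the first `num_pruned` anti-diagonals (u + v < num_pruned) of the
    2D DCT of a single-channel image, then transform back to pixel space.

    Assumption: a "frequency component" is one anti-diagonal of the DCT
    coefficient grid, counted from the top-left (lowest-frequency) corner.
    """
    h, w = image.shape
    dh, dw = dct_matrix(h), dct_matrix(w)
    coeffs = dh @ image @ dw.T                  # forward 2D DCT
    u = np.arange(h)[:, None]
    v = np.arange(w)[None, :]
    coeffs = np.where(u + v < num_pruned, 0.0, coeffs)
    return dh.T @ coeffs @ dw                   # inverse 2D DCT
```

With `num_pruned = 1` only the DC term is removed, so the output has zero mean; larger values strip progressively more of the smooth image content while leaving fine detail untouched.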

If this is right

  • Pruning LFCs raises in-distribution accuracy by up to 8 percent.
  • Low-frequency shortcuts cause accuracy drops of up to 70 percent under out-of-distribution corruptions.
  • Pruning LFCs improves robustness to low-frequency corruptions by up to 40 percent.
  • The resulting balanced spectral behavior produces opposing effects on generalization to low-frequency versus high-frequency corruptions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Frequency-aware data filtering may be worth testing on other texture-heavy tasks such as material or medical-image classification.
  • Training procedures could incorporate explicit penalties against over-reliance on any single frequency band to reduce shortcut formation.
  • The observed low-versus-high frequency trade-off suggests that robustness benchmarks should separately report performance across spectral regimes rather than aggregate scores alone.

Load-bearing premise

The assumption that the classification signal truly resides in the higher-frequency components and that removing the identified low-frequency components does not discard task-relevant information or introduce new artifacts.

What would settle it

Observe whether accuracy gains from LFC pruning disappear when the same models are evaluated on versions of the data in which higher-frequency content has been deliberately degraded while low-frequency content remains intact.
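
The control described here needs images whose high-frequency content is degraded while the low-frequency content is left intact. One hedged way to build such images is a low-pass filter in the DCT domain. The transform choice, the anti-diagonal indexing, and the names `dct_matrix` and `keep_low_diagonals` are assumptions made for illustration; the paper may construct its controls differently.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis as an (n, n) matrix; row k is frequency k."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def keep_low_diagonals(image: np.ndarray, num_kept: int) -> np.ndarray:
    """Keep only the first `num_kept` anti-diagonals (u + v < num_kept) of the
    2D DCT of a single-channel image and zero everything else.

    Assumption: this stands in for "degrading" high-frequency content; a real
    experiment might attenuate or add noise instead of zeroing.
    """
    h, w = image.shape
    dh, dw = dct_matrix(h), dct_matrix(w)
    coeffs = dh @ image @ dw.T                  # forward 2D DCT
    u = np.arange(h)[:, None]
    v = np.arange(w)[None, :]
    coeffs = np.where(u + v < num_kept, coeffs, 0.0)
    return dh.T @ coeffs @ dw                   # inverse 2D DCT
```

If models trained on LFC-pruned data keep their accuracy advantage on images filtered this way, the gain cannot be coming from high-frequency detail, which would count against the load-bearing premise.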

Figures

Figures reproduced from arXiv: 2606.03493 by Cathy Hou, David Alvarez-Melis, Stratos Idreos, Utku Şirin.

Figure 1: We show that texture-driven domains make majority of their decisions based on a few…
Figure 2: Frequency analysis methodology. When pruning, we remove frequency components diagonally from the top-left to the bottom-right of the image (or vice versa), since oscillation rates and spatial complexity increase along this direction. Each such diagonal is referred to as a frequency component; the terms diagonal, frequency component, and component are used interchangeably. Pruning is used for sensitivity…
Figure 3: Sample images for the texture-driven domains we study. Ground Terrain Recognition. Ground terrain recognition supports applications such as autonomous driving [61; 89] and robot navigation [29; 96]. Terrains correspond to surface types (e.g., leaves, grass) and are classified using spatial cues that characterize surface material and texture (3rd image in…
Figure 4: ID accuracy results for pruning LFCs (dark line), MFCs (light orange line), and HFCs (red…
Figure 5: Accuracy contributions with unpruned (top) and pruned (bottom) training and test images.
Figure 6: Sample images from TextileNet (top) and CIFAR-10 (bottom). These results indicate that texture-driven domains suffer from low-frequency shortcuts. While texture-driven domains have their classification information primarily in higher frequencies, neural networks rely exponentially more on LFCs than they do on HFCs. Pruning LFCs mitigates the shortcut by shifting the spectral behavior towards higher frequen…
Figure 7: OOD results for fog (top) and Gaussian blur (bottom) corruptions for ResNet-50.
Figure 8: Spectral behavior of TextileNet. … significantly decreases its ID accuracy, by up to 10% at 10 LFCs. In…
Figure 9: Low-frequency shortcuts persist across mixed-semantics tasks. 2. High-Frequency Corruption: Gaussian Blur. The left graph at bottom of…
Figure 10: Pruning LFCs improves OOD accuracy. OOD Summary. Low-frequency shortcuts make models highly vulnerable to OOD corruptions, causing up to 70% accuracy drop compared to ID performance. Pruning LFCs significantly improves robustness to low-frequency corruptions, up to 40%, and introduces a trade-off for high-frequency corruptions; the improved spectral behavior provides a better generalization, whereas the…
Figure 11: Low-frequency shortcuts for a VFM (DinoV2) and VLM (CLIP) using GTOS dataset.
Figure 12: Test set ID accuracy results for pruning LFCs (dark line) and HFCs (red line). Test-set…
Figure 13: Accuracy contributions based on test-set images.
Figure 14: OOD corruption results based on test-set images.
Figure 15: Spectral behavior for ResNet-50 across different seeds when using unpruned images. As…
Figure 16: Spectral behavior for ResNet-50 across different seeds when using pruned images. As can…
Figure 17: Sample images from CIFAR-10 (left) and texture-driven tasks (right).
Figure 18: ID Results for ResNet-50, MobileNet-V3, ViT-Small, and ViT-Tiny.
Figure 19: Spectral Behavior for ResNet-50, MobileNet-V3, ViT-Small, and ViT-Tiny when trained…
Figure 20: Spectral Behavior for ResNet-50, MobileNet-V3, ViT-Small, and ViT-Tiny when trained…
Figure 21: OOD performance under different severity levels. X-axis: number of pruned LFCs. Y-axis:…
Figure 22: OOD Results for ResNet-50.
Figure 23: OOD Results for MobileNet-V3. H Impact of Model Size and Architecture on OOD Results
Figure 24: OOD Results for ViT-Small. [Plot text residue: axes "ID/OOD Accuracy" and "# Pruned LF Diagonals"; panel "OOD tests ViT-Tiny"; datasets SP-Colorectal, TextileNet, GTOS, C10.]
Figure 25: OOD Results for ViT-Tiny. … the PCP pipeline largely recovers from the corruptions, closely approximating the ID accuracy. This is because applying corruption after pruning reduces the impact of corruption, as corruption uses some of the pruned components. As a result, the final OOD accuracy is higher for PCP than for CP. For elastic corruption, however, CP achieves higher accuracy than PCP. This is because…
Figure 26: OOD corruption pipelines. PCP: prune-corrupt-prune. CP: corrupt-prune.
Figure 27: CIFAR-10’s OOD performance closely approximates its ID performance. It suffers an…
Figure 28: Frequency characteristics of the corruptions we use, across the four tasks we analyze.
Figure 29: Impact of pruning HFCs on OOD performance (red line). We present pruning HFCs, along…
Original abstract

Neural networks suffer from shortcut learning, where learned features generalize well to the training set but not to in-distribution (ID) or out-of-distribution (OOD) test sets. Existing studies are all based on a few standard benchmarks, which are shape-driven. Numerous application domains, however, are texture-driven. In this work, we present shortcut learning analysis for texture-driven domains, and compare it with that of a standard benchmark. We show that texture-driven domains suffer from low-frequency shortcuts. They make the majority of their decisions based on a few low-frequency components (LFCs) with a skewed spectral behavior, despite that their classification information is in higher-frequency, fine-grained details. Pruning LFCs from training and test sets eliminates the shortcut and provides a more balanced spectral behavior, improving the ID accuracy by up to 8%. We show that low-frequency shortcuts make the models highly vulnerable to OOD corruptions, leading up to 70% accuracy drop compared to the ID accuracy. Pruning LFCs significantly improves robustness to low-frequency corruptions, by up to 40%, and introduces a trade-off for high-frequency corruptions; the balanced spectral behavior provides a better generalization performance, whereas the increased dependence on high-frequency features reduces it. OOD accuracy depends on the interaction between these two factors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript analyzes shortcut learning in texture-driven visual classification tasks, contrasting it with shape-driven benchmarks. It argues that texture-driven models rely on a few low-frequency components (LFCs) as shortcuts, even though discriminative information is in higher frequencies. By pruning LFCs from both training and test sets, the authors report improved in-distribution (ID) accuracy (up to 8%) and robustness to low-frequency corruptions (up to 40%), while noting a trade-off with high-frequency corruptions due to increased high-frequency dependence.

Significance. If the pruning genuinely isolates shortcuts without removing task-relevant signal, this would extend shortcut analysis beyond standard benchmarks to texture-driven domains common in applications, providing a concrete spectral intervention that improves both ID performance and low-frequency robustness. The empirical measurement of spectral bias and the reported OOD trade-off offer falsifiable predictions for follow-up work in domain-specific robustness.

major comments (2)
  1. [Abstract] The reported gains of up to 8% ID accuracy and 40% robustness are presented without dataset details, spectral analysis method, or controls verifying that pruned images preserve class identity (e.g., human labeling accuracy or reconstruction error); this directly affects whether the gains demonstrate shortcut elimination or result from data modification.
  2. [Abstract] The central claim that classification information resides in higher-frequency components (and that LFC pruning removes only the shortcut) rests on the accuracy improvements after pruning; without independent verification that higher frequencies alone suffice, the observed gains risk circularity with the pruning operation itself altering image statistics.
minor comments (2)
  1. [Abstract] The statement that 'numerous application domains... are texture-driven' is not accompanied by concrete examples or citations to such domains.
  2. [Abstract] The comparison to 'a standard benchmark' does not specify which benchmark or how the texture-driven datasets were chosen.
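
One way to approach the reconstruction-error control raised in major comment 1 is to measure how much of an image's energy the pruning removes. The sketch below assumes an orthonormal 2D DCT, for which Parseval's relation makes the squared relative reconstruction error equal to the share of squared coefficient magnitude on the pruned diagonals. The function names and the anti-diagonal indexing are hypothetical, and nothing here is taken from the paper.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis as an (n, n) matrix; row k is frequency k."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def pruned_energy_fraction(image: np.ndarray, num_pruned: int) -> float:
    """Fraction of total spectral energy lying on the first `num_pruned`
    anti-diagonals (u + v < num_pruned) of the image's 2D DCT.

    Because the transform is orthonormal, this equals
    ||image - pruned_image||^2 / ||image||^2 in pixel space.
    """
    h, w = image.shape
    coeffs = dct_matrix(h) @ image @ dct_matrix(w).T
    u = np.arange(h)[:, None]
    v = np.arange(w)[None, :]
    energy = coeffs ** 2
    total = energy.sum()
    if total == 0.0:
        return 0.0  # an all-zero image has no energy to remove
    return float(energy[u + v < num_pruned].sum() / total)
```

A high fraction shows only that the pruned image differs strongly from the original in pixel space. It does not by itself show that class identity is lost, so a human-labeling or held-out-classifier check would still be needed.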

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript analyzing low-frequency shortcuts in texture-driven visual domains. We address each major comment point by point below.

Point-by-point responses
  1. Referee: [Abstract] The reported gains of up to 8% ID accuracy and 40% robustness are presented without dataset details, spectral analysis method, or controls verifying that pruned images preserve class identity (e.g., human labeling accuracy or reconstruction error); this directly affects whether the gains demonstrate shortcut elimination or result from data modification.

    Authors: We agree that the abstract's brevity omits key contextual details that would strengthen interpretability. The full manuscript specifies the texture-driven datasets analyzed, describes the Fourier-domain pruning procedure for isolating LFCs, and reports reconstruction-based metrics confirming that class identity is retained post-pruning. In the revised manuscript we will expand the abstract to include concise references to the datasets, the spectral pruning method, and the identity-preservation controls. revision: yes

  2. Referee: [Abstract] The central claim that classification information resides in higher-frequency components (and that LFC pruning removes only the shortcut) rests on the accuracy improvements after pruning; without independent verification that higher frequencies alone suffice, the observed gains risk circularity with the pruning operation itself altering image statistics.

    Authors: The primary evidence for the claim is the post-pruning accuracy improvement together with the measured shift to balanced spectral usage. The manuscript additionally quantifies the original models' spectral bias toward LFCs and documents the resulting robustness trade-off with high-frequency corruptions, which supplies corroborating (non-circular) support for increased high-frequency reliance. We will add an explicit discussion paragraph in the revision to separate the pruning-based evidence from the supporting spectral-bias and trade-off analyses, thereby reducing any appearance of circularity. revision: partial

Circularity Check

0 steps flagged

No significant circularity; empirical measurement study

Full rationale

The paper conducts an empirical analysis of shortcut learning by training models on texture-driven domains, observing reliance on low-frequency components via spectral analysis, and measuring accuracy/robustness changes after pruning those components from train and test sets. No equations, parameter fits, or derivations are present that reduce the reported gains (e.g., up to 8% ID accuracy) to quantities defined by the same data or self-citations. The pruning and accuracy measurements are independent experimental outcomes, not forced by construction. The study is self-contained against external benchmarks, with no load-bearing self-citation chains.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that texture classification information is concentrated in high-frequency components and that the observed spectral skew is caused by shortcut learning rather than dataset statistics.

axioms (1)
  • Domain assumption: classification information in texture-driven domains resides primarily in higher-frequency fine-grained details rather than low-frequency components.
    Stated directly in the abstract as the contrast to the observed shortcut behavior.

pith-pipeline@v0.9.1-grok · 5770 in / 1256 out tokens · 20240 ms · 2026-06-28T10:28:00.223420+00:00 · methodology


Reference graph

Works this paper leans on

96 extracted references · 1 canonical work pages

  1. [1]

    EuroSAT GitHub Repo

    EuroSAT GitHub Repo. https://github.com/phelber/EuroSAT, 2019

  2. [2]

    Galaxy MNIST GitHub Repo

    Galaxy MNIST GitHub Repo. https://github.com/mwalmsley/galaxy_mnist, 2022

  3. [3]

    DINOv2: Learning Robust Visual Features without Supervision, 2023

  4. [4]

    CLIP GitHub Repo

    CLIP GitHub Repo. https://github.com/openai/CLIP, 2025

  5. [5]

    DinoV2 GitHub Repo

    DinoV2 GitHub Repo. https://github.com/facebookresearch/dinov2, 2025

  6. [6]

    Galaxy Zoo

    Galaxy Zoo. https://github.com/mwalmsley/galaxy-datasets, 2025

  7. [7]

    Dissecting the High-Frequency Bias in Convolutional Neural Networks

    Antonio A. Abello, Roberto Hirata, and Zhangyang Wang. Dissecting the High-Frequency Bias in Convolutional Neural Networks. In CVPRW, pages 863–871, 2021

  8. [8]

    Discrete Cosine Transform

    N. Ahmed, T. Natarajan, and K.R. Rao. Discrete Cosine Transform. IEEE Transactions on Computers, C-23(1):90–93, 1974

  9. [9]

    Improving Vision Transformers by Revisiting High-Frequency Components

    Jiawang Bai, Li Yuan, Shu-Tao Xia, Shuicheng Yan, Zhifeng Li, and Wei Liu. Improving Vision Transformers by Revisiting High-Frequency Components. In ECCV, pages 1–18, 2022

  10. [10]

    Nicholas Baker, Hongjing Lu, Gennady Erlikhman, and Philip J. Kellman. Deep Convolutional Networks do not Classify based on Global Object Shape. PLOS Computational Biology, 14(12):1–43, 2018

  11. [11]

    DeepSat: A Learning Framework for Satellite Imagery

    Saikat Basu, Sangram Ganguly, Supratik Mukhopadhyay, Robert DiBiano, Manohar Karki, and Ramakrishna Nemani. DeepSat: A Learning Framework for Satellite Imagery. In SIGSPATIAL, 2015

  12. [12]

    Network Dissection: Quantifying Interpretability of Deep Visual Representations

    David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba. Network Dissection: Quantifying Interpretability of Deep Visual Representations. In CVPR, 2017

  13. [13]

    Recognition in Terra Incognita

    Sara Beery, Grant Van Horn, and Pietro Perona. Recognition in Terra Incognita. In ECCV, 2018

  14. [14]

    There Are No Shortcuts to Anywhere Worth Going: Identifying Shortcuts in Deep Learning Models for Medical Image Analysis

    Christopher Boland, Keith A Goatman, Sotirios A. Tsaftaris, and Sonia Dahdouh. There Are No Shortcuts to Anywhere Worth Going: Identifying Shortcuts in Deep Learning Models for Medical Image Analysis. In International Conference on Medical Imaging with Deep Learning, volume 250, pages 131–150, 2024

  15. [15]

    ImageNet-trained CNNs are not Biased Towards Texture: Revisiting Feature Reliance Through Controlled Suppression

    Tom Burgert, Oliver Stoll, Paolo Rota, and Begüm Demir. ImageNet-trained CNNs are not Biased Towards Texture: Revisiting Feature Reliance Through Controlled Suppression. In NeurIPS, 2025

  16. [16]

    Towards Understanding the Spectral Bias of Deep Learning

    Yuan Cao, Zhiying Fang, Yue Wu, Ding-Xuan Zhou, and Quanquan Gu. Towards Understanding the Spectral Bias of Deep Learning. In IJCAI, pages 2205–2211, 8 2021

  17. [17]

    Enhancing Neural Network Interpretability Through Conductance-Based Information Plane Analysis, 2024

    Jaouad Dabounou and Amine Baazzouz. Enhancing Neural Network Interpretability Through Conductance-Based Information Plane Analysis, 2024

  18. [18]

    A Too-Good-to-Be-True Prior to Reduce Shortcut Reliance

    Nikolay Dagaev, Brett D. Roads, Xiaoliang Luo, Daniel N. Barry, Kaustubh R. Patil, and Bradley C. Love. A Too-Good-to-Be-True Prior to Reduce Shortcut Reliance. Pattern Recognition Letters, 166:164–171, 2023

  19. [19]

    CoAtNet: Marrying Convolution and Attention for All Data Sizes

    Zihang Dai, Hanxiao Liu, Quoc V. Le, and Mingxing Tan. CoAtNet: Marrying Convolution and Attention for All Data Sizes. In NeurIPS, 2021

  20. [20]

    ImageNet: A Large-scale Hierarchical Image Database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A Large-scale Hierarchical Image Database. In CVPR, pages 248–255, 2009

  21. [21]

    GalaxiesML: A Dataset of Galaxy Images, Photometry, Redshifts, and Structural Parameters for Machine Learning

    Tuan Do, Bernie Boscoe, Evan Jones, Yun Qi Li, and Kevin Alfaro. GalaxiesML: A Dataset of Galaxy Images, Photometry, Redshifts, and Structural Parameters for Machine Learning. 2024

  22. [22]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In ICLR, 2021

  23. [23]

    FreeGaze: Resource-efficient Gaze Estimation via Frequency Domain Contrastive Learning

    Lingyu Du and Guohao Lan. FreeGaze: Resource-efficient Gaze Estimation via Frequency Domain Contrastive Learning. CoRR, abs/2209.06692, 2022

  24. [24]

    Band-limited Training and Inference for Convolutional Neural Networks

    Adam Dziedzic, John Paparrizos, Sanjay Krishnan, Aaron Elmore, and Michael Franklin. Band-limited Training and Inference for Convolutional Neural Networks. In ICML, pages 1745–1754, 2019

  25. [25]

    Using Compression to Speed Up Image Classification in Artificial Neural Networks

    Dan Fu and Gabriel Guimaraes. Using Compression to Speed Up Image Classification in Artificial Neural Networks. 2016. URL https://www.danfu.org/files/CompressionImageClassification.pdf

  26. [26]

    Can Biases in ImageNet Models Explain Generalization?

    Paul Gavrikov and Janis Keuper. Can Biases in ImageNet Models Explain Generalization? In CVPR, pages 22184–22194, 2024

  27. [27]

    ImageNet-trained CNNs are Biased Towards Texture; Increasing Shape Bias Improves Accuracy and Robustness

    Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, and Wieland Brendel. ImageNet-trained CNNs are Biased Towards Texture; Increasing Shape Bias Improves Accuracy and Robustness. In ICLR, 2019

  28. [28]

    Shortcut Learning in Deep Neural Networks

    Robert Geirhos, Jörn-Henrik Jacobsen, Claudio Michaelis, Richard Zemel, Wieland Brendel, Matthias Bethge, and Felix A. Wichmann. Shortcut Learning in Deep Neural Networks. Nature Machine Intelligence, 2:665–673, 2020

  29. [29]

    GA-Nav: Efficient Terrain Segmentation for Robot Navigation in Unstructured Outdoor Environments

    Tianrui Guan, Divya Kothandaraman, Rohan Chandra, Adarsh Jagan Sathyamoorthy, Kasun Weerakoon, and Dinesh Manocha. GA-Nav: Efficient Terrain Segmentation for Robot Navigation in Unstructured Outdoor Environments. IEEE Robotics and Automation Letters, 7(3):8138–8145, 2022

  30. [30]

    Xintong Han, Zuxuan Wu, Zhe Wu, Ruichi Yu, and Larry S. Davis. VITON: An Image-Based Virtual Try-On Network. In CVPR, pages 7543–7552, 2018

  31. [31]

    Deep Residual Learning for Image Recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. In CVPR, pages 770–778, 2016

  32. [32]

    Introducing eurosat: A novel dataset and deep learning benchmark for land use and land cover classification

    Patrick Helber, Benjamin Bischke, Andreas Dengel, and Damian Borth. Introducing eurosat: A novel dataset and deep learning benchmark for land use and land cover classification. In IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium, pages 204–207. IEEE, 2018

  33. [33]

    Patrick Helber, Benjamin Bischke, Andreas Dengel, and Damian Borth. Eurosat: A novel dataset and deep learning benchmark for land use and land cover classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2019

  34. [34]

    Benchmarking Neural Network Robustness to Common Corruptions and Perturbations

    Dan Hendrycks and Thomas G. Dietterich. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations. In ICLR, 2019

  35. [35]

    SPIDER-colorectal dataset

    HistAI. SPIDER-colorectal dataset. https://huggingface.co/datasets/histai/SPIDER-colorectal, 2025. Accessed: 2026-01-07

  36. [36]

    Inflammation

    HMB302. Inflammation. https://hmb302.ca/chapters/inflammation/, 2023. Online histology and pathology educational resource. Accessed: 2026-01-07

  37. [37]

    Searching for MobileNetV3

    Andrew Howard, Ruoming Pang, Hartwig Adam, Quoc V. Le, Mark Sandler, Bo Chen, Weijun Wang, Liang-Chieh Chen, Mingxing Tan, Grace Chu, Vijay Vasudevan, and Yukun Zhu. Searching for MobileNetV3. In ICCV, pages 1314–1324, 2019

  38. [38]

    Measuring the Tendency of CNNs to Learn Surface Statistical Regularities, 2017

    Jason Jo and Yoshua Bengio. Measuring the Tendency of CNNs to Learn Surface Statistical Regularities, 2017

  39. [39]

    Learning Multiple Layers of Features from Tiny Images

    Alex Krizhevsky. Learning Multiple Layers of Features from Tiny Images. Technical report, 2009

  40. [40]

    Sustainable Clothing Design: Use Matters

    Kirsi Laitala and Casper Boks. Sustainable Clothing Design: Use Matters. Journal of Design Research, 10(1–2):121–139, 2012

  41. [41]

    Unmasking Clever Hans Predictors and Assessing What Machines Really Learn

    Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans Predictors and Assessing What Machines Really Learn. Nature Communications, 10(1), 2019

  42. [42]

    Investigating and Explaining the Frequency Bias in Image Classification

    Zhiyu Lin, Yifei Gao, and Jitao Sang. Investigating and Explaining the Frequency Bias in Image Classification. In IJCAI, pages 717–723, 2022

  43. [43]

    Exploring Semantic Segmentation on the DCT Representation

    Shao-Yuan Lo and Hsueh-Ming Hang. Exploring Semantic Segmentation on the DCT Representation. In 1st ACM International Conference on Multimedia in Asia (MMASIA), pages 1–6, 2019

  44. [44]

    Automatic Shortcut Removal for Self-supervised Representation Learning

    Matthias Minderer, Olivier Bachem, Neil Houlsby, and Michael Tschannen. Automatic Shortcut Removal for Self-supervised Representation Learning. In ICML, 2020

  45. [45]

    Circular Economy in Textiles and Apparel: Processing, Manufacturing, and Design

    Subramanian Senthilkannan Muthu. Circular Economy in Textiles and Apparel: Processing, Manufacturing, and Design. Woodhead Publishing, 2018

  46. [46]

    Uncovering and Correcting Shortcut Learning in Machine Learning Models for Skin Cancer Diagnosis

    Meike Nauta, Robert Walsh, Andrew Dubowski, and Christin Seifert. Uncovering and Correcting Shortcut Learning in Machine Learning Models for Skin Cancer Diagnosis. Diagnostics, 12(1), 2022

  47. [47]

    SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models, 2025

    Dmitry Nechaev, Alexey Pchelnikov, and Ekaterina Ivanova. SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models, 2025

  48. [48]

    Roadblocks for Temporarily Disabling Shortcuts and Learning New Knowledge

    Hongjing Niu, Hanting Li, Feng Zhao, and Bin Li. Roadblocks for Temporarily Disabling Shortcuts and Learning New Knowledge. InNeurIPS, pages 29064–29075, 2022

  49. [49]

    Fast Vision Transformers with HiLo Attention

    Zizheng Pan, Jianfei Cai, and Bohan Zhuang. Fast Vision Transformers with HiLo Attention. InNeurIPS, pages 14541–14554, 2022

  50. [50]

    Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning, 2018

    Nicolas Papernot and Patrick McDaniel. Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning, 2018

  51. [51]

    How Do Vision Transformers Work? InICLR, 2022

    Namuk Park and Songkuk Kim. How Do Vision Transformers Work? InICLR, 2022

  52. [52]

    Gradient Starvation: A Learning Proclivity in Neural Networks

    Mohammad Pezeshki, Oumar Kaba, Yoshua Bengio, Aaron C Courville, Doina Precup, and Guillaume Lajoie. Gradient Starvation: A Learning Proclivity in Neural Networks. InNeurIPS, volume 34, pages 1256–1272, 2021

  53. [53]

    URL https://docs.pytorch

    PyTorch.PyTorch — ResNet-50 Model Documentation, 2025. URL https://docs.pytorch. org/vision/main/models/generated/torchvision.models.resnet50.html. Ac- cessed: 2026-01-07

  54. [54]

    Learning Transferable Visual Models From Natural Language Supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agar- wal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning Transferable Visual Models From Natural Language Supervision. InICML, pages 8748–8763, 2021

  55. [55]

    On the Spectral Bias of Neural Networks

    Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred Hamprecht, Yoshua Bengio, and Aaron Courville. On the Spectral Bias of Neural Networks. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors,PMLR, volume 97, pages 5301–5310, 2019

  56. [56]

    Ramaswamy, Sunnie S

    Vikram V . Ramaswamy, Sunnie S. Y . Kim, Ruth Fong, and Olga Russakovsky. Overlooked Factors in Concept-Based Explanations: Dataset Choice, Concept Learnability, and Human Capability. InCVPR, pages 10932–10941, 2023

  57. [57]

    Global Filter Networks for Image Classification

    Yongming Rao, Wenliang Zhao, Zheng Zhu, Jiwen Lu, and Jie Zhou. Global Filter Networks for Image Classification. In NeurIPS, pages 980–993, 2021

  58. [58]

    The Convergence Rate of Neural Networks for Learned Functions of Different Frequencies

    Ronen Basri, David Jacobs, Yoni Kasten, and Shira Kritchman. The Convergence Rate of Neural Networks for Learned Functions of Different Frequencies. In NeurIPS, volume 32, 2019

  59. [59]

    The Good, The Bad, and The Ugly: Neural Networks Straight From JPEG

    Samuel Felipe dos Santos, Nicu Sebe, and Jurandy Almeida. The Good, The Bad, and The Ugly: Neural Networks Straight From JPEG. In 27th IEEE International Conference on Image Processing (ICIP), pages 1896–1900, 2020

  60. [60]

    The Pitfalls of Simplicity Bias in Neural Networks

    Harshay Shah, Kaustav Tamuly, Aditi Raghunathan, Prateek Jain, and Praneeth Netrapalli. The Pitfalls of Simplicity Bias in Neural Networks. In NeurIPS, 2020

  61. [61]

    Road Recognition for Autonomous Vehicles Based on Intelligent Tire and SE-CNN

    Runwu Shi, Shichun Yang, Yuyi Chen, Rui Wang, Jiayi Lu, Zhaowen Pang, and Yaoguang Cao. Road Recognition for Autonomous Vehicles Based on Intelligent Tire and SE-CNN. In Intelligent Systems and Pattern Recognition, volume 1589, pages 291–305, 2022

  62. [62]

    TextileNet: Material taxonomy-based fashion textile dataset

    Shu Zhong. TextileNet: Material taxonomy-based fashion textile dataset. https://github.com/hahashu/TextileNet, 2023. Accessed: 2026-01-07

  63. [63]

    The Image Calculator: 10x Faster Image-AI Inference by Replacing JPEG with Self-designing Storage Format

    Utku Sirin and Stratos Idreos. The Image Calculator: 10x Faster Image-AI Inference by Replacing JPEG with Self-designing Storage Format. Proc. ACM Manag. Data, 2(1), 2024

  64. [64]

    Frequency-Store: Scaling Image AI by A Column-Store for Images

    Utku Sirin, Victoria Kauffman, Aadit Saluja, Florian Klein, Jeremy Hsu, and Stratos Idreos. Frequency-Store: Scaling Image AI by A Column-Store for Images. In CIDR, 2025

  65. [65]

    Deep neural network models for computational histopathology: A survey

    Chetan L. Srinidhi, Ozan Ciga, and Anne L. Martel. Deep neural network models for computational histopathology: A survey. Medical Image Analysis, 67, 2021

  66. [66]

    Spatial-frequency Channels, Shape Bias, and Adversarial Robustness

    Ajay Subramanian, Elena Sizikova, Najib J. Majaj, and Denis G. Pelli. Spatial-frequency Channels, Shape Bias, and Adversarial Robustness. In NeurIPS, 2023

  67. [67]

    Neural Redshift: Random Networks Are Not Random Functions

    Damien Teney, Armand Mihai Nicolicioiu, Valentin Hartmann, and Ehsan Abbasnejad. Neural Redshift: Random Networks Are Not Random Functions. In CVPR, pages 4786–4796, 2024

  68. [68]

    Training Data-Efficient Image Transformers & Distillation Through Attention

    Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Herve Jegou. Training Data-Efficient Image Transformers & Distillation Through Attention. In ICML, pages 10347–10357, 2021

  69. [69]

    MaxViT: Multi-axis Vision Transformer

    Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, and Yinxiao Li. MaxViT: Multi-axis Vision Transformer. In ECCV, pages 459–479, 2022

  70. [70]

    Are Convolutional Neural Networks or Transformers More Like Human Vision?

    Shikhar Tuli, Ishita Dasgupta, Erin Grant, and Thomas L. Griffiths. Are Convolutional Neural Networks or Transformers More Like Human Vision? In Proceedings of the 43rd Annual Meeting of the Cognitive Science Society, pages 1844–1850, 2021

  71. [71]

    Interpretable Neural Network Classification Model Using First-order Logic Rules

    Haiming Tuo, Zuqiang Meng, Zihao Shi, and Daosheng Zhang. Interpretable Neural Network Classification Model Using First-order Logic Rules. Neurocomputing, 614(1):128–840, 2025

  72. [72]

    E-commerce Worldwide—Statistics & Facts

    Koen van Gelder. E-commerce Worldwide—Statistics & Facts. https://www.statista.com/topics/871/online-shopping/, 2025. Accessed: 2026-01-07

  73. [73]

    Mike Walmsley, Chris Lintott, Tobias Géron, Sandor Kruk, Coleman Krawczyk, Kyle W Willett, Steven Bamford, Lee S Kelvin, Lucy Fortson, Yarin Gal, William Keel, Karen L Masters, Vihang Mehta, Brooke D Simmons, Rebecca Smethurst, Lewis Smith, Elisabeth M Baeten, and Christine Macmillan. Galaxy Zoo DECaLS: Detailed Visual Morphology Measurements from Volunt...

  74. [74]

    Learning Robust Global Representations by Penalizing Local Predictive Power

    Haohan Wang, Songwei Ge, Zachary C. Lipton, and Eric P. Xing. Learning Robust Global Representations by Penalizing Local Predictive Power. In NeurIPS, pages 10506–10518, 2019

  75. [75]

    Haohan Wang, Xindi Wu, Zeyi Huang, and Eric P. Xing. High-Frequency Component Helps Explain the Generalization of Convolutional Neural Networks. In CVPR, pages 8681–8691, 2020

  76. [76]

    Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice

    Peihao Wang, Wenqing Zheng, Tianlong Chen, and Zhangyang Wang. Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice. In ICLR, 2022

  77. [77]

    What Do Neural Networks Learn in Image Classification? A Frequency Shortcut Perspective

    Shunxin Wang, Raymond Veldhuis, Christoph Brune, and Nicola Strisciuglio. What Do Neural Networks Learn in Image Classification? A Frequency Shortcut Perspective. In ICCV, pages 1433–1442, 2023

  78. [78]

    A Survey on the Robustness of Computer Vision Models against Common Corruptions

    Shunxin Wang, Raymond Veldhuis, Christoph Brune, and Nicola Strisciuglio. A Survey on the Robustness of Computer Vision Models against Common Corruptions, 2024

  79. [79]

    Do ImageNet-trained Models Learn Shortcuts? The Impact of Frequency Shortcuts on Generalization

    Shunxin Wang, Raymond Veldhuis, and Nicola Strisciuglio. Do ImageNet-trained Models Learn Shortcuts? The Impact of Frequency Shortcuts on Generalization. In CVPR, pages 25198–25207, 2025

  80. [80]

    VTC-LFC: Vision Transformer Compression with Low-Frequency Components

    Zhenyu Wang, Hao Luo, Pichao Wang, Feng Ding, Fan Wang, and Hao Li. VTC-LFC: Vision Transformer Compression with Low-Frequency Components. In NeurIPS, pages 13974–13988, 2022

Showing first 80 references.