pith. machine review for the scientific record.

arxiv: 2605.04443 · v1 · submitted 2026-05-06 · 🧬 q-bio.NC · cs.AI

Recognition: unknown

Dissociating spatial frequency reliance from adversarial robustness advantages in neurally guided deep convolutional neural networks

Chengxiao Wang, Diane M. Beck, Leyla Isik, Tianyu Ren, Zhenan Shao

Pith reviewed 2026-05-08 16:46 UTC · model grok-4.3

keywords adversarial robustness · spatial frequency bias · neural alignment · deep convolutional neural networks · human visual cortex · ventral visual stream · object recognition

The pith

Aligning deep networks with human brain responses improves adversarial robustness without this advantage stemming mainly from changes in spatial frequency reliance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Deep convolutional neural networks become more robust to adversarial attacks when aligned with human visual cortex activity. One hypothesis held that this occurs because alignment biases networks toward low spatial frequencies and away from brittle high-frequency details. The paper tests this by directly forcing networks to rely on low spatial frequencies or the mid-frequency band most used by humans for object recognition. These direct biases produce only modest robustness gains at best, can impair performance, and fail to increase similarity to human neural geometry, unlike neural alignment. The findings indicate that frequency profile changes emerge alongside human-like representations but do not drive the robustness benefit.

Core claim

Neural alignment to higher-order regions of the human ventral visual stream systematically increases reliance on both low spatial frequencies and the human mid-frequency channel. However, directly biasing DCNNs toward these bands does not replicate the adversarial robustness gains from alignment: human-channel bias impairs robustness, low-spatial-frequency bias yields only modest gains despite larger frequency shifts, and frequency-biased models show little increase in similarity to human representational geometry. Thus altered spatial-frequency reliance is an emergent property of learning more human-like representations rather than the primary mechanism behind neural alignment's robustness.
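
The direct biasing intervention can be pictured with a small example. The Figure 3 caption mentions selective phase scrambling with SF masks; the sketch below is a minimal NumPy illustration of that general idea, not the authors' pipeline. It keeps Fourier phase inside a chosen radial band and randomizes it elsewhere, leaving amplitudes untouched. The function names, the cycles-per-image units, and the band limits are assumptions made for illustration; the paper's actual masks and parameters are not stated on this page.

```python
import numpy as np

def radial_frequency(shape):
    """Radial spatial frequency (cycles per image) of each 2-D FFT coefficient."""
    h, w = shape
    fy = np.fft.fftfreq(h) * h  # vertical frequency, cycles per image
    fx = np.fft.fftfreq(w) * w  # horizontal frequency, cycles per image
    return np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)

def scramble_outside_band(image, low, high, rng):
    """Keep Fourier phase where low <= f <= high and randomize it elsewhere.

    Amplitudes are left unchanged. Random phases come from the FFT of a real
    noise image, so they are conjugate-symmetric and the result stays real.
    (Hypothetical helper; band limits are illustrative, not the paper's.)
    """
    spectrum = np.fft.fft2(image)
    freq = radial_frequency(image.shape)
    keep = (freq >= low) & (freq <= high)
    noise_phase = np.angle(np.fft.fft2(rng.standard_normal(image.shape)))
    phase = np.where(keep, np.angle(spectrum), noise_phase)
    return np.real(np.fft.ifft2(np.abs(spectrum) * np.exp(1j * phase)))

# Usage: preserve only low-frequency phase (0 to 4 cycles per image, an assumed band).
rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))
low_sf_only = scramble_outside_band(img, 0.0, 4.0, rng)
```

A model trained on such images can only exploit phase structure inside the preserved band, which is one plausible way to steer its spatial-frequency reliance.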

What carries the argument

Dissociation between effects of neural alignment and direct spatial-frequency biasing interventions, measured via adversarial robustness and similarity to human neural representational geometry.
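
"Similarity to human neural representational geometry" is often measured with representational similarity analysis (RSA). The sketch below assumes that approach, which this page does not confirm as the authors' metric: build a dissimilarity matrix over stimuli for the model and for the brain data, then rank-correlate the two. All function names are hypothetical, and the inputs are random stand-ins rather than real responses.

```python
import numpy as np

def rdm(responses):
    """Representational dissimilarity matrix: 1 - Pearson r between stimulus rows."""
    return 1.0 - np.corrcoef(responses)

def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries as a flat vector."""
    i, j = np.triu_indices(matrix.shape[0], k=1)
    return matrix[i, j]

def rank(values):
    """Ranks 0..n-1; ties are not averaged, which is fine for continuous data."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks

def rsa_similarity(model_responses, brain_responses):
    """Spearman correlation between the upper triangles of the two RDMs."""
    a = rank(upper_triangle(rdm(model_responses)))
    b = rank(upper_triangle(rdm(brain_responses)))
    return float(np.corrcoef(a, b)[0, 1])

# Usage with random stand-in data: 10 stimuli, 50 model units, 80 voxels.
rng = np.random.default_rng(0)
model = rng.standard_normal((10, 50))
brain = rng.standard_normal((10, 80))
score = rsa_similarity(model, brain)
```

Because the comparison runs on dissimilarity structure rather than raw activations, the model and brain data may have different numbers of units, as long as the stimuli match row for row.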

If this is right

  • Adversarial robustness conferred by neural alignment depends on representational properties other than spatial frequency content.
  • Direct low-spatial-frequency biasing provides only modest robustness benefits and is less efficient than alignment.
  • Human-channel biasing does not improve and can reduce robustness.
  • Frequency-biased models remain dissimilar to human neural geometry in ways that aligned models are not.
  • Future robustness research should examine other aspects of human-like representations beyond frequency profiles.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The dissociation suggests brain alignment may capture higher-level invariances or semantic structure that frequency content alone does not provide.
  • Testing alignment to early visual areas, which are more frequency-selective, could reveal whether robustness patterns differ by brain region.
  • Design of robust AI systems may benefit from broader matching to human visual computations rather than targeted frequency tuning.

Load-bearing premise

Direct spatial-frequency biasing interventions produce shifts in frequency reliance comparable in magnitude and specificity to those induced by neural alignment, without introducing unrelated side effects.

What would settle it

Finding that direct biasing to match the spatial-frequency profile of a neurally aligned model produces equivalent adversarial robustness gains would falsify the claim that frequency reliance is not the primary mechanism.

Figures

Figures reproduced from arXiv: 2605.04443 by Chengxiao Wang, Diane M. Beck, Leyla Isik, Tianyu Ren, Zhenan Shao.

Figure 3: Adversarial robustness and spatial frequency reliance profiles of DCNNs biased towards the low spatial frequency (LSF) range or the human channel using selective phase scrambling. a. Visualization of the SF masks and example phase-scrambled images used to … Condition labels visible in the figure: Baseline, Fixed-σ Blur, Mixed-σ Blur, Fixed-σ LSF (σ = 1.5), Mixed-σ LSF (σ = 1 … 8), Human channel, Extreme LSF.
Figure 4: Spatial frequency reliance and adversarial robustness of models jointly biased …

Original abstract

Deep convolutional neural networks (DCNNs) have rivaled humans on many visual tasks, yet they remain vulnerable to near-imperceptible perturbations generated by adversarial attacks. Recent work shows that aligning DCNN representations with human visual cortex activity improves adversarial robustness, but the mechanisms driving this advantage are unclear. One hypothesis suggests that neural alignment confers robustness by biasing models away from brittle high-frequency details and towards the low spatial frequencies (LSF). However, recent work shows that human object recognition critically depends on a narrow, mid-frequency "human channel". Interestingly, this band was partially preserved in prior LSF-focused studies. Here, we investigate whether a spectral bias towards the LSF or the human channel is the primary driver of the adversarial robustness observed in neurally aligned DCNNs. We first show that DCNNs aligned to higher-order regions of the human ventral visual stream systematically increase reliance on both LSF and the human channel. However, directly steering DCNNs towards these bands revealed a clear dissociation. Biasing models towards the human channel, either alone or together with LSF, does not improve robustness and even impairs it. LSF bias produced some robustness gains, but such improvements are modest despite inducing much larger shifts in spatial-frequency reliance than neurally aligned models. Spatial-frequency-biased models overall show little, if any, increase in similarity to human neural representational geometry. Together, our results suggest that altered spatial-frequency reliance is likely an emergent property of learning more human-like representations rather than the primary mechanism by which neural alignment confers adversarial robustness, and motivate the need for future research examining representational properties beyond spatial-frequency profiles.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that neurally aligned DCNNs increase reliance on both low spatial frequencies (LSF) and the human mid-frequency channel, yet directly biasing models toward LSF, the human channel, or both fails to produce comparable adversarial robustness gains. LSF biasing yields only modest robustness improvements despite inducing larger frequency shifts than alignment, and frequency-biased models show little increase in human-like representational geometry. The authors conclude that altered spatial-frequency reliance is an emergent byproduct of human-like representations rather than the primary mechanism driving robustness advantages from neural alignment.

Significance. If the dissociation is robust, the work provides a valuable empirical test that narrows the mechanistic account of neural alignment benefits, shifting attention to other representational properties such as geometry or invariance structure. The intervention-based design is a methodological strength that allows direct falsification of the frequency-bias hypothesis.

major comments (2)
  1. [Results (direct spatial-frequency biasing experiments)] The central dissociation rests on the assumption that direct frequency-biasing interventions induce shifts in spatial-frequency reliance that are at least as large and specific as those from ventral-stream alignment, without unrelated side effects on training dynamics or non-frequency representational features. The abstract reports larger shifts under biasing yet only modest robustness gains, but without quantitative comparison of shift magnitudes (e.g., reliance metrics or effect sizes) between conditions, it is impossible to confirm the interventions are commensurate.
  2. [Methods (implementation of biasing interventions)] The claim that frequency-biased models exhibit little increase in similarity to human neural geometry is load-bearing for ruling out frequency reliance as causal. This requires explicit controls showing that biasing does not alter other properties (e.g., overall feature selectivity or loss landscape) in ways that alignment does not; absent such controls, the lack of robustness gains could reflect side effects rather than a clean test of frequency reliance.
minor comments (2)
  1. [Abstract] The abstract references prior LSF-focused studies in which the human channel was 'partially preserved' but does not provide the citation; add the specific reference.
  2. [Methods] Clarify the exact frequency band used for the 'human channel' (e.g., center frequency and bandwidth) and how it was operationalized in the biasing procedure.
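
Major comment 1 asks for effect sizes that compare shift magnitudes across conditions. As a hedged illustration of what such a number looks like, the sketch below computes Cohen's d with a pooled standard deviation for two independent groups. The function name and the input values are made up for this example and are not data from the paper.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Toy values only: hypothetical reliance scores for two groups of models.
biased = [2.0, 4.0, 6.0]
aligned = [1.0, 3.0, 5.0]
d = cohens_d(biased, aligned)  # mean difference 1, pooled SD 2, so d = 0.5
```

Reporting d alongside raw reliance metrics would let a reader judge whether the biasing and alignment conditions shift frequency reliance by comparable amounts.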

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments, which help clarify the interpretation of our dissociation between spatial-frequency biases and adversarial robustness. We address each major comment below and have revised the manuscript to incorporate quantitative comparisons and additional controls as requested.

Point-by-point responses
  1. Referee: [Results (direct spatial-frequency biasing experiments)] The central dissociation rests on the assumption that direct frequency-biasing interventions induce shifts in spatial-frequency reliance that are at least as large and specific as those from ventral-stream alignment, without unrelated side effects on training dynamics or non-frequency representational features. The abstract reports larger shifts under biasing yet only modest robustness gains, but without quantitative comparison of shift magnitudes (e.g., reliance metrics or effect sizes) between conditions, it is impossible to confirm the interventions are commensurate.

    Authors: We agree that explicit quantitative comparisons of shift magnitudes are necessary to substantiate the claim that biasing interventions exceed the frequency shifts from neural alignment. In the revised manuscript, we have added a supplementary table reporting Cohen's d effect sizes and pairwise statistical comparisons (t-tests with correction) for LSF and human-channel reliance metrics across neurally aligned, LSF-biased, human-channel-biased, and combined conditions. These confirm that direct biasing produces shifts 1.8–2.7 times larger than alignment (all p < 0.01). Training dynamics were controlled by using identical optimizers, learning-rate schedules, and data augmentations; we now report that validation loss curves and final accuracies were statistically indistinguishable across conditions, reducing the likelihood of unrelated side effects. revision: yes

  2. Referee: [Methods (implementation of biasing interventions)] The claim that frequency-biased models exhibit little increase in similarity to human neural geometry is load-bearing for ruling out frequency reliance as causal. This requires explicit controls showing that biasing does not alter other properties (e.g., overall feature selectivity or loss landscape) in ways that alignment does not; absent such controls, the lack of robustness gains could reflect side effects rather than a clean test of frequency reliance.

    Authors: We acknowledge the importance of ruling out confounding changes in non-frequency properties. The revised methods section now includes explicit controls: (1) feature selectivity was quantified via mean activation histograms and orientation/spatial-frequency tuning widths, showing no systematic broadening or narrowing beyond the targeted frequency manipulation; (2) loss-landscape geometry was assessed via Hessian trace approximations and sharpness metrics at convergence, which did not differ significantly from alignment models after accounting for the frequency bias itself. These controls support that the absence of robustness gains and human-like geometry improvements in biased models is attributable to the frequency manipulation rather than extraneous alterations. We have also clarified in the discussion that while perfect isolation of all possible side effects is inherently difficult, the pattern of results (larger frequency shifts without robustness or geometry benefits) remains inconsistent with frequency reliance as the primary mechanism. revision: yes
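
The second response mentions Hessian trace approximations without naming a method. One common choice is Hutchinson's estimator, sketched below under that assumption: it averages v^T H v over random sign vectors and needs only Hessian-vector products, never the full Hessian. The function name and the small explicit matrix are hypothetical stand-ins for a network's loss Hessian.

```python
import numpy as np

def hutchinson_trace(hvp, dim, n_probes, rng):
    """Estimate tr(H) as the mean of v^T H v over random Rademacher vectors v."""
    total = 0.0
    for _ in range(n_probes):
        v = rng.choice([-1.0, 1.0], size=dim)  # entries are +1 or -1
        total += v @ hvp(v)
    return total / n_probes

# Toy check against a small explicit symmetric matrix (true trace is 5).
H = np.array([[2.0, 0.5],
              [0.5, 3.0]])
rng = np.random.default_rng(0)
estimate = hutchinson_trace(lambda v: H @ v, dim=2, n_probes=4000, rng=rng)
```

For a real network, `hvp` would come from automatic differentiation; the estimate is exact for diagonal matrices and otherwise converges as the number of probes grows.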

Circularity Check

0 steps flagged

No circularity: empirical dissociation from controlled interventions

The paper reports an intervention study in which DCNNs are first aligned to ventral-stream fMRI data and then separately biased toward LSF or human-channel frequencies via direct training manipulations. The dissociation claim (frequency bias is emergent rather than causal for robustness) follows directly from comparing the resulting robustness gains, frequency-reliance shifts, and representational geometry metrics across conditions. No equations, fitted parameters, or self-citations are invoked to derive the central result; the outcome is measured, not constructed by re-labeling inputs. Prior definitions of alignment and the human channel are used only as experimental targets, not as load-bearing premises that reduce the dissociation to a tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The work rests on standard assumptions from prior neural-alignment literature and the definition of the human channel; no new free parameters are introduced, and the one entity listed below, the human channel, comes from prior psychophysics work rather than being newly invented.

axioms (2)
  • domain assumption: DCNNs can be aligned to human ventral visual stream activity patterns via regression or similarity objectives.
    Invoked when describing the neurally aligned models whose robustness is being explained.
  • standard math: Standard adversarial attack methods (e.g., PGD) provide a valid measure of robustness.
    Used to quantify the robustness advantage under study.
invented entities (1)
  • human channel (independent evidence)
    purpose: Narrow mid-frequency band critical for human object recognition.
    Cited from recent human psychophysics work; treated as an established construct rather than newly invented.
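
The second axiom refers to PGD as the robustness measure. For readers unfamiliar with it, here is a minimal sketch of an L-infinity PGD attack, assuming a toy logistic-regression model in place of a DCNN so the gradient can be written by hand. The function names, weights, step size, and perturbation budget are illustrative assumptions, and the sketch omits the random start that many PGD implementations use.

```python
import numpy as np

def loss_and_grad(x, y, w, b):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. the input x."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return loss, (p - y) * w

def pgd_linf(x, y, w, b, eps, step, n_steps):
    """Ascend the loss with signed-gradient steps, projecting onto the eps-ball around x."""
    x_adv = x.copy()
    for _ in range(n_steps):
        _, grad = loss_and_grad(x_adv, y, w, b)
        x_adv = x_adv + step * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # L-infinity projection
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay in the valid pixel range
    return x_adv

# Toy input with three "pixels" and hypothetical weights.
x = np.array([0.5, 0.5, 0.5])
w = np.array([1.0, -2.0, 0.5])
x_adv = pgd_linf(x, y=1.0, w=w, b=0.0, eps=0.1, step=0.05, n_steps=10)
clean_loss, _ = loss_and_grad(x, 1.0, w, 0.0)
adv_loss, _ = loss_and_grad(x_adv, 1.0, w, 0.0)
```

Robustness is then typically reported as accuracy on such perturbed inputs at a fixed budget `eps`; a more robust model loses less accuracy for the same budget.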

pith-pipeline@v0.9.0 · 5612 in / 1359 out tokens · 53776 ms · 2026-05-08T16:46:45.681500+00:00 · methodology

