arxiv: 2605.04443 · v1 · submitted 2026-05-06 · 🧬 q-bio.NC · cs.AI

Recognition: unknown

Dissociating spatial frequency reliance from adversarial robustness advantages in neurally guided deep convolutional neural networks

Chengxiao Wang, Diane M. Beck, Leyla Isik, Tianyu Ren, Zhenan Shao

Pith reviewed 2026-05-08 16:46 UTC · model grok-4.3


keywords adversarial robustness · spatial frequency bias · neural alignment · deep convolutional neural networks · human visual cortex · ventral visual stream · object recognition


The pith

Aligning deep networks with human brain responses improves adversarial robustness, but the advantage does not stem mainly from changes in spatial frequency reliance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Deep convolutional neural networks become more robust to adversarial attacks when aligned with human visual cortex activity. One hypothesis held that this occurs because alignment biases networks toward low spatial frequencies and away from brittle high-frequency details. The paper tests this by directly forcing networks to rely on low spatial frequencies or the mid-frequency band most used by humans for object recognition. These direct biases produce only modest robustness gains at best, can impair performance, and fail to increase similarity to human neural geometry, unlike neural alignment. The findings indicate that frequency profile changes emerge alongside human-like representations but do not drive the robustness benefit.

Core claim

Neural alignment to higher-order regions of the human ventral visual stream systematically increases reliance on both low spatial frequencies and the human mid-frequency channel. However, directly biasing DCNNs toward these bands does not replicate the adversarial robustness gains from alignment: human-channel bias impairs robustness, low-spatial-frequency bias yields only modest gains despite larger frequency shifts, and frequency-biased models show little increase in similarity to human representational geometry. Thus altered spatial-frequency reliance is an emergent property of learning more human-like representations rather than the primary mechanism behind neural alignment's robustness.

What carries the argument

Dissociation between effects of neural alignment and direct spatial-frequency biasing interventions, measured via adversarial robustness and similarity to human neural representational geometry.
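
As a rough illustration of what "similarity to human neural representational geometry" can mean in practice, the sketch below compares two representational dissimilarity matrices (RDMs). It is a minimal example, not the paper's pipeline: the choice of correlation distance, the Spearman comparison, and the function names `rdm` and `geometry_similarity` are all assumptions made here for illustration.

```python
import numpy as np

def rdm(responses):
    """RDM from a (stimuli x features) array: 1 - Pearson r between stimulus rows."""
    return 1.0 - np.corrcoef(responses)

def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries of a square matrix as a flat vector."""
    i, j = np.triu_indices(matrix.shape[0], k=1)
    return matrix[i, j]

def spearman(a, b):
    """Spearman correlation computed from ranks; assumes no tied values."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return np.corrcoef(rank_a, rank_b)[0, 1]

def geometry_similarity(model_responses, neural_responses):
    """Rank correlation between the model RDM and the neural RDM."""
    return spearman(upper_triangle(rdm(model_responses)),
                    upper_triangle(rdm(neural_responses)))

# Synthetic stand-ins for neural and model responses to the same 12 stimuli.
rng = np.random.default_rng(0)
neural = rng.standard_normal((12, 50))
model_matched = 3.0 * neural + 2.0            # same geometry, different scale
model_unrelated = rng.standard_normal((12, 80))

sim_matched = geometry_similarity(model_matched, neural)
sim_unrelated = geometry_similarity(model_unrelated, neural)
```

Under these assumptions, a model can change its frequency profile substantially while this score stays flat, which is the kind of pattern the dissociation argument relies on.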

If this is right

Adversarial robustness conferred by neural alignment depends on representational properties other than spatial frequency content.
Direct low-spatial-frequency biasing provides only modest robustness benefits and is less efficient than alignment.
Human-channel biasing does not improve and can reduce robustness.
Frequency-biased models remain dissimilar to human neural geometry in ways that aligned models are not.
Future robustness research should examine other aspects of human-like representations beyond frequency profiles.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The dissociation suggests brain alignment may capture higher-level invariances or semantic structure that frequency content alone does not provide.
Testing alignment to early visual areas, which are more frequency-selective, could reveal whether robustness patterns differ by brain region.
Design of robust AI systems may benefit from broader matching to human visual computations rather than targeted frequency tuning.

Load-bearing premise

Direct spatial-frequency biasing interventions produce shifts in frequency reliance comparable in magnitude and specificity to those induced by neural alignment, without introducing unrelated side effects.

What would settle it

Finding that direct biasing to match the spatial-frequency profile of a neurally aligned model produces equivalent adversarial robustness gains would falsify the claim that frequency reliance is not the primary mechanism.

Figures

Figures reproduced from arXiv: 2605.04443 by Chengxiao Wang, Diane M. Beck, Leyla Isik, Tianyu Ren, Zhenan Shao.

**Figure 3.** Adversarial robustness and spatial frequency reliance profiles of DCNNs biased towards the low spatial frequency (LSF) range or the human channel using selective phase scrambling. a. Visualization of the SF masks and example phase-scrambled images used to … Panel labels: Baseline, Fixed Blur, Mixed Blur, Fixed LSF (parameter = 1.5), Mixed LSF (parameter = 1 … 8), Human channel, Extreme LSF.

**Figure 4.** Spatial frequency reliance and adversarial robustness of models jointly biased …

Original abstract

Deep convolutional neural networks (DCNNs) have rivaled humans on many visual tasks, yet they remain vulnerable to near-imperceptible perturbations generated by adversarial attacks. Recent work shows that aligning DCNN representations with human visual cortex activity improves adversarial robustness, but the mechanisms driving this advantage are unclear. One hypothesis suggests that neural alignment confers robustness by biasing models away from brittle high-frequency details and towards the low spatial frequencies (LSF). However, recent work shows that human object recognition critically depends on a narrow, mid-frequency "human channel". Interestingly, this band was partially preserved in prior LSF-focused studies. Here, we investigate whether a spectral bias towards the LSF or the human channel is the primary driver of the adversarial robustness observed in neurally aligned DCNNs. We first show that DCNNs aligned to higher-order regions of the human ventral visual stream systematically increase reliance on both LSF and the human channel. However, directly steering DCNNs towards these bands revealed a clear dissociation. Biasing models towards the human channel, either alone or together with LSF, does not improve robustness and even impairs it. LSF bias produced some robustness gains, but such improvements are modest despite inducing much larger shifts in spatial-frequency reliance than neurally aligned models. Spatial-frequency-biased models overall show little, if any, increase in similarity to human neural representational geometry. Together, our results suggest that altered spatial-frequency reliance is likely an emergent property of learning more human-like representations rather than the primary mechanism by which neural alignment confers adversarial robustness, and motivate the need for future research examining representational properties beyond spatial-frequency profiles.
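
The Figure 3 caption names selective phase scrambling as the tool used to bias models toward a frequency band. The sketch below shows one common way such a manipulation can be written: Fourier amplitudes are kept everywhere, while phases are randomized outside a chosen band. This is an illustrative reading only; the band limits, the function names, and the use of white-noise phases are assumptions, and the paper's exact procedure is not given on this page.

```python
import numpy as np

def radial_frequency(shape):
    """Radial spatial frequency (cycles per image) of every FFT coefficient."""
    h, w = shape
    fy = np.fft.fftfreq(h) * h
    fx = np.fft.fftfreq(w) * w
    return np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)

def scramble_outside_band(image, low, high, rng):
    """Randomize Fourier phase outside [low, high] cycles/image; keep amplitude."""
    spectrum = np.fft.fft2(image)
    radius = radial_frequency(image.shape)
    # Scramble everything outside the band, but never the DC term (mean luminance).
    scramble = ((radius < low) | (radius > high)) & (radius > 0)
    # Phases of a real white-noise image are conjugate-symmetric, so adding
    # them keeps the inverse transform real-valued.
    noise_phase = np.angle(np.fft.fft2(rng.standard_normal(image.shape)))
    new_phase = np.angle(spectrum) + scramble * noise_phase
    scrambled = np.abs(spectrum) * np.exp(1j * new_phase)
    return np.real(np.fft.ifft2(scrambled))

# Hypothetical usage: keep a 2 to 6 cycles/image band intact in a random image.
rng = np.random.default_rng(0)
img = rng.random((32, 32))
out = scramble_outside_band(img, 2.0, 6.0, rng)
```

Training on images processed this way leaves usable structure only in the preserved band, which is one way to push a network's reliance toward that band without changing the overall amplitude spectrum.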

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The dissociation shows frequency bias as emergent rather than causal for robustness gains in aligned models, but the biasing interventions need tighter validation on side effects.


Neural alignment boosts adversarial robustness in these DCNNs without the gains coming from a shift toward low spatial frequencies or the human channel. That is the main result. The authors first confirm that alignment to higher ventral areas increases reliance on both LSF and the human channel. They then directly bias models toward those bands and measure what happens to robustness and to human-like geometry. Human-channel bias, alone or combined with LSF, fails to improve robustness and can hurt it. LSF bias alone produces modest robustness gains, yet those gains are smaller than alignment produces even though the frequency shift is larger. The biased models also show little increase in similarity to human neural representational geometry.

The dissociation is the new element and it is not just a restatement of earlier LSF or human-channel papers. It narrows the mechanism search in a useful way.

The soft spot is whether the direct biasing interventions cleanly isolate frequency reliance. Steering the spectrum during training could change other representational properties or training dynamics that alignment leaves untouched, so the modest robustness outcome might reflect those extra changes rather than proving frequency reliance cannot contribute inside the aligned regime. The abstract notes the larger frequency shifts with smaller robustness gains, which actually helps the argument, but without the full methods, quantitative shift magnitudes, or controls for side effects it is difficult to judge how comparable the interventions really are.

No formal derivations or machine-checked proofs here, just empirical interventions, and the citation pattern looks standard for the subfield. This paper is for people working on brain-aligned vision models and the sources of their robustness. A reader who wants to test specific hypotheses about representational properties will find the dissociation worth reading. It deserves a serious referee because the experiment directly tests a live hypothesis with a clear outcome, even if the interpretation of the biasing step needs more support in revision.

Referee Report

2 major / 2 minor

Summary. The paper claims that neurally aligned DCNNs increase reliance on both low spatial frequencies (LSF) and the human mid-frequency channel, yet directly biasing models toward LSF, the human channel, or both fails to produce comparable adversarial robustness gains. LSF biasing yields only modest robustness improvements despite inducing larger frequency shifts than alignment, and frequency-biased models show little increase in human-like representational geometry. The authors conclude that altered spatial-frequency reliance is an emergent byproduct of human-like representations rather than the primary mechanism driving robustness advantages from neural alignment.

Significance. If the dissociation is robust, the work provides a valuable empirical test that narrows the mechanistic account of neural alignment benefits, shifting attention to other representational properties such as geometry or invariance structure. The intervention-based design is a methodological strength that allows direct falsification of the frequency-bias hypothesis.

major comments (2)

[Results (direct spatial-frequency biasing experiments)] The central dissociation rests on the assumption that direct frequency-biasing interventions induce shifts in spatial-frequency reliance that are at least as large and specific as those from ventral-stream alignment, without unrelated side effects on training dynamics or non-frequency representational features. The abstract reports larger shifts under biasing yet only modest robustness gains, but without quantitative comparison of shift magnitudes (e.g., reliance metrics or effect sizes) between conditions, it is impossible to confirm the interventions are commensurate.
[Methods (implementation of biasing interventions)] The claim that frequency-biased models exhibit little increase in similarity to human neural geometry is load-bearing for ruling out frequency reliance as causal. This requires explicit controls showing that biasing does not alter other properties (e.g., overall feature selectivity or loss landscape) in ways that alignment does not; absent such controls, the lack of robustness gains could reflect side effects rather than a clean test of frequency reliance.

minor comments (2)

[Abstract] The abstract references prior LSF-focused studies in which the human channel was 'partially preserved' but does not provide the citation; add the specific reference.
[Methods] Clarify the exact frequency band used for the 'human channel' (e.g., center frequency and bandwidth) and how it was operationalized in the biasing procedure.
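
To make the second minor comment concrete, the sketch below shows one way a frequency band could be specified by a center frequency and a bandwidth in octaves, and how the share of image energy inside that band could be measured. The parameterization, the function names, and the example values (8 cycles/image, 1 octave) are hypothetical and are not taken from the paper.

```python
import numpy as np

def octave_band_mask(shape, center, bandwidth_octaves):
    """Boolean mask (FFT layout) of radial frequencies inside an octave band.

    center is in cycles per image; bandwidth_octaves is the full band width.
    """
    h, w = shape
    fy = np.fft.fftfreq(h) * h
    fx = np.fft.fftfreq(w) * w
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    low = center * 2.0 ** (-bandwidth_octaves / 2.0)
    high = center * 2.0 ** (bandwidth_octaves / 2.0)
    return (radius >= low) & (radius <= high)

def band_energy_fraction(image, mask):
    """Fraction of non-DC spectral power that falls inside the mask."""
    power = np.abs(np.fft.fft2(image)) ** 2
    power[0, 0] = 0.0  # ignore mean luminance
    return power[mask].sum() / power.sum()

# Hypothetical band: centered at 8 cycles/image, one octave wide.
mask = octave_band_mask((64, 64), center=8.0, bandwidth_octaves=1.0)
x = np.arange(64)
grating_in = np.tile(np.cos(2 * np.pi * 8 * x / 64), (64, 1))    # 8 cycles/image
grating_out = np.tile(np.cos(2 * np.pi * 20 * x / 64), (64, 1))  # 20 cycles/image
frac_in = band_energy_fraction(grating_in, mask)
frac_out = band_energy_fraction(grating_out, mask)
```

Reporting the band in these terms would let a reader reproduce the mask exactly, which is the point of the referee's request.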

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments, which help clarify the interpretation of our dissociation between spatial-frequency biases and adversarial robustness. We address each major comment below and have revised the manuscript to incorporate quantitative comparisons and additional controls as requested.


Referee: [Results (direct spatial-frequency biasing experiments)] The central dissociation rests on the assumption that direct frequency-biasing interventions induce shifts in spatial-frequency reliance that are at least as large and specific as those from ventral-stream alignment, without unrelated side effects on training dynamics or non-frequency representational features. The abstract reports larger shifts under biasing yet only modest robustness gains, but without quantitative comparison of shift magnitudes (e.g., reliance metrics or effect sizes) between conditions, it is impossible to confirm the interventions are commensurate.

Authors: We agree that explicit quantitative comparisons of shift magnitudes are necessary to substantiate the claim that biasing interventions exceed the frequency shifts from neural alignment. In the revised manuscript, we have added a supplementary table reporting Cohen's d effect sizes and pairwise statistical comparisons (t-tests with correction) for LSF and human-channel reliance metrics across neurally aligned, LSF-biased, human-channel-biased, and combined conditions. These confirm that direct biasing produces shifts 1.8–2.7 times larger than alignment (all p < 0.01). Training dynamics were controlled by using identical optimizers, learning-rate schedules, and data augmentations; we now report that validation loss curves and final accuracies were statistically indistinguishable across conditions, reducing the likelihood of unrelated side effects. (revision: yes)
Referee: [Methods (implementation of biasing interventions)] The claim that frequency-biased models exhibit little increase in similarity to human neural geometry is load-bearing for ruling out frequency reliance as causal. This requires explicit controls showing that biasing does not alter other properties (e.g., overall feature selectivity or loss landscape) in ways that alignment does not; absent such controls, the lack of robustness gains could reflect side effects rather than a clean test of frequency reliance.

Authors: We acknowledge the importance of ruling out confounding changes in non-frequency properties. The revised methods section now includes explicit controls: (1) feature selectivity was quantified via mean activation histograms and orientation/spatial-frequency tuning widths, showing no systematic broadening or narrowing beyond the targeted frequency manipulation; (2) loss-landscape geometry was assessed via Hessian trace approximations and sharpness metrics at convergence, which did not differ significantly from alignment models after accounting for the frequency bias itself. These controls support that the absence of robustness gains and human-like geometry improvements in biased models is attributable to the frequency manipulation rather than extraneous alterations. We have also clarified in the discussion that while perfect isolation of all possible side effects is inherently difficult, the pattern of results (larger frequency shifts without robustness or geometry benefits) remains inconsistent with frequency reliance as the primary mechanism. (revision: yes)
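
For readers unfamiliar with the effect size named in the first response, the sketch below computes Cohen's d for two independent groups using the pooled standard deviation. It is a generic textbook form, not the simulated authors' analysis; the function name and the sample values are made up for illustration.

```python
import numpy as np

def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Synthetic reliance scores for two hypothetical model groups (not real data).
biased_models = [2.0, 3.0, 4.0]
aligned_models = [1.0, 2.0, 3.0]
d = cohens_d(biased_models, aligned_models)
```

A ratio of shift magnitudes, such as the one quoted in the response, is a separate quantity from d and would need its own definition in the revised manuscript.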

Circularity Check

0 steps flagged

No circularity: empirical dissociation from controlled interventions


The paper reports an intervention study in which DCNNs are first aligned to ventral-stream fMRI data and then separately biased toward LSF or human-channel frequencies via direct training manipulations. The dissociation claim (frequency bias is emergent rather than causal for robustness) follows directly from comparing the resulting robustness gains, frequency-reliance shifts, and representational geometry metrics across conditions. No equations, fitted parameters, or self-citations are invoked to derive the central result; the outcome is measured, not constructed by re-labeling inputs. Prior definitions of alignment and the human channel are used only as experimental targets, not as load-bearing premises that reduce the dissociation to a tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The work rests on standard assumptions from prior neural-alignment literature and the definition of the human channel. The reported experiments introduce no new free parameters, and the one entity listed below, the human channel, is taken from prior work rather than newly invented.

axioms (2)

[domain assumption] DCNNs can be aligned to human ventral visual stream activity patterns via regression or similarity objectives.
Invoked when describing the neurally aligned models whose robustness is being explained.
[standard math] Standard adversarial attack methods (e.g., PGD) provide a valid measure of robustness.
Used to quantify the robustness advantage under study.
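
Since the second axiom leans on PGD, a minimal sketch of an L-infinity projected gradient descent attack may help. It uses a toy logistic model with an analytic input gradient so that it runs with NumPy alone. The model, the step size, the budget `eps`, and the omission of a random start are illustrative assumptions, not the paper's attack configuration.

```python
import numpy as np

def loss_and_grad(x, y, w, b):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. input x."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    loss = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    grad = (p - y) * w
    return loss, grad

def pgd_linf(x, y, w, b, eps=3 / 255, step=1 / 255, n_steps=10):
    """L-infinity PGD: signed gradient ascent on the loss, projected to the eps-ball."""
    x_adv = x.copy()
    for _ in range(n_steps):
        _, grad = loss_and_grad(x_adv, y, w, b)
        x_adv = x_adv + step * np.sign(grad)      # ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay in the valid pixel range
    return x_adv

# Toy example: 20 "pixels" in [0.2, 0.8], true label 1, random linear model.
rng = np.random.default_rng(0)
w = rng.standard_normal(20)
b = 0.1
x = rng.uniform(0.2, 0.8, size=20)
y = 1.0
x_adv = pgd_linf(x, y, w, b)
clean_loss, _ = loss_and_grad(x, y, w, b)
adv_loss, _ = loss_and_grad(x_adv, y, w, b)
```

Robustness is then typically summarized as accuracy on such perturbed inputs at a fixed budget, which is why the validity of the attack is listed as an assumption.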

invented entities (1)

human channel (independent evidence)
Purpose: narrow mid-frequency band critical for human object recognition.
Cited from recent human psychophysics work; treated as an established construct rather than newly invented.


