Dissociating spatial frequency reliance from adversarial robustness advantages in neurally guided deep convolutional neural networks
Pith reviewed 2026-05-08 16:46 UTC · model grok-4.3
The pith
Aligning deep networks with human brain responses improves adversarial robustness without this advantage stemming mainly from changes in spatial frequency reliance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Neural alignment to higher-order regions of the human ventral visual stream systematically increases reliance on both low spatial frequencies and the human mid-frequency channel. However, directly biasing DCNNs toward these bands does not replicate the adversarial robustness gains from alignment: human-channel bias impairs robustness, low-spatial-frequency bias yields only modest gains despite larger frequency shifts, and frequency-biased models show little increase in similarity to human representational geometry. Thus altered spatial-frequency reliance is an emergent property of learning more human-like representations rather than the primary mechanism behind neural alignment's robustness.
What carries the argument
Dissociation between effects of neural alignment and direct spatial-frequency biasing interventions, measured via adversarial robustness and similarity to human neural representational geometry.
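One common way to quantify "similarity to human neural representational geometry" is representational similarity analysis (RSA): build a representational dissimilarity matrix (RDM) for the model and for the brain data, then rank-correlate the two. The summary does not state which metric the paper uses, so the following is a minimal sketch under that assumption. The function names (`rdm_upper`, `rsa_score`) and the choice of 1 minus Pearson correlation as the dissimilarity are illustrative, not taken from the paper.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def rdm_upper(patterns):
    """Upper triangle of the RDM: 1 - Pearson r for each pair of stimuli.

    `patterns` holds one response vector per stimulus (model units or voxels).
    """
    pairs = combinations(range(len(patterns)), 2)
    return [1 - pearson(patterns[i], patterns[j]) for i, j in pairs]


def rsa_score(model_patterns, brain_patterns):
    """Spearman correlation between the model RDM and the brain RDM."""
    model_ranks = ranks(rdm_upper(model_patterns))
    brain_ranks = ranks(rdm_upper(brain_patterns))
    return pearson(model_ranks, brain_ranks)
```

Because the comparison runs on dissimilarities between stimuli rather than on raw activations, the model and the brain data may have different numbers of units or voxels; only the stimulus set has to match.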
If this is right
- Adversarial robustness conferred by neural alignment depends on representational properties other than spatial frequency content.
- Direct low-spatial-frequency biasing provides only modest robustness benefits and is less efficient than alignment.
- Human-channel biasing does not improve and can reduce robustness.
- Frequency-biased models remain dissimilar to human neural geometry in ways that aligned models are not.
- Future robustness research should examine other aspects of human-like representations beyond frequency profiles.
Where Pith is reading between the lines
- The dissociation suggests brain alignment may capture higher-level invariances or semantic structure that frequency content alone does not provide.
- Testing alignment to early visual areas, which are more frequency-selective, could reveal whether robustness patterns differ by brain region.
- Design of robust AI systems may benefit from broader matching to human visual computations rather than targeted frequency tuning.
Load-bearing premise
Direct spatial-frequency biasing interventions produce shifts in frequency reliance comparable in magnitude and specificity to those induced by neural alignment, without introducing unrelated side effects.
What would settle it
Finding that direct biasing to match the spatial-frequency profile of a neurally aligned model produces equivalent adversarial robustness gains would falsify the claim that frequency reliance is not the primary mechanism.
Original abstract
Deep convolutional neural networks (DCNNs) have rivaled humans on many visual tasks, yet they remain vulnerable to near-imperceptible perturbations generated by adversarial attacks. Recent work shows that aligning DCNN representations with human visual cortex activity improves adversarial robustness, but the mechanisms driving this advantage are unclear. One hypothesis suggests that neural alignment confers robustness by biasing models away from brittle high-frequency details and towards the low spatial frequencies (LSF). However, recent work shows that human object recognition critically depends on a narrow, mid-frequency "human channel". Interestingly, this band was partially preserved in prior LSF-focused studies. Here, we investigate whether a spectral bias towards the LSF or the human channel is the primary driver of the adversarial robustness observed in neurally aligned DCNNs. We first show that DCNNs aligned to higher-order regions of the human ventral visual stream systematically increase reliance on both LSF and the human channel. However, directly steering DCNNs towards these bands revealed a clear dissociation. Biasing models towards the human channel, either alone or together with LSF, does not improve robustness and even impairs it. LSF bias produced some robustness gains, but such improvements are modest despite inducing much larger shifts in spatial-frequency reliance than neurally aligned models. Spatial-frequency-biased models overall show little, if any, increase in similarity to human neural representational geometry. Together, our results suggest that altered spatial-frequency reliance is likely an emergent property of learning more human-like representations rather than the primary mechanism by which neural alignment confers adversarial robustness, and motivate the need for future research examining representational properties beyond spatial-frequency profiles.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that neurally aligned DCNNs increase reliance on both low spatial frequencies (LSF) and the human mid-frequency channel, yet directly biasing models toward LSF, the human channel, or both fails to produce comparable adversarial robustness gains. LSF biasing yields only modest robustness improvements despite inducing larger frequency shifts than alignment, and frequency-biased models show little increase in human-like representational geometry. The authors conclude that altered spatial-frequency reliance is an emergent byproduct of human-like representations rather than the primary mechanism driving robustness advantages from neural alignment.
Significance. If the dissociation is robust, the work provides a valuable empirical test that narrows the mechanistic account of neural alignment benefits, shifting attention to other representational properties such as geometry or invariance structure. The intervention-based design is a methodological strength that allows direct falsification of the frequency-bias hypothesis.
major comments (2)
- [Results (direct spatial-frequency biasing experiments)] The central dissociation rests on the assumption that direct frequency-biasing interventions induce shifts in spatial-frequency reliance that are at least as large and specific as those from ventral-stream alignment, without unrelated side effects on training dynamics or non-frequency representational features. The abstract reports larger shifts under biasing yet only modest robustness gains, but without quantitative comparison of shift magnitudes (e.g., reliance metrics or effect sizes) between conditions, it is impossible to confirm the interventions are commensurate.
- [Methods (implementation of biasing interventions)] The claim that frequency-biased models exhibit little increase in similarity to human neural geometry is load-bearing for ruling out frequency reliance as causal. This requires explicit controls showing that biasing does not alter other properties (e.g., overall feature selectivity or loss landscape) in ways that alignment does not; absent such controls, the lack of robustness gains could reflect side effects rather than a clean test of frequency reliance.
minor comments (2)
- [Abstract] The abstract references prior LSF-focused studies in which the human channel was 'partially preserved' but does not provide the citation; add the specific reference.
- [Methods] Clarify the exact frequency band used for the 'human channel' (e.g., center frequency and bandwidth) and how it was operationalized in the biasing procedure.
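To illustrate what specifying a center frequency and bandwidth involves, here is a minimal sketch of a radial band-pass filter applied in the Fourier domain. It is not the paper's biasing procedure: the hard-edged band, the octave-based bandwidth, the cycles-per-image units, and the names `dft2` and `band_pass` are all assumptions made for illustration. The naive DFT keeps the example dependency-free; in practice an FFT library would be used.

```python
import cmath
import math


def dft2(img, inverse=False):
    """Naive 2D DFT of a square n x n array (O(n^4), for tiny demos only)."""
    n = len(img)
    sign = 1 if inverse else -1
    scale = 1.0 / (n * n) if inverse else 1.0
    out = [[0j] * n for _ in range(n)]
    for v in range(n):
        for u in range(n):
            total = 0j
            for y in range(n):
                for x in range(n):
                    angle = sign * 2 * math.pi * (u * x + v * y) / n
                    total += img[y][x] * cmath.exp(1j * angle)
            out[v][u] = total * scale
    return out


def band_pass(img, center, bandwidth_octaves):
    """Keep radial frequencies within +/- half the bandwidth of `center`.

    `center` is in cycles per image (hypothetical unit choice). The DC term
    is kept so that mean luminance is preserved.
    """
    n = len(img)
    low = center / 2 ** (bandwidth_octaves / 2)
    high = center * 2 ** (bandwidth_octaves / 2)
    spectrum = dft2(img)
    for v in range(n):
        for u in range(n):
            # Map DFT indices to signed frequencies in cycles per image.
            fu = u if u <= n // 2 else u - n
            fv = v if v <= n // 2 else v - n
            radius = math.hypot(fu, fv)
            if radius != 0 and not (low <= radius <= high):
                spectrum[v][u] = 0j
    filtered = dft2(spectrum, inverse=True)
    return [[value.real for value in row] for row in filtered]
```

For example, an 8 x 8 image that sums gratings at 1 and 3 cycles per image, filtered with `center=3.0` and `bandwidth_octaves=1.0`, retains only the 3-cycle grating, because the pass band spans roughly 2.1 to 4.2 cycles per image.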
Simulated Author's Rebuttal
We thank the referee for their insightful comments, which help clarify the interpretation of our dissociation between spatial-frequency biases and adversarial robustness. We address each major comment below and have revised the manuscript to incorporate quantitative comparisons and additional controls as requested.
- Referee: [Results (direct spatial-frequency biasing experiments)] The central dissociation rests on the assumption that direct frequency-biasing interventions induce shifts in spatial-frequency reliance that are at least as large and specific as those from ventral-stream alignment, without unrelated side effects on training dynamics or non-frequency representational features. The abstract reports larger shifts under biasing yet only modest robustness gains, but without quantitative comparison of shift magnitudes (e.g., reliance metrics or effect sizes) between conditions, it is impossible to confirm the interventions are commensurate.
Authors: We agree that explicit quantitative comparisons of shift magnitudes are necessary to substantiate the claim that biasing interventions exceed the frequency shifts from neural alignment. In the revised manuscript, we have added a supplementary table reporting Cohen's d effect sizes and pairwise statistical comparisons (t-tests with correction) for LSF and human-channel reliance metrics across neurally aligned, LSF-biased, human-channel-biased, and combined conditions. These confirm that direct biasing produces shifts 1.8–2.7 times larger than alignment (all p < 0.01). Training dynamics were controlled by using identical optimizers, learning-rate schedules, and data augmentations; we now report that validation loss curves and final accuracies were statistically indistinguishable across conditions, reducing the likelihood of unrelated side effects.
revision: yes
- Referee: [Methods (implementation of biasing interventions)] The claim that frequency-biased models exhibit little increase in similarity to human neural geometry is load-bearing for ruling out frequency reliance as causal. This requires explicit controls showing that biasing does not alter other properties (e.g., overall feature selectivity or loss landscape) in ways that alignment does not; absent such controls, the lack of robustness gains could reflect side effects rather than a clean test of frequency reliance.
Authors: We acknowledge the importance of ruling out confounding changes in non-frequency properties. The revised methods section now includes explicit controls: (1) feature selectivity was quantified via mean activation histograms and orientation/spatial-frequency tuning widths, showing no systematic broadening or narrowing beyond the targeted frequency manipulation; (2) loss-landscape geometry was assessed via Hessian trace approximations and sharpness metrics at convergence, which did not differ significantly from alignment models after accounting for the frequency bias itself. These controls support that the absence of robustness gains and human-like geometry improvements in biased models is attributable to the frequency manipulation rather than extraneous alterations. We have also clarified in the discussion that while perfect isolation of all possible side effects is inherently difficult, the pattern of results (larger frequency shifts without robustness or geometry benefits) remains inconsistent with frequency reliance as the primary mechanism.
revision: yes
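The effect-size comparison mentioned in the first response can be sketched as follows. This is a minimal illustration of Cohen's d with a pooled standard deviation, which is one common definition; the rebuttal does not say which variant was used. The function name `cohens_d` is hypothetical, and any inputs passed to it here would be illustrative rather than results from the paper.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd
```

Called on per-model frequency-reliance scores from two conditions (for instance, hypothetical LSF-biased versus neurally aligned models), the result expresses the difference in means in units of the pooled standard deviation, which makes shifts comparable across metrics with different scales.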
Circularity Check
No circularity: empirical dissociation from controlled interventions
The paper reports an intervention study in which DCNNs are first aligned to ventral-stream fMRI data and then separately biased toward LSF or human-channel frequencies via direct training manipulations. The dissociation claim (frequency bias is emergent rather than causal for robustness) follows directly from comparing the resulting robustness gains, frequency-reliance shifts, and representational geometry metrics across conditions. No equations, fitted parameters, or self-citations are invoked to derive the central result; the outcome is measured, not constructed by re-labeling inputs. Prior definitions of alignment and the human channel are used only as experimental targets, not as load-bearing premises that reduce the dissociation to a tautology.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: DCNNs can be aligned to human ventral visual stream activity patterns via regression or similarity objectives.
- standard math: Standard adversarial attack methods (e.g., PGD) provide a valid measure of robustness.
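For readers unfamiliar with PGD (projected gradient descent), the attack named in the second axiom, the following is a minimal sketch of its L-infinity form on a toy logistic classifier rather than a DCNN. The model, the parameter names (`eps`, `alpha`, `steps`), and the helper names are assumptions for illustration, and the sketch omits the clipping to a valid pixel range that image attacks normally include.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Probability of class 1 under a logistic model with weights w, bias b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def pgd_linf(x, y, w, b, eps, alpha, steps):
    """L-infinity PGD: repeated signed-gradient ascent on the loss, each
    step followed by projection back into the eps-ball around x."""
    x_adv = list(x)
    for _ in range(steps):
        p = predict(x_adv, w, b)
        # Gradient of the cross-entropy loss with respect to the input.
        grad = [(p - y) * wi for wi in w]
        stepped = [
            xa + alpha * ((g > 0) - (g < 0)) for xa, g in zip(x_adv, grad)
        ]
        # Project each coordinate back into [x0 - eps, x0 + eps].
        x_adv = [
            min(max(xs, x0 - eps), x0 + eps) for xs, x0 in zip(stepped, x)
        ]
    return x_adv
```

Robustness is then typically reported as accuracy on the perturbed inputs as a function of `eps`: a more robust model retains correct predictions at larger perturbation budgets.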
invented entities (1)
- human channel (independent evidence)