CHASM: Cross-frequency Harmonized Axis-Separable Mixing for Spectral Token Operators
Pith reviewed 2026-05-15 04:35 UTC · model grok-4.3
The pith
CHASM shares one channel eigenbasis across frequencies while keeping per-frequency positive gains to improve spectral token mixers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CHASM separates a shared channel eigenbasis, used by every frequency, from frequency-specific positive spectral gains, creating cross-frequency harmonization that strengthens spectral token operators when inserted into standard vision backbones.
What carries the argument
Shared channel eigenbasis spectral operator with per-frequency positive gains, applied separably along spatial axes.
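The operator described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the shared basis is taken to be a real orthogonal matrix, positivity is enforced with softplus, a real FFT is used along each axis, and the height pass runs before the width pass. All function and parameter names are hypothetical.

```python
import numpy as np

def softplus(x):
    # Numerically stable softplus; the output is strictly positive.
    return np.logaddexp(0.0, x)

def shared_basis_mix_1d(x, basis, raw_gains, axis):
    """Mix channels in the Fourier domain along one spatial axis.

    x:         real array of shape (H, W, C)
    basis:     (C, C) orthogonal matrix shared by every frequency (assumption)
    raw_gains: (F, C) unconstrained parameters, F = rfft length along `axis`
    """
    n = x.shape[axis]
    xf = np.fft.rfft(x, axis=axis)                      # frequencies along `axis`
    xf = np.moveaxis(xf, axis, 0)                       # (F, other, C)
    coeffs = xf @ basis                                 # project onto the shared basis
    coeffs = coeffs * softplus(raw_gains)[:, None, :]   # per-frequency positive gains
    xf = coeffs @ basis.T                               # back to channel coordinates
    xf = np.moveaxis(xf, 0, axis)
    return np.fft.irfft(xf, n=n, axis=axis)

def chasm_like_mixer(x, basis, raw_gains_h, raw_gains_w):
    # Axis-separable application: height pass, then width pass.
    y = shared_basis_mix_1d(x, basis, raw_gains_h, axis=0)
    return shared_basis_mix_1d(y, basis, raw_gains_w, axis=1)
```

With every gain equal to one, the sketch reduces to the identity map, which is a convenient sanity check before training anything.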
If this is right
- Higher reconstruction quality in accelerated MRI tasks compared to same-backbone baselines.
- Improved segmentation accuracy on undersampled MRI data.
- Better results in natural-image reconstruction using the same backbone.
- Ablations confirm that dropping the shared-basis constraint weakens the observed benefit.
Where Pith is reading between the lines
- The same shared-basis idea could be tested in other frequency-domain operators beyond Fourier mixers.
- Coherent sampling geometry may prove important for realizing cross-frequency benefits in related architectures.
- The structured separation of shared and specific components might help control parameter count while retaining adaptivity.
Load-bearing premise
Enforcing a shared channel eigenbasis across frequencies supplies a useful inductive bias whose benefit is not merely an artifact of extra parameters or particular training setups.
What would settle it
An experiment in which removing the shared-basis constraint, or randomizing the coherent sampling geometry, leaves performance unchanged. Either outcome would contradict the ablations the paper cites in support of cross-frequency harmonization.
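One way to make the shared-basis question measurable is a commutator diagnostic: matrices that share an eigenbasis commute pairwise, so a frequency-indexed mixer whose learned matrices nearly commute is already close to the shared-basis family. The sketch below is a hypothetical diagnostic, not something the paper reports, and the function name is an assumption.

```python
import numpy as np

def commutator_score(mixers):
    """Mean relative commutator norm over all pairs of (C, C) matrices.

    The score is zero (up to rounding) when the matrices share an eigenbasis.
    Requires at least two matrices.
    """
    total, pairs = 0.0, 0
    for i in range(len(mixers)):
        for j in range(i + 1, len(mixers)):
            a, b = mixers[i], mixers[j]
            comm = a @ b - b @ a
            scale = np.linalg.norm(a) * np.linalg.norm(b)
            total += np.linalg.norm(comm) / scale
            pairs += 1
    return total / pairs
```

A score near zero for an unconstrained baseline would suggest the constraint is redundant for that model; a large score would indicate that the constraint changes the operator family in a meaningful way.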
Original abstract
Spectral token mixers based on Fourier transforms provide an efficient way to model global interactions in visual feature maps. Existing designs often either apply filter-wise spectral responses along fixed channel axes, or learn adaptive frequency-indexed channel mixing without explicitly aligning the channel directions used across frequencies. We propose CHASM, a Cross-frequency Harmonized Axis-Separable Mixer, as a structured middle ground. CHASM separates what should be shared from what should remain frequency-specific: all frequencies share a learned channel eigenbasis, while each frequency retains its own positive spectral gains. The shared basis makes channel directions comparable across the spectrum, whereas the positive gains preserve local spectral adaptivity. CHASM applies this structured operator separably along the height and width axes and is used as a drop-in replacement mixer inside existing backbones. We provide a structural characterization of the shared-basis operator family and evaluate CHASM through controlled same-backbone comparisons. Across accelerated MRI reconstruction, undersampled MRI segmentation, and natural-image reconstruction, CHASM consistently improves over same-backbone spectral-mixer baselines. Ablations show that removing the shared-basis constraint weakens performance, and randomizing coherent sampling geometry substantially reduces the gain, supporting cross-frequency harmonization as a useful inductive bias for spectral token operators.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes CHASM, a spectral token mixer for visual feature maps that enforces a shared learned channel eigenbasis across frequencies while allowing per-frequency positive spectral gains, applied axis-separably along height and width. It is positioned as a drop-in replacement in existing backbones and is evaluated via controlled same-backbone comparisons on accelerated MRI reconstruction, undersampled MRI segmentation, and natural-image reconstruction tasks, where it reports consistent gains over spectral-mixer baselines. Ablations are cited to show that removing the shared-basis constraint weakens performance and that randomizing sampling geometry reduces the benefit, framing the shared eigenbasis as a useful inductive bias for cross-frequency harmonization.
Significance. If the reported gains prove robust after parameter-matched controls and statistical verification, CHASM would supply a concrete structural prior for spectral operators that separates shared channel directions from frequency-specific scaling. This could be useful in domains like medical imaging where global frequency interactions matter and where existing Fourier-based mixers lack explicit cross-frequency alignment. The structural characterization of the shared-basis family is a positive element that could support future analysis.
major comments (1)
- [Abstract] Abstract and operator description: the central claim that the shared channel eigenbasis supplies a useful inductive bias (rather than a capacity artifact) rests on same-backbone comparisons and ablations, yet no statement confirms that CHASM and the frequency-indexed baselines have identical parameter counts. The construction (shared eigenbasis plus per-frequency gains) appears to add parameters relative to purely frequency-indexed mixing; without explicit matching or an ablation that isolates the basis while holding total parameters fixed, the performance delta cannot be attributed to cross-frequency harmonization.
minor comments (2)
- [Abstract] The abstract states that CHASM "consistently improves" over baselines and that "ablations show" the value of the shared basis, but it supplies no quantitative values, error bars, exact baseline implementations, data splits, or statistical tests; these details are required to assess robustness.
- [Method] The spectral gains are described as "positive", but the precise constraint (e.g., ReLU, softplus, or projection) and its effect on the operator's spectral properties should be stated explicitly.
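The choice of positivity constraint matters for the operator's spectrum, as a small numerical sketch shows. It assumes an orthogonal shared basis, so that the gains are exactly the eigenvalues of each per-frequency operator; the parameter values and helper names are hypothetical.

```python
import numpy as np

def softplus(x):
    # Smooth and strictly positive for every finite input.
    return np.logaddexp(0.0, x)

def relu(x):
    # Non-negative only: negative inputs map to exactly zero.
    return np.maximum(x, 0.0)

def per_frequency_operator(basis, gains):
    # Channel-mixing matrix at one frequency: basis @ diag(gains) @ basis.T
    return (basis * gains) @ basis.T

raw = np.array([-3.0, 0.0, 2.0])  # hypothetical unconstrained parameters
basis, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((3, 3)))

op_softplus = per_frequency_operator(basis, softplus(raw))
op_relu = per_frequency_operator(basis, relu(raw))

# The eigenvalues equal the gains, so softplus yields a positive-definite
# operator while ReLU can yield a singular one.
min_eig_softplus = np.linalg.eigvalsh(op_softplus).min()
min_eig_relu = np.linalg.eigvalsh(op_relu).min()
```

Under ReLU, a zero gain removes a channel direction at that frequency entirely, whereas softplus or an exponential keeps every per-frequency operator invertible.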
Simulated Author's Rebuttal
We thank the referee for the thoughtful review and for identifying the need to clarify parameter counts in our comparisons. We address the concern directly below and will update the manuscript to make the parameter analysis explicit.
Point-by-point responses
Referee: [Abstract] Abstract and operator description: the central claim that the shared channel eigenbasis supplies a useful inductive bias (rather than a capacity artifact) rests on same-backbone comparisons and ablations, yet no statement confirms that CHASM and the frequency-indexed baselines have identical parameter counts. The construction (shared eigenbasis plus per-frequency gains) appears to add parameters relative to purely frequency-indexed mixing; without explicit matching or an ablation that isolates the basis while holding total parameters fixed, the performance delta cannot be attributed to cross-frequency harmonization.
Authors: We appreciate this observation. By our count, CHASM uses substantially fewer parameters than a purely frequency-indexed mixer. A frequency-indexed baseline applies an independent channel-mixing matrix at each frequency, incurring O(F·C²) parameters. CHASM instead learns one shared eigenbasis (O(C²)) and one positive gain per eigen-direction at each frequency (O(F·C)), for a total of O(C² + F·C) parameters. A single scalar gain per frequency would not suffice here, because a scalar multiple of the identity is unchanged by any change of basis and the shared eigenbasis would then have no effect. The reported gains are therefore obtained with a smaller model, which reinforces rather than undermines the value of the shared-basis inductive bias. We will revise the manuscript to (i) report exact parameter counts for CHASM and every baseline in the experimental tables, (ii) add a brief statement in the abstract and operator section confirming the parameter relationship, and (iii) include an additional ablation that equalizes the parameter budgets of CHASM and the baselines.
Revision: yes
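The parameter argument in the rebuttal can be checked with simple arithmetic. The sketch below counts channel-mixing weights only, for a single axis, and ignores complex-valued weights, biases, and block-diagonal tricks that real baselines may use. It covers two readings of "per-frequency gains": one gain per eigen-direction at each frequency, or a single scalar per frequency. The function names and the example sizes (F = 33, C = 64) are hypothetical.

```python
def frequency_indexed_params(num_freqs, channels):
    # One independent (C x C) mixing matrix per frequency.
    return num_freqs * channels * channels

def shared_basis_params(num_freqs, channels, gains_per_freq):
    # One shared (C x C) basis plus `gains_per_freq` gains at each frequency.
    return channels * channels + num_freqs * gains_per_freq

F, C = 33, 64  # hypothetical sizes
baseline = frequency_indexed_params(F, C)                  # 135168
vector_gains = shared_basis_params(F, C, gains_per_freq=C)  # 6208
scalar_gains = shared_basis_params(F, C, gains_per_freq=1)  # 4129
```

Under either reading the shared-basis count is far below the frequency-indexed count at these sizes, so a parameter-matched comparison would need to shrink the baseline or enlarge CHASM rather than the reverse.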
Circularity Check
No circularity: empirical gains rest on controlled external-task comparisons, not self-defining derivations
The paper presents CHASM as an architectural operator (shared channel eigenbasis plus per-frequency positive gains, applied axis-separably) and supports its utility via same-backbone empirical comparisons and ablations on accelerated MRI reconstruction, undersampled MRI segmentation, and natural-image tasks. No equations, structural characterizations, or first-principles derivations are shown that reduce the reported performance deltas to quantities defined by the same fitted parameters or by self-citation chains. The ablations (removing shared-basis constraint, randomizing sampling geometry) are described as independent checks on the inductive bias, and the evaluation framing explicitly uses external benchmarks rather than internal redefinitions. This keeps the central claim self-contained against the provided evidence.
Axiom & Free-Parameter Ledger
free parameters (2)
- shared channel eigenbasis
- per-frequency positive spectral gains
axioms (2)
- domain assumption: a shared eigenbasis makes channel directions comparable across the spectrum
- domain assumption: positive gains preserve local spectral adaptivity