PIMSM: Physics-Informed Multi-Scale Mamba for Stable Neural Representations under Distribution Shift
Pith reviewed 2026-05-20 23:32 UTC · model grok-4.3
The pith
Mapping spectrum-estimated knee frequencies to scale-specific discretization in a Mamba state-space model stabilizes representations under distribution shifts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PIMSM is a state-space architecture that maps spectrum-estimated transition points between frequency regimes (knee frequencies) to scale-specific discretization parameters and anchors them to acquisition time units. This alignment prevents temporal kernel mismatch and the resulting representation drift, yielding more stable neural representations that transfer across changes in temporal context, data volume, and dynamical regime.
What carries the argument
The mapping of spectrum-estimated knee frequencies to scale-specific discretization parameters in the multi-scale Mamba state-space model, anchored to physical acquisition time units.
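The mapping described above can be sketched in a few lines. This is a minimal illustration, not the paper's procedure: the half-power knee estimator, the relation tau = 1 / (2 * pi * f_knee) for a Lorentzian-shaped spectrum, the function names `estimate_knee` and `knee_to_step`, and the 0.72 s sampling period are all assumptions chosen for the example.

```python
import math


def estimate_knee(freqs, power, n_plateau=5):
    """Half-power knee of a spectrum with a flat low-frequency plateau.

    Assumption: the knee is taken as the first frequency at which power
    drops to half the plateau level. The paper's estimator may differ.
    """
    plateau = sum(power[:n_plateau]) / n_plateau
    target = plateau / 2.0
    for i in range(1, len(freqs)):
        if power[i] <= target < power[i - 1]:
            # Linear interpolation between the two bins that bracket the target.
            w = (power[i - 1] - target) / (power[i - 1] - power[i])
            return freqs[i - 1] + w * (freqs[i] - freqs[i - 1])
    raise ValueError("no half-power crossing found")


def knee_to_step(f_knee_hz, sample_period_s):
    """Map a knee frequency to (tau in seconds, dimensionless step size)."""
    tau = 1.0 / (2.0 * math.pi * f_knee_hz)  # corner frequency -> time constant
    delta = sample_period_s / tau            # sampling period in units of tau
    return tau, delta


# Synthetic Lorentzian spectrum with a known knee at 0.05 Hz (illustrative).
TRUE_KNEE_HZ = 0.05
freqs = [0.001 * i for i in range(1, 601)]
power = [1.0 / (1.0 + (f / TRUE_KNEE_HZ) ** 2) for f in freqs]

f_knee = estimate_knee(freqs, power)
tau, delta = knee_to_step(f_knee, sample_period_s=0.72)
```

Expressing the step as the sampling period divided by tau is what ties it to acquisition time units in this sketch: if the sampling period changes, delta changes with it while tau, in seconds, stays fixed.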
If this is right
- Improved robustness to severe temporal-context truncation on Human Connectome Project fMRI.
- Better representation stability under extreme low-resource transfer scenarios.
- Enhanced generalization from resting-state to task-state fMRI without retraining.
- Lowest variable-wise MAE across horizons and variables on Weather-5K held-out-station spatial out-of-distribution forecasting.
Where Pith is reading between the lines
- The same knee-frequency anchoring could be tested on other multi-scale signals, such as climate records or neural recordings from other modalities.
- If the mechanism holds, it suggests a general inductive bias for scientific foundation models: explicit timescale alignment may outperform pure data-driven fitting when structure must be preserved across domains.
- One could check whether replacing the spectrum-based knee detection with learned or fixed scales removes the reported gains, isolating the contribution of physical anchoring.
Load-bearing premise
That natural dynamical systems are organized by interacting processes across multiple physical timescales and that failing to preserve this multiscale structure is the main cause of brittleness under distribution shift.
What would settle it
A direct comparison in which a standard single-scale Mamba, trained identically, matches or exceeds PIMSM performance on the same fMRI truncation, transfer, and weather out-of-distribution tasks.
Original abstract
Scientific foundation models are expected to reuse representations under changes in dataset, acquisition protocol, and deployment domain, yet many sequence backbones treat scientific temporal structure as an unconstrained pattern to be fitted. We argue that this misses a central property of natural dynamical systems: neural and atmospheric time series are organized by interacting processes across multiple physical timescales, and failure to preserve this multiscale structure contributes to brittleness under distribution shift. We formalize this failure mode as temporal kernel mismatch, where a model fits in-distribution dynamics with an effective memory policy that is not anchored to the signal's physical timescales, leading to representation drift and degraded transfer. We propose Physics-Informed Multi-Scale Mamba (PIMSM), a state-space architecture that maps spectrum-estimated transition points between frequency regimes (knee frequencies) to scale-specific discretization parameters and anchors them to acquisition time units. On Human Connectome Project fMRI, PIMSM improves robustness and representation stability under severe temporal-context truncation, extreme low-resource transfer, and resting-state-to-task-state generalization. Without modality-specific adaptation, the same architecture also attains the lowest variable-wise MAE across all reported horizons and variables on Weather-5K held-out-station spatial out-of-distribution forecasting. These results support temporal-scale alignment as a practical inductive bias for scientific foundation models that must preserve structure, not only fit correlations, under deployment shift.
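One way to picture temporal kernel mismatch is with a single exponential memory. The sketch below is an illustrative reading of the term, not the paper's formalization: a decay fitted per step at one sampling period implies a different memory in seconds when reused at another period, whereas a decay recomputed from a fixed physical time constant does not. The function names and the numbers (a 10 s time constant, 0.72 s and 2.0 s sampling periods) are assumptions for the example.

```python
import math


def per_step_decay(tau_seconds, sample_period_s):
    """Per-step decay of exp(-t / tau) sampled every sample_period_s seconds."""
    return math.exp(-sample_period_s / tau_seconds)


def memory_seconds(decay_per_step, sample_period_s):
    """Time constant (seconds) implied by a per-step decay at a given period."""
    return -sample_period_s / math.log(decay_per_step)


TAU_S = 10.0       # hypothetical physical timescale
TRAIN_DT = 0.72    # hypothetical training sampling period (s)
DEPLOY_DT = 2.0    # hypothetical deployment sampling period (s)

# Unanchored: the per-step decay learned at TRAIN_DT is reused unchanged.
frozen_decay = per_step_decay(TAU_S, TRAIN_DT)
unanchored_memory = memory_seconds(frozen_decay, DEPLOY_DT)

# Anchored: the decay is recomputed from TAU_S at the new sampling period.
anchored_decay = per_step_decay(TAU_S, DEPLOY_DT)
anchored_memory = memory_seconds(anchored_decay, DEPLOY_DT)
```

In this toy setting the unanchored memory stretches by the ratio DEPLOY_DT / TRAIN_DT, while the anchored memory stays at TAU_S.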
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Physics-Informed Multi-Scale Mamba (PIMSM), a state-space model that estimates knee frequencies from the power spectrum of training data and maps them to scale-specific discretization parameters in the Mamba SSM. The central claim is that anchoring the model to these physical timescales reduces temporal kernel mismatch and improves representation stability under distribution shift. Experiments on Human Connectome Project fMRI report gains in robustness under temporal truncation, low-resource transfer, and resting-to-task generalization; the same architecture is reported to achieve lowest variable-wise MAE on Weather-5K held-out-station spatial OOD forecasting without modality-specific changes.
Significance. If the results and their attribution to multiscale physical anchoring hold, the work supplies a concrete inductive bias for scientific sequence models that must generalize across acquisition changes and domains. The approach of deriving discretization parameters from in-distribution spectra is a clear attempt to inject domain knowledge rather than treat dynamics as unconstrained patterns. Reproducible code or parameter-free derivations are not mentioned, but the falsifiable prediction that knee-frequency stability should correlate with robustness gains is a positive feature of the framing.
major comments (2)
- [§3 and §4] §3 (PIMSM construction) and §4 (knee-frequency procedure): the mapping from spectrum-estimated knee frequencies (computed on training data) to discretization parameters is load-bearing for the claim that robustness follows from temporal-scale alignment rather than incidental capacity or regularization. No analysis is provided showing that these knee locations remain stable under the exact shifts tested (temporal-context truncation, spatial OOD, resting-to-task). If knees shift, the physics-informed component reduces to a data-dependent heuristic whose benefit is not guaranteed by the multiscale premise.
- [§5] §5 (experimental results): the reported gains on HCP fMRI and Weather-5K are presented without ablations that isolate the contribution of the spectrum-derived discretization from a plain multi-scale Mamba or from other regularization choices. Without such controls it is difficult to attribute improvements specifically to preservation of physical timescales rather than increased model flexibility.
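The stability analysis requested in the first major comment could take roughly the following form. This is a hedged sketch: the drift measure (absolute log2 ratio, in octaves), the tolerance of 0.25 octaves, the function names, and the knee values are all hypothetical and are not taken from the manuscript.

```python
import math


def knee_drift(train_knees_hz, shifted_knees_hz):
    """Per-scale drift in octaves between training and shifted knee estimates."""
    if len(train_knees_hz) != len(shifted_knees_hz):
        raise ValueError("both partitions must report the same number of knees")
    return [abs(math.log2(s / t)) for t, s in zip(train_knees_hz, shifted_knees_hz)]


def is_stable(train_knees_hz, shifted_knees_hz, tol_octaves=0.25):
    """True if no knee moves by more than tol_octaves (an arbitrary threshold)."""
    return max(knee_drift(train_knees_hz, shifted_knees_hz)) <= tol_octaves


# Hypothetical knee estimates (Hz) on the training and a shifted partition.
train = [0.01, 0.1]
shifted = [0.02, 0.1]
drift = knee_drift(train, shifted)
stable = is_stable(train, shifted)
```

Measuring drift on a log-frequency axis treats a doubling of a slow knee and a doubling of a fast knee as equally large, which suits scales spread over orders of magnitude.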
minor comments (2)
- [Abstract] Abstract: quantitative performance numbers, error bars, and the number of runs are omitted; adding the key MAE or stability metrics with statistical context would make the summary self-contained.
- [§3] Notation: the precise functional form that converts a knee frequency into a scale-specific discretization step (e.g., relation to dt or to the SSM time constant) should be written as an explicit equation rather than described in prose.
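For concreteness, one candidate form such an equation could take is written out below. It is an assumption offered for illustration, not the manuscript's definition: it treats each knee frequency $f_k$ as the corner frequency of a first-order process, uses $T_s$ for the acquisition sampling period, and applies zero-order-hold discretization to a diagonal state entry $a$ expressed in units of $1/\tau_k$.

```latex
\tau_k = \frac{1}{2\pi f_k}, \qquad
\Delta_k = \frac{T_s}{\tau_k} = 2\pi f_k T_s, \qquad
\bar{a}_k = \exp(\Delta_k\, a), \quad a < 0 .
```

With $a = -1$ the per-step decay is $e^{-T_s/\tau_k}$, which is an exponential kernel $e^{-t/\tau_k}$ evaluated at $t = T_s$.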
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. We appreciate the recognition of the potential significance of injecting domain knowledge via spectrum-derived discretization parameters. Below we respond to each major comment and indicate the revisions we will make to strengthen the manuscript.
- Referee: [§3 and §4] §3 (PIMSM construction) and §4 (knee-frequency procedure): the mapping from spectrum-estimated knee frequencies (computed on training data) to discretization parameters is load-bearing for the claim that robustness follows from temporal-scale alignment rather than incidental capacity or regularization. No analysis is provided showing that these knee locations remain stable under the exact shifts tested (temporal-context truncation, spatial OOD, resting-to-task). If knees shift, the physics-informed component reduces to a data-dependent heuristic whose benefit is not guaranteed by the multiscale premise.
  Authors: We agree that an explicit analysis of knee-frequency stability under the distribution shifts would provide stronger evidence for the physical anchoring interpretation. While the underlying physical timescales in fMRI (e.g., BOLD signal oscillations) and weather data are expected to be relatively invariant to the tested shifts, we will add to the revised manuscript a new subsection or supplementary figure that recomputes knee frequencies on the shifted data partitions (truncated time series, task-state scans, and held-out weather stations) and quantifies their deviation from the training estimates. This will allow readers to assess whether the discretization parameters remain approximately consistent or if additional factors contribute to the observed robustness.
  Revision: yes
- Referee: [§5] §5 (experimental results): the reported gains on HCP fMRI and Weather-5K are presented without ablations that isolate the contribution of the spectrum-derived discretization from a plain multi-scale Mamba or from other regularization choices. Without such controls it is difficult to attribute improvements specifically to preservation of physical timescales rather than increased model flexibility.
  Authors: We acknowledge that the current experiments do not include direct ablations against a plain multi-scale Mamba baseline or alternative regularization approaches. In the revision, we will incorporate additional ablation studies: (1) a multi-scale Mamba variant with fixed discretization parameters independent of the spectrum, (2) a version with randomly sampled discretization scales, and (3) comparisons to standard regularization techniques such as dropout or weight decay adjustments. These controls will help isolate the specific benefit of the physics-informed, spectrum-derived mapping.
  Revision: yes
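The first two proposed controls differ from the full model only in where the discretization scales come from, which can be sketched as follows. This is a hypothetical setup, not the authors' code: the function names, the choice to match the fixed and random variants to the range of the spectrum-derived scales, the knee values, and the sampling period are all assumptions. The third control (dropout or weight decay) is a training-time setting and is not covered here.

```python
import math
import random


def spectrum_scales(knees_hz, sample_period_s):
    """Steps derived from knee frequencies: delta = 2 * pi * f * T_s (assumed form)."""
    return [2.0 * math.pi * f * sample_period_s for f in knees_hz]


def fixed_scales(n, lo, hi):
    """Geometrically spaced steps between lo and hi, independent of the spectrum."""
    if n == 1:
        return [lo]
    ratio = (hi / lo) ** (1.0 / (n - 1))
    return [lo * ratio ** k for k in range(n)]


def random_scales(n, lo, hi, seed):
    """Log-uniform random steps in [lo, hi], sorted, with a seeded generator."""
    rng = random.Random(seed)
    draws = (rng.uniform(math.log(lo), math.log(hi)) for _ in range(n))
    return sorted(math.exp(x) for x in draws)


def ablation_grid(knees_hz, sample_period_s, seed=0):
    """Three scale sets with equal count and range; only placement differs."""
    spec = spectrum_scales(knees_hz, sample_period_s)
    lo, hi = min(spec), max(spec)
    n = len(spec)
    return {
        "spectrum": spec,
        "fixed": fixed_scales(n, lo, hi),
        "random": random_scales(n, lo, hi, seed),
    }


# Hypothetical knees (Hz) and sampling period (s).
grid = ablation_grid([0.01, 0.03, 0.1], sample_period_s=0.72)
```

Holding the number and range of scales constant across variants is one way to keep model flexibility comparable, so that any remaining difference can be attributed to where the scales are placed.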
Circularity Check
No significant circularity; preprocessing step is independent and results validated on external shifted benchmarks
The paper computes knee frequencies via spectrum estimation on training data as a fixed preprocessing step to set discretization parameters in the Mamba SSM. This mapping is not derived from or dependent on the model's fitted performance, target metrics, or post-training quantities. Robustness claims are supported by direct empirical evaluation on held-out data under temporal truncation, low-resource transfer, resting-to-task shifts, and spatial OOD forecasting on Weather-5K. No equation or step reduces the claimed stability gains to a tautological re-expression of the inputs, and no load-bearing premise relies on unverified self-citation. The derivation remains self-contained against the reported benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Neural and atmospheric time series are organized by interacting processes across multiple physical timescales.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "maps spectrum-estimated transition points between frequency regimes (knee frequencies) to scale-specific discretization parameters and anchors them to acquisition time units"
- IndisputableMonolith/Foundation/AlexanderDuality.lean, alexander_duality_circle_linking (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "multi-scale kernels $g_K(t) = \sum_k \alpha_k e^{-t/\tau_k}$ with ordered $\tau_1 \ge \tau_2 \ge \dots$"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.