
arxiv: 2605.16351 · v1 · pith:UOK5YVF4new · submitted 2026-05-08 · 💻 cs.LG · cs.AI

PIMSM: Physics-Informed Multi-Scale Mamba for Stable Neural Representations under Distribution Shift

Pith reviewed 2026-05-20 23:32 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords multi-scale modeling · distribution shift · state-space models · temporal representations · fMRI · weather forecasting · physics-informed · Mamba

The pith

Mapping spectrum-estimated knee frequencies to scale-specific discretization in a Mamba state-space model stabilizes representations under distribution shifts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that scientific time series are structured by interacting processes at multiple physical timescales, and that models become brittle under shifts when their memory policies do not match these timescales. This mismatch is formalized as temporal kernel mismatch, where in-distribution fitting produces representations that drift when context, resources, or task states change. PIMSM addresses the issue by estimating knee frequencies from the power spectrum to set discretization parameters for different scales, then anchoring those parameters to the signal's acquisition time units. The resulting architecture shows improved stability on fMRI data under truncation, low-resource transfer, and resting-to-task generalization, plus lower errors on held-out weather stations without any modality-specific tuning.

Core claim

PIMSM is a state-space architecture that maps spectrum-estimated transition points between frequency regimes (knee frequencies) to scale-specific discretization parameters and anchors them to acquisition time units. This alignment prevents temporal kernel mismatch and the resulting representation drift, yielding more stable neural representations that transfer across changes in temporal context, data volume, and dynamical regime.

What carries the argument

The mapping of spectrum-estimated knee frequencies to scale-specific discretization parameters in the multi-scale Mamba state-space model, anchored to physical acquisition time units.
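As a rough illustration of the first half of this mechanism, the sketch below locates a single knee in a power spectrum and expresses the corresponding timescale in acquisition samples. It is not the paper's SpectralHyperNet: the function names `find_knee` and `knee_to_timescale_samples` are hypothetical, the two-segment log-log fit is a stand-in for whatever estimator the authors use, and the conversion τ = 1/(2πf) is the standard corner-frequency convention for a Lorentzian spectrum, assumed here rather than taken from the paper. The synthetic spectrum is noise-free and has one knee, whereas the figures describe several.

```python
import numpy as np

def find_knee(freqs, psd):
    """Return the frequency at which a two-segment line fit in log-log space breaks.

    Hypothetical helper: brute-force search over candidate breakpoints,
    minimising the summed squared residual of two independent straight-line fits.
    """
    x, y = np.log10(freqs), np.log10(psd)
    best_err, best_idx = np.inf, None
    # Keep at least 3 points on each side so both line fits are well determined.
    for i in range(3, len(x) - 3):
        err = 0.0
        for xs, ys in ((x[:i], y[:i]), (x[i:], y[i:])):
            coef = np.polyfit(xs, ys, 1)
            err += np.sum((np.polyval(coef, xs) - ys) ** 2)
        if err < best_err:
            best_err, best_idx = err, i
    return freqs[best_idx]

def knee_to_timescale_samples(f_knee_hz, dt_s):
    """Convert a knee frequency (Hz) to a timescale in acquisition samples.

    Assumed convention: tau = 1 / (2 * pi * f_knee), then divide by the
    acquisition interval dt_s (for fMRI, one TR).
    """
    tau_s = 1.0 / (2.0 * np.pi * f_knee_hz)
    return tau_s / dt_s

# Synthetic, noise-free piecewise power law: flat below 0.1 Hz, slope -2 above.
freqs = np.logspace(-3, 0, 61)
f_true = 0.1
psd = np.where(freqs <= f_true, 1.0, (freqs / f_true) ** -2.0)

knee = find_knee(freqs, psd)
n_samples = knee_to_timescale_samples(knee, dt_s=0.72)  # TR = 0.72 s as in the HCP captions
```

On real, noisy spectra the breakpoint search would normally run on a smoothed estimate (for example an averaged periodogram), and the recovered knee is only as fine as the frequency grid.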

If this is right

  • Improved robustness to severe temporal-context truncation on Human Connectome Project fMRI.
  • Better representation stability under extreme low-resource transfer scenarios.
  • Enhanced generalization from resting-state to task-state fMRI without retraining.
  • Lowest variable-wise MAE across horizons and variables on Weather-5K held-out-station spatial out-of-distribution forecasting.
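For readers unfamiliar with the last metric, a minimal sketch of variable-wise MAE follows. The array layout `(stations, horizon, variables)` and the function names are assumptions made for illustration; the paper's exact aggregation over stations and horizons is not specified in this summary.

```python
import numpy as np

def variable_wise_mae(pred, target):
    """Mean absolute error per variable, averaged over stations and horizon steps.

    Assumed layout: arrays of shape (stations, horizon, variables).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return np.mean(np.abs(pred - target), axis=(0, 1))

def mae_by_horizon_and_variable(pred, target):
    """Mean absolute error per (horizon step, variable), averaged over stations only."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return np.mean(np.abs(pred - target), axis=0)

# Toy data: 2 stations, 3 horizon steps, 2 variables.
target = np.zeros((2, 3, 2))
pred = np.zeros((2, 3, 2))
pred[..., 0] = 1.0   # variable 0 is off by 1 everywhere
pred[..., 1] = -2.0  # variable 1 is off by 2 everywhere

per_variable = variable_wise_mae(pred, target)
per_horizon = mae_by_horizon_and_variable(pred, target)
```

Reporting the error per variable avoids letting a channel with large units, such as sea-level pressure, dominate a single pooled number.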

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same knee-frequency anchoring could be tested on other multi-scale signals such as climate or neural recordings from different modalities.
  • If the mechanism holds, it suggests a general inductive bias for scientific foundation models: explicit timescale alignment may outperform pure data-driven fitting when structure must be preserved across domains.
  • One could check whether replacing the spectrum-based knee detection with learned or fixed scales removes the reported gains, isolating the contribution of physical anchoring.

Load-bearing premise

That natural dynamical systems are organized by interacting processes across multiple physical timescales and that failing to preserve this multiscale structure is the main cause of brittleness under distribution shift.

What would settle it

A direct comparison in which a standard single-scale Mamba, trained identically, matches or exceeds PIMSM performance on the same fMRI truncation, transfer, and weather out-of-distribution tasks.

Figures

Figures reproduced from arXiv: 2605.16351 by Jiook Cha, Sangyoon Bae, Shinjae Yoo.

Figure 1: Physics-informed multi-scale parameterization of temporal dynamics. (a) SpectralHyperNet computes the input PSD, estimates knee frequencies and spectral exponents, and fits a piecewise power-law PSD. (b1–b3) The knees partition the frequency axis, not the time series; an energy-weighted representative frequency from each regime is mapped to ordered discretization parameters in acquisition units (TRs for f…
Figure 2: Representation stability under temporal-context truncation (HCP motor task fMRI). (a) Relationship between representation similarity (CKA; full motor block vs early-window inputs) and performance gap (full block → 2TR / 3TR). (b) PCA visualization of latent embeddings under full-block, 3TR, and 2TR inputs. (c) Mean ℓ2 latent drift between full-block and early-window conditions. We assess robustness to temp…
Figure 3: Scaling behavior under temporal-context truncation (full block → 2TR). (a) Decoding accuracy vs model size. (b) Accuracy gap relative to full-block input. (c–d) CKA and dCor between full-block and 2TR representations.

Model         Full block    2 TRs         CKA (Full block↔2TR)   dCor (Full block↔2TR)
PIMSM         0.987±0.007   0.933±0.016   0.812±0.033            0.846±0.022
Mamba2+drift  0.987±0.004   0.923±0.009   0.837±0.021            0.854±0.019
…
Figure 4: WeightWatcher layer-wise α distributions under full-block input (HCP motor task fMRI). Each point represents the power-law exponent α of the empirical spectral density of a weight matrix. The red dashed line (α = 6) marks the boundary above which weights are statistically indistinguishable from random matrices; the green dashed line (α = 2) indicates the onset of over-training. PIMSM exhibits heterogeneous…
Figure 5: WeightWatcher layer-wise α distributions under 2 TR early-window condition (extreme temporal truncation). Under the 2 TR condition (an extreme data-scarce scenario in which only two time points are available per trial), PIMSM maintains a wide α spread (1.75–5.29), indicating that layer-wise representational diversity is preserved despite the severely reduced input. In contrast, Mamba2 concentrates more unifor…
Figure 6: Piecewise power-law structure of empirical PSD across neural datasets. (top left) HCP resting-state fMRI (TR = 0.72 s, 360 cortical ROIs). The BOLD PSD exhibits distinct scaling regimes separated by knee frequencies (f1, f2), consistent with the piecewise power-law model assumed by SpectralHyperNet. (top right) HCP motor task fMRI (TR = 0.72 s, 360 ROIs). Task-evoked BOLD signals display a similar piecewise …
Figure 7: Piecewise power-law structure of empirical PSD for Weather-5K (per variable). Weather-5K meteorological time series: temperature (tmp), dew point (dew), sea-level pressure (slp), wind angle (wnd_angle), and wind rate (wnd_rate). Because PIMSM fits knee frequencies per variable, each meteorological channel is analyzed independently rather than aggregated. The piecewise power-law structure is consistent acro…
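The captions for Figures 2 and 3 rely on two representation-stability measures, CKA and mean ℓ2 latent drift. The sketch below shows one common way to compute them, assuming the linear (feature-space) form of CKA and row-aligned embedding matrices of shape `(samples, features)`; whether the paper uses linear or kernel CKA is not stated in the captions, and both function names are hypothetical.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two embedding matrices with the same number of rows.

    Assumption: linear-kernel variant, computed as
    ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on column-centred features.
    """
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return float(cross / (norm_x * norm_y))

def mean_l2_drift(Z_full, Z_short):
    """Average Euclidean distance between matched rows of two embedding matrices."""
    return float(np.mean(np.linalg.norm(Z_full - Z_short, axis=1)))

rng = np.random.default_rng(0)
Z_full = rng.normal(size=(50, 8))               # stand-in for full-block embeddings
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))    # random orthogonal matrix
Z_rotated = 3.0 * Z_full @ Q                    # rotated and rescaled copy

cka_self = linear_cka(Z_full, Z_full)
cka_rotated = linear_cka(Z_full, Z_rotated)

offset = np.zeros(8)
offset[:2] = [3.0, 4.0]                         # every row shifted by a vector of length 5
drift = mean_l2_drift(Z_full, Z_full + offset)
```

The two measures are complementary: linear CKA is invariant to rotation and isotropic rescaling of the latent space, while ℓ2 drift is sensitive to both, so a model can show high CKA and large drift at the same time.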
Original abstract

Scientific foundation models are expected to reuse representations under changes in dataset, acquisition protocol, and deployment domain, yet many sequence backbones treat scientific temporal structure as an unconstrained pattern to be fitted. We argue that this misses a central property of natural dynamical systems: neural and atmospheric time series are organized by interacting processes across multiple physical timescales, and failure to preserve this multiscale structure contributes to brittleness under distribution shift. We formalize this failure mode as temporal kernel mismatch, where a model fits in-distribution dynamics with an effective memory policy that is not anchored to the signal's physical timescales, leading to representation drift and degraded transfer. We propose Physics-Informed Multi-Scale Mamba (PIMSM), a state-space architecture that maps spectrum-estimated transition points between frequency regimes (knee frequencies) to scale-specific discretization parameters and anchors them to acquisition time units. On Human Connectome Project fMRI, PIMSM improves robustness and representation stability under severe temporal-context truncation, extreme low-resource transfer, and resting-state-to-task-state generalization. Without modality-specific adaptation, the same architecture also attains the lowest variable-wise MAE across all reported horizons and variables on Weather-5K held-out-station spatial out-of-distribution forecasting. These results support temporal-scale alignment as a practical inductive bias for scientific foundation models that must preserve structure, not only fit correlations, under deployment shift.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Physics-Informed Multi-Scale Mamba (PIMSM), a state-space model that estimates knee frequencies from the power spectrum of training data and maps them to scale-specific discretization parameters in the Mamba SSM. The central claim is that anchoring the model to these physical timescales reduces temporal kernel mismatch and improves representation stability under distribution shift. Experiments on Human Connectome Project fMRI report gains in robustness under temporal truncation, low-resource transfer, and resting-to-task generalization; the same architecture is reported to achieve lowest variable-wise MAE on Weather-5K held-out-station spatial OOD forecasting without modality-specific changes.

Significance. If the results and their attribution to multiscale physical anchoring hold, the work supplies a concrete inductive bias for scientific sequence models that must generalize across acquisition changes and domains. The approach of deriving discretization parameters from in-distribution spectra is a clear attempt to inject domain knowledge rather than treat dynamics as unconstrained patterns. Reproducible code or parameter-free derivations are not mentioned, but the falsifiable prediction that knee-frequency stability should correlate with robustness gains is a positive feature of the framing.

major comments (2)
  1. [§3 and §4] §3 (PIMSM construction) and §4 (knee-frequency procedure): the mapping from spectrum-estimated knee frequencies (computed on training data) to discretization parameters is load-bearing for the claim that robustness follows from temporal-scale alignment rather than incidental capacity or regularization. No analysis is provided showing that these knee locations remain stable under the exact shifts tested (temporal-context truncation, spatial OOD, resting-to-task). If knees shift, the physics-informed component reduces to a data-dependent heuristic whose benefit is not guaranteed by the multiscale premise.
  2. [§5] §5 (experimental results): the reported gains on HCP fMRI and Weather-5K are presented without ablations that isolate the contribution of the spectrum-derived discretization from a plain multi-scale Mamba or from other regularization choices. Without such controls it is difficult to attribute improvements specifically to preservation of physical timescales rather than increased model flexibility.
minor comments (2)
  1. [Abstract] Abstract: quantitative performance numbers, error bars, and the number of runs are omitted; adding the key MAE or stability metrics with statistical context would make the summary self-contained.
  2. [§3] Notation: the precise functional form that converts a knee frequency into a scale-specific discretization step (e.g., relation to dt or to the SSM time constant) should be written as an explicit equation rather than described in prose.
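To make the second minor comment concrete, here is one form such an equation could take. It is offered only as an illustration of the kind of statement being requested, not as the paper's mapping: the symbols below, the Lorentzian convention τ = 1/(2πf), and the choice to match the e-folding length of a single scalar state are all assumptions. The only ingredient borrowed from the Mamba literature is the zero-order-hold discretization of the state matrix, written here for a scalar decay rate.

```latex
% Assumed notation (not from the paper):
%   f_k      : k-th knee frequency, in Hz
%   T_s      : acquisition interval, in seconds (one TR for fMRI)
%   a_k < 0  : continuous-time decay rate of the scale-k state
%   \Delta_k : discretization step of the scale-k state
\tau_k = \frac{1}{2\pi f_k},
\qquad
n_k = \frac{\tau_k}{T_s},
\qquad
\bar{a}_k = \exp(\Delta_k a_k),
\qquad
\Delta_k \,\lvert a_k \rvert = \frac{1}{n_k} = 2\pi f_k T_s .
```

Under these assumptions the discrete state decays by a factor of e after n_k acquisition samples, so a lower knee frequency yields a smaller product Δ_k |a_k| and a longer memory.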

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We appreciate the recognition of the potential significance of injecting domain knowledge via spectrum-derived discretization parameters. Below we respond to each major comment and indicate the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [§3 and §4] §3 (PIMSM construction) and §4 (knee-frequency procedure): the mapping from spectrum-estimated knee frequencies (computed on training data) to discretization parameters is load-bearing for the claim that robustness follows from temporal-scale alignment rather than incidental capacity or regularization. No analysis is provided showing that these knee locations remain stable under the exact shifts tested (temporal-context truncation, spatial OOD, resting-to-task). If knees shift, the physics-informed component reduces to a data-dependent heuristic whose benefit is not guaranteed by the multiscale premise.

    Authors: We agree that an explicit analysis of knee-frequency stability under the distribution shifts would provide stronger evidence for the physical anchoring interpretation. While the underlying physical timescales in fMRI (e.g., BOLD signal oscillations) and weather data are expected to be relatively invariant to the tested shifts, we will add to the revised manuscript a new subsection or supplementary figure that recomputes knee frequencies on the shifted data partitions (truncated time series, task-state scans, and held-out weather stations) and quantifies their deviation from the training estimates. This will allow readers to assess whether the discretization parameters remain approximately consistent or if additional factors contribute to the observed robustness. revision: yes
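A minimal sketch of the stability check proposed in this response is given below. It assumes knee frequencies have already been estimated on the training partition and on each shifted partition, and it measures deviation in octaves (log base 2 of the frequency ratio). The function names, the tolerance, and the example numbers are hypothetical and are not results from the paper.

```python
import numpy as np

def knee_shift_octaves(train_knees, shifted_knees):
    """Signed shift of each knee frequency in octaves: log2(shifted / train)."""
    train = np.asarray(train_knees, dtype=float)
    shifted = np.asarray(shifted_knees, dtype=float)
    return np.log2(shifted / train)

def knees_stable(train_knees, shifted_knees, tol_octaves=0.5):
    """True when every knee moves by at most tol_octaves (an arbitrary threshold)."""
    shift = knee_shift_octaves(train_knees, shifted_knees)
    return bool(np.all(np.abs(shift) <= tol_octaves))

# Illustrative values only: two knees in Hz on the training partition and on a shifted one.
train_knees = [0.01, 0.1]
shifted_knees = [0.02, 0.1]

shift = knee_shift_octaves(train_knees, shifted_knees)
stable_tight = knees_stable(train_knees, shifted_knees, tol_octaves=0.5)
stable_loose = knees_stable(train_knees, shifted_knees, tol_octaves=1.0)
```

A log-ratio is used rather than an absolute difference because knees can span orders of magnitude, so a fixed offset in Hz would mean very different things at the slow and fast ends of the spectrum.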

  2. Referee: [§5] §5 (experimental results): the reported gains on HCP fMRI and Weather-5K are presented without ablations that isolate the contribution of the spectrum-derived discretization from a plain multi-scale Mamba or from other regularization choices. Without such controls it is difficult to attribute improvements specifically to preservation of physical timescales rather than increased model flexibility.

    Authors: We acknowledge that the current experiments do not include direct ablations against a plain multi-scale Mamba baseline or alternative regularization approaches. In the revision, we will incorporate additional ablation studies: (1) a multi-scale Mamba variant with fixed discretization parameters independent of the spectrum, (2) a version with randomly sampled discretization scales, and (3) comparisons to standard regularization techniques such as dropout or weight decay adjustments. These controls will help isolate the specific benefit of the physics-informed, spectrum-derived mapping. revision: yes
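The first two controls in this response can be set up by swapping only the source of the per-scale timescales while keeping the number of scales constant. The sketch below builds the three arms under stated assumptions: timescales are expressed in acquisition samples, the fixed arm uses a geometric grid over a user-chosen range, and the random arm draws log-uniformly from the same range. The function name, the default range, and the example values are hypothetical; the third control (dropout or weight-decay changes) is a training setting and is not covered here.

```python
import numpy as np

def ablation_timescales(spectrum_timescales, range_samples=(1.0, 100.0), seed=0):
    """Return three sets of per-scale timescales (in samples) for an ablation.

    'spectrum': the spectrum-derived values, sorted ascending.
    'fixed':    a geometric grid over range_samples, independent of the spectrum.
    'random':   log-uniform draws over range_samples, sorted ascending.
    """
    spectrum = np.sort(np.asarray(spectrum_timescales, dtype=float))
    k = len(spectrum)
    lo, hi = range_samples
    fixed = np.geomspace(lo, hi, num=k)
    rng = np.random.default_rng(seed)
    random = np.sort(np.exp(rng.uniform(np.log(lo), np.log(hi), size=k)))
    return {"spectrum": spectrum, "fixed": fixed, "random": random}

# Illustrative spectrum-derived timescales for three scales, in samples.
arms = ablation_timescales([2.2, 22.0, 220.0])
```

Holding the number of scales equal across arms keeps model capacity the same, so any remaining performance gap is more plausibly attributable to where the timescales sit than to how many there are.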

Circularity Check

0 steps flagged

No significant circularity: the preprocessing step is independent of model fitting, and the results are validated on external shifted benchmarks.

full rationale

The paper computes knee frequencies via spectrum estimation on training data as a fixed preprocessing step to set discretization parameters in the Mamba SSM. This mapping is not derived from or dependent on the model's fitted performance, target metrics, or post-training quantities. Robustness claims are supported by direct empirical evaluation on held-out data under temporal truncation, low-resource transfer, resting-to-task shifts, and spatial OOD forecasting on Weather-5K. No equation or step reduces the claimed stability gains to a tautological re-expression of the inputs, and no load-bearing premise relies on unverified self-citation. The derivation remains self-contained against the reported benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on the domain assumption that dynamical systems possess identifiable multi-scale physical structure captured by knee frequencies; no free parameters or invented entities are explicitly introduced in the abstract, though the mapping step may implicitly involve choices in frequency estimation.

axioms (1)
  • Domain assumption: Neural and atmospheric time series are organized by interacting processes across multiple physical timescales.
    Stated directly in the abstract as the central property missed by standard sequence backbones.



