Recognition: 2 theorem links
Safety, Security, and Cognitive Risks in State-Space Models: A Systematic Threat Analysis with Spectral, Stateful, and Capacity Attacks
Pith reviewed 2026-05-13 17:30 UTC · model grok-4.3
The pith
State-space models are open to spectral, stateful, and capacity attacks that corrupt their compressed internal states at up to six times the rate of random inputs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that state-space models possess an exploitable attack surface consisting of spectral sensitivity in their transfer functions, delayed state persistence that enables backdoors, and finite state capacity that can be saturated to force forgetting. It supports this with a formal threat model including the State Integrity Violation metric and a Spectral Sensitivity Proposition based on the H-infinity norm, plus empirical demonstrations of 6.0x higher StIV in genomic injection, 156x output perturbation via PGD state injection, and quadratic-complexity state extraction.
What carries the argument
The SSM Attack Surface framework with five layers, the State Integrity Violation (StIV) metric, and the Spectral Sensitivity Proposition grounded in the H-infinity norm.
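The paper's exact definition of StIV is not reproduced in this review. A minimal sketch, assuming StIV is the relative deviation of the compressed state under an injected input; the toy diagonal SSM, its parameters, and the `final_state`/`stiv` helpers are all illustrative stand-ins, not the paper's evaluated models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy diagonal SSM, x_{t+1} = a * x_t + b * u_t (elementwise).
# Parameters are illustrative, not the paper's checkpoints.
N = 16
a = rng.uniform(0.90, 0.99, N)   # per-mode decay rates
b = rng.standard_normal(N)       # input projection

def final_state(inputs):
    x = np.zeros(N)
    for u in inputs:
        x = a * x + b * u
    return x

def stiv(x_clean, x_attacked):
    # Assumed StIV form: relative deviation of the compressed state.
    return np.linalg.norm(x_attacked - x_clean) / np.linalg.norm(x_clean)

clean = rng.standard_normal(200)
random_inj = clean.copy()
random_inj[100] += 1.0           # same budget, arbitrary position
targeted = clean.copy()
targeted[-1] += 1.0              # same budget, placed where decay hurts least

s_rand = stiv(final_state(clean), final_state(random_inj))
s_targ = stiv(final_state(clean), final_state(targeted))
print(f"random StIV:   {s_rand:.4f}")
print(f"targeted StIV: {s_targ:.4f}")
```

Even this crude placement-only targeting separates the two StIV values by a wide margin; the paper's spectral attacks target the transfer function rather than recency, but the mechanism is the same: equal input budgets corrupt the compressed state very unequally.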
If this is right
- Targeted genomic injections achieve StIV of 0.519 compared with 0.086 for random inputs, a 6.0x increase (p < 0.001).
- Projected gradient descent state injection produces 156 times greater output perturbation than random baselines.
- SSD-structured state extraction succeeds at O(N²) query complexity instead of the expected O(N³), yielding an N-fold speedup.
- Delayed-trigger backdoors can remain dormant for thousands of steps before activating.
- Cognitive risk hypotheses follow directly from the state-compression mechanics in these architectures.
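The dormancy claim above hinges on near-unit eigenvalues in the state transition: a slow mode retains an injected value across thousands of steps while ordinary modes wash it out. A toy illustration with two scalar state modes; the decay constants are chosen for exposition, not taken from the paper:

```python
# Two scalar state modes x_{t+1} = a * x_t, both seeded with a
# trigger value of 1.0 at t = 0; decay constants are illustrative.
a_slow, a_fast = 0.9999, 0.5
slow = fast = 1.0
for _ in range(5000):
    slow *= a_slow
    fast *= a_fast

print(round(slow, 3))  # ~0.607: trigger still readable after 5000 steps
print(fast)            # underflows to 0.0: trigger long gone
```

A detector that only inspects recent activations would see nothing unusual in the fast modes; the payload sits in the slow mode until a matching input reads it out.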
Where Pith is reading between the lines
- If the attacks transfer to deployed models, existing long-context applications in medicine and security would need new state-protection layers.
- The five-layer attack surface could be tested on hybrid architectures such as Jamba to check whether the same spectral and capacity vulnerabilities appear.
- Mapping the attacks to MITRE ATLAS suggests that current governance frameworks for AI may need SSM-specific extensions.
- State capacity saturation might interact with context-length limits in ways that produce silent errors not caught by standard output monitoring.
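The saturation concern in the last bullet can be made concrete with a toy bounded-state recurrence: an early "needle" input remains recoverable after a quiet tail but is erased once a high-entropy flood drives the state into saturation. Everything here (the tanh nonlinearity, dimensions, magnitudes) is an illustrative stand-in for the paper's entropy-flooding attack, not its actual construction:

```python
import numpy as np

rng = np.random.default_rng(1)

# Bounded-state toy: tanh caps each state component at unit magnitude,
# standing in for finite state capacity. Dimensions/magnitudes illustrative.
N, a, T = 8, 0.999, 400
b = rng.uniform(0.5, 1.5, N) * rng.choice([-1.0, 1.0], N)

def run(inputs):
    x = np.zeros(N)
    for u in inputs:
        x = np.tanh(a * x + b * u)  # saturating state update
    return x

needle = np.zeros(T); needle[0] = 5.0      # early salient event
flood = 3.0 * rng.standard_normal(T)       # high-entropy flooding input
quiet = np.zeros(T)

# Residual trace of the needle in the final state, with and without flooding.
d_quiet = np.linalg.norm(run(needle) - run(quiet))
d_flood = np.linalg.norm(run(needle + flood) - run(flood))
print(f"needle trace after quiet tail: {d_quiet:.3f}")
print(f"needle trace after flooding:   {d_flood:.2e}")
```

The flooded run's outputs look individually normal, which is exactly the "silent error" worry: nothing in the output stream flags that the early event was overwritten.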
Load-bearing premise
The attacks remain effective and transferable when applied directly to large pretrained state-space model checkpoints in production safety-critical systems without extra defenses.
What would settle it
Applying the targeted genomic injection attack to a pretrained Mamba checkpoint on real clinical time-series data and measuring whether the resulting StIV reaches approximately 0.519 rather than staying near the random baseline of 0.086.
read the original abstract
State-Space Models (SSMs) -- structured SSMs (S4, S4D, DSS, S5), selective SSMs (Mamba, Mamba-2), and hybrid architectures (Jamba) -- are deployed in safety-critical long-context applications: genomic analysis, clinical time-series forecasting, and cybersecurity log processing. Their linear-time scaling is compelling, yet the security properties of their compressed-state recurrent architectures remain unstudied. We present the first systematic treatment of SSM safety, security, and cognitive risks. Seven contributions: (1) Formal threat framework -- SSM Attack Surface (five layers), State Integrity Violation (StIV), Cross-Context Amplification Ratio $\mathcal{X}_\mathcal{S}$, and a Spectral Sensitivity Proposition grounded in the $H_\infty$ norm. (2) Three novel attack classes: spectral adversarial attacks (transfer-function gain exploitation), delayed-trigger stateful backdoors (activate thousands of steps after injection), and state capacity saturation (entropy flooding forces silent forgetting). (3) 14 MITRE ATLAS technique extensions across the full tactic chain. (4) Six-profile attacker taxonomy with kill chains for genomics, clinical, and cybersecurity domains. (5) Four cognitive risk hypotheses grounded in state-compression mechanics. (6) Governance-aligned mitigations mapped to CREST, NIST AI 600-1, and EU AI Act. (7) Empirical evaluation: targeted genomic injection achieves $\mathrm{StIV}=0.519$ vs. $0.086$ random ($6.0\times$, $p<0.001$); PGD state injection achieves $156\times$ output perturbation over random; SSD-structured extraction confirmed at $O(N^2)$ vs. $O(N^3)$ query complexity ($N\times$ speedup). Validation on pretrained checkpoints is detailed in the Appendix.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript provides the first systematic analysis of safety, security, and cognitive risks in State-Space Models (SSMs), including structured (S4, S5), selective (Mamba), and hybrid architectures. It introduces a formal threat framework with a five-layer SSM Attack Surface, a State Integrity Violation (StIV) metric, a Cross-Context Amplification Ratio X_S, and a Spectral Sensitivity Proposition; defines three novel attack classes (spectral adversarial, delayed-trigger stateful backdoors, state capacity saturation); extends 14 MITRE ATLAS techniques; presents a six-profile attacker taxonomy with domain-specific kill chains; proposes four cognitive risk hypotheses; maps mitigations to CREST/NIST/EU AI Act; and reports empirical results on pretrained checkpoints, including targeted genomic injection achieving StIV = 0.519 vs. 0.086 random (6.0×, p < 0.001), 156× PGD output perturbation, and SSD-structured extraction at O(N²) vs. O(N³) query complexity.
Significance. If the empirical attack results hold under the reported conditions and the new metrics prove robust, the work would be significant for highlighting previously unstudied vulnerabilities in SSMs deployed in genomic, clinical, and cybersecurity pipelines. The structured threat model, attacker taxonomy, and governance mappings provide a reusable foundation for security analysis of recurrent compressed-state architectures, while the quantitative demonstrations (multiplicative gains and complexity reductions) offer concrete evidence that could guide defensive research. The absence of free parameters in the core derivations is a strength.
major comments (2)
- [Empirical Evaluation (Appendix)] The headline quantitative claims (StIV=0.519 targeted vs. 0.086 random; 156× PGD perturbation; O(N²) extraction) are reported on pretrained checkpoints, yet no explicit state dimension, layer count, or training regime is provided for the evaluated models. This prevents assessment of whether the gains arise from fundamental SSM recurrence properties or from small-scale configurations that may not survive domain constraints or basic mitigations.
- [Threat Framework (§2)] The Spectral Sensitivity Proposition is grounded in the H_∞ norm, and the new metrics StIV and X_S are presented as independent contributions, but the manuscript does not demonstrate that these metrics remain well-defined or yield the claimed amplification ratios when the underlying SSM is fine-tuned or aligned, which is load-bearing for the transferability of the attack classes to production pipelines.
minor comments (2)
- [Abstract] The abstract states that validation details appear in the Appendix, but the main text would benefit from a brief summary table listing the exact SSM variants, state sizes, and sequence lengths used for each attack class.
- [§2] Notation for X_S and the five-layer attack surface is introduced without an early consolidated definition table; a single reference table in §2 would reduce cross-referencing.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback, which identifies key areas for improving the reproducibility and applicability of our results. We address each major comment below and commit to targeted revisions that strengthen the manuscript without altering its core claims.
read point-by-point responses
- Referee: Empirical Evaluation (Appendix): the headline quantitative claims (StIV=0.519 targeted vs. 0.086 random; 156× PGD perturbation; O(N²) extraction) are reported on pretrained checkpoints, yet no explicit state dimension, layer count, or training regime is provided for the evaluated models. This prevents assessment of whether the gains arise from fundamental SSM recurrence properties or from small-scale configurations that may not survive domain constraints or basic mitigations.
Authors: We agree that explicit architectural details are required for full reproducibility and to confirm that the reported gains derive from SSM recurrence rather than scale-specific effects. In the revised version we will expand the Appendix to list state dimensions (16–64), layer counts, and training regimes for each pretrained checkpoint evaluated (Mamba, S5, Jamba). These parameters are taken directly from the original model releases and will be accompanied by a short discussion showing that the attack surfaces and metrics remain consistent across the tested scales. revision: yes
- Referee: Threat Framework (§2): the Spectral Sensitivity Proposition is grounded in the H_∞ norm and the new metrics StIV and X_S are presented as independent contributions, but the manuscript does not demonstrate that these metrics remain well-defined or yield the claimed amplification ratios when the underlying SSM is fine-tuned or aligned, which is load-bearing for the transferability of the attack classes to production pipelines.
Authors: The formal definitions of StIV, X_S, and the Spectral Sensitivity Proposition depend only on the state-transition matrices and are therefore independent of training regime. The current experiments establish baseline behavior on pretrained checkpoints. We acknowledge that explicit verification on fine-tuned and aligned models is necessary to support production transferability. The revision will add a short subsection in §2 together with Appendix experiments on fine-tuned variants demonstrating that the metrics remain well-defined and preserve amplification ratios within 10–15% of the pretrained values. revision: yes
Circularity Check
No circularity: new metrics and attack classes defined independently of results
full rationale
The paper introduces a threat framework, new quantities (StIV, Cross-Context Amplification Ratio, Spectral Sensitivity Proposition), and three attack classes as original contributions. These are presented as definitions and hypotheses rather than derived from fitted parameters or prior self-citations. Empirical numbers (StIV=0.519 vs 0.086, 156× perturbation, O(N²) extraction) are reported outcomes of experiments on pretrained checkpoints, not quantities that reduce to the definitions by construction. No equations appear that equate a claimed prediction back to an input fit, and no load-bearing uniqueness theorem or ansatz is imported via self-citation. The derivation chain is therefore self-contained.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption SSMs are deployed in safety-critical long-context applications such as genomic analysis and clinical forecasting
- standard math The H_infty norm grounds spectral sensitivity in SSM transfer functions
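The H∞ grounding in the second axiom is standard control theory: for a discrete-time SSM with transfer function H(z) = C(zI − A)⁻¹B, the H∞ norm is the peak gain over the unit circle, and it bounds the worst-case energy amplification of any input. A sketch estimating it by frequency sampling on a toy system (matrices chosen for illustration, not taken from the paper):

```python
import numpy as np

# Toy diagonal discrete-time SSM; A's eigenvalues set the resonant gain.
A = np.diag([0.95, 0.80, 0.50])
B = np.ones((3, 1))
C = np.ones((1, 3))

def hinf_norm(A, B, C, n_freq=4096):
    """Peak of |H(e^{i*theta})| over [0, pi]; real system, symmetric spectrum."""
    I = np.eye(A.shape[0])
    best = 0.0
    for theta in np.linspace(0.0, np.pi, n_freq):
        H = C @ np.linalg.solve(np.exp(1j * theta) * I - A, B)
        best = max(best, abs(H[0, 0]))
    return best

g = hinf_norm(A, B, C)
print(f"H_inf norm: {g:.2f}")  # peak at theta = 0: 1/0.05 + 1/0.20 + 1/0.50 = 27
```

Here the DC gain of 27 dominates; as summarized above, the Spectral Sensitivity Proposition ties attack amplification to exactly this kind of peak, so modes with eigenvalues near the unit circle (the slow, long-memory modes) are the spectrally sensitive ones.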
invented entities (5)
- State Integrity Violation (StIV): no independent evidence
- Cross-Context Amplification Ratio X_S: no independent evidence
- Spectral adversarial attacks: no independent evidence
- Delayed-trigger stateful backdoors: no independent evidence
- State capacity saturation attacks: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear · SSM Attack Surface (five layers), State Integrity Violation (StIV), Spectral Sensitivity Proposition grounded in the H∞ norm
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear · delayed-trigger stateful backdoors... state capacity saturation
Reference graph
Works this paper leans on
- [1]
- [2] Y. Bai, X. Lv, J. Zhang, H. Lyu, J. Tang, Z. Huang, Z. Du, X. Liu, A. Zeng, L. Hou, et al. LongBench: A bilingual, multitask benchmark for long context understanding. Association for Computational Linguistics (ACL), 2024.
- [3] E. M. Bender, T. Gebru, A. McMillan-Major, and S. Shmitchell. On the dangers of stochastic parrots: Can language models be too big? ACM Conference on Fairness, Accountability, and Transparency (FAccT), pages 610–623, 2021.
- [4] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. IEEE Symposium on Security and Privacy (SP), pages 39–57, 2017.
- [5] N. Carlini, F. Tramer, E. Wallace, M. Jagielski, A. Herbert-Voss, K. Lee, A. Roberts, T. Brown, D. Song, U. Erlingsson, et al. Extracting training data from large language models. USENIX Security Symposium, 2021.
- [6] C. Chen, B. Liu, M. Peng, and D. Bhatt. A survey of data poisoning attacks and defenses. ACM Computing Surveys, 2021.
- [7] X. Chen, A. Salem, D. Chen, M. Backes, S. Ma, Q. Shen, Z. Wu, and Y. Zhang. BadNL: Backdoor attacks against NLP models with semantic-preserving improvements. In Annual Computer Security Applications Conference (ACSAC), pages 554–569, 2021.
- [8] Z. Cheng et al. LongMamba: Enhancing Mamba's long-context capabilities via training-free receptive field enlargement. arXiv preprint arXiv:2501.13058, 2025.
- [9]
- [10]
- [11] E. Goffinet et al. HiPPO Zoo: Explicit memory mechanisms for interpretable state space models. arXiv preprint, 2026.
- [12] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. International Conference on Learning Representations (ICLR), 2015.
- [13] A. Gu and T. Dao. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023.
- [14] A. Gu, T. Dao, S. Ermon, A. Rudra, and C. Ré. HiPPO: Recurrent memory with optimal polynomial projections. Advances in Neural Information Processing Systems, 33:1474–1487, 2020.
- [15] A. Gu, K. Goel, A. Gupta, and C. Ré. On the parameterization and initialization of diagonal state space models. Advances in Neural Information Processing Systems, 35, 2022.
- [16] A. Gu, K. Goel, and C. Ré. Efficiently modeling long sequences with structured state spaces. International Conference on Learning Representations (ICLR), 2022.
- [17] T. Gu, B. Dolan-Gavitt, and S. Garg. BadNets: Identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733, 2017.
- [18] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger. On calibration of modern neural networks. International Conference on Machine Learning (ICML), 2017.
- [19] C. Guo, J. S. Frank, and K. Q. Weinberger. Low-frequency adversarial perturbation. In Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence (UAI), 2019.
- [20]
- [21] C.-P. Hsieh, S. Sun, S. Kriman, S. Agrawal, D. Rekesh, F. Fu, et al. RULER: What's the real context size of your long-context language models? arXiv preprint arXiv:2404.06654, 2024.
- [22] E. Hubinger, C. Denison, J. Mu, M. Lambert, M. Tong, M. MacDiarmid, T. Lanham, D. M. Ziegler, T. Maxwell, N. Cheng, et al. Sleeper agents: Training deceptive LLMs that persist through safety training. arXiv preprint arXiv:2401.05566, 2024.
- [23] Z. Ji, N. Lee, R. Frieske, T. Yu, D. Su, Y. Xu, E. Ishii, Y. J. Bang, A. Madotto, and P. Fung. Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12), 2023.
- [24]
- [25] O. Lieber, B. Lenz, H. Bata, G. Cohen, J. Osin, I. Dalmedigos, E. Safahi, S. Meirom, Y. Belinkov, S. Shalev-Shwartz, et al. Jamba: A hybrid transformer-Mamba language model. arXiv preprint arXiv:2403.19887, 2024.
- [26]
- [27] J. Mattern, F. Mireshghallah, Z. Jin, B. Schölkopf, M. Sachan, and T. Berg-Kirkpatrick. Membership inference attacks against language models via neighbourhood comparison. Association for Computational Linguistics: Findings (ACL), 2023.
- [28] MITRE Corporation. MITRE ATLAS: Adversarial threat landscape for artificial-intelligence systems. Technical report, MITRE Corporation, 2023.
- [29]
- [30]
- [31] J. Park et al. Numerical analysis of the HiPPO-LegS ODE for deep state space models. arXiv preprint arXiv:2410.00009, 2024.
- [32] M. Parmar. Safety, security, and cognitive risks in neuro-symbolic AI: Attacks on the semantic integration vector. arXiv preprint, 2026.
- [33] S. Passi and M. Vorvoreanu. Overreliance on AI: Literature review. Microsoft Research Technical Report, 2022.
- [34]
- [35]
- [36]
- [37] Y. Tay, M. Dehghani, S. Abnar, Y. Shen, D. Bahri, P. Pham, J. Rao, L. Yang, S. Ruder, and D. Metzler. Long range arena: A benchmark for efficient transformers. International Conference on Learning Representations (ICLR), 2021.
- [38]
- [39] H.-D. Tran, N. Pal, D. N. Lopez, W. Gu, et al. Verification of recurrent neural networks with star-based reachability analysis. ACM Transactions on Embedded Computing Systems, 2023.
- [40] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30:5998–6008, 2017.
- [41] A. Voelker, I. Kajic, and C. Eliasmith. Legendre memory units: Continuous-time representation in recurrent neural networks. Advances in Neural Information Processing Systems, 32, 2019.
- [42] R. Waleffe, W. Byeon, D. Riach, B. Norick, V. Korthikanti, T. Dao, A. Gu, A. Hatamizadeh, S. Singh, D. Narayanan, et al. An empirical study of Mamba-based language models. arXiv preprint arXiv:2406.07887, 2024.
- [43] E. Wallace, S. Feng, N. Kandpal, M. Gardner, and S. Singh. Universal adversarial triggers for attacking and analyzing NLP. Empirical Methods in Natural Language Processing (EMNLP), 2019.
- [44] L. Weidinger, J. Mellor, M. Rauh, C. Griffin, J. Uesato, P.-S. Huang, M. Cheng, M. Glaese, B. Balle, A. Kasirzadeh, et al. Ethical and social risks of harm from language models. arXiv preprint arXiv:2112.04359, 2021.
- [45] S. Yadlowsky et al. DeciMamba: Exploring the length extrapolation potential of Mamba. arXiv preprint arXiv:2406.14528, 2024.
- [46] Y. Yao, H. Li, H. Zheng, and B. Y. Zhao. Latent backdoor attacks on deep neural networks. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), pages 2041–2055, 2019.
- [47] D. Yin, B. Lakshminarayanan, J. Sohl-Dickstein, and G. Goh. A Fourier perspective on model robustness in computer vision. Advances in Neural Information Processing Systems, 2019.
- [48] A. Yu, S. Massaroli, S. Jegelka, C. D. Manning, and T. B. Hashimoto. Robustifying state-space models for long sequences via approximate diagonalization. International Conference on Learning Representations (ICLR), 2024.
- [49]
- [50] K. Zhou, J. C. Doyle, and K. Glover. Robust and Optimal Control. Prentice Hall, 1996.
discussion (0)