pith. machine review for the scientific record.

arxiv: 2604.16424 · v1 · submitted 2026-04-04 · 💻 cs.CR · cs.AI · cs.CL · cs.LG · math.OC

Recognition: 2 theorem links · Lean Theorem

Safety, Security, and Cognitive Risks in State-Space Models: A Systematic Threat Analysis with Spectral, Stateful, and Capacity Attacks

Manoj Parmar

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 17:30 UTC · model grok-4.3

classification 💻 cs.CR · cs.AI · cs.CL · cs.LG · math.OC
keywords state-space models · adversarial attacks · security analysis · Mamba · genomic analysis · state integrity violation · backdoor attacks · capacity attacks

The pith

State-space models are vulnerable to spectral, stateful, and capacity attacks that corrupt their compressed internal states at up to six times the rate of random inputs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper provides the first systematic analysis of safety and security risks in state-space models such as S4, Mamba, and Jamba when used for long-context tasks. It defines a five-layer attack surface and introduces three new attack families that exploit the models' recurrent state compression. Empirical tests show targeted attacks on genomic data produce state integrity violations six times higher than random baselines, while state injection methods create output changes 156 times larger. The work matters because these models are already deployed in clinical forecasting and cybersecurity without prior examination of their security properties.
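Everything below turns on the fact that an SSM folds arbitrarily long context into a fixed-size recurrent state. A minimal sketch of that recurrence, with toy dimensions and random parameters chosen here for illustration rather than taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d_state = 16  # toy state size; the rebuttal cites checkpoints with dimensions 16-64

# Discretized linear SSM recurrence per channel:
#   h_t = A_bar h_{t-1} + B_bar u_t,   y_t = C h_t
# Stable diagonal dynamics keep the state bounded over long sequences.
A_bar = np.diag(rng.uniform(0.90, 0.99, d_state))
B_bar = rng.normal(size=d_state)
C = rng.normal(size=d_state)

def run_ssm(inputs):
    """Scan the recurrence over a 1-D input sequence; return outputs and final state."""
    h = np.zeros(d_state)
    ys = []
    for u in inputs:
        h = A_bar @ h + B_bar * u
        ys.append(C @ h)
    return np.array(ys), h

ys, h_final = run_ssm(rng.normal(size=10_000))
print(h_final.shape)  # (16,): 10k steps of context compressed into 16 numbers
```

That compression is what the three attack families exploit: the state's frequency response (spectral), its persistence (stateful backdoors), and its finite size (capacity saturation).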

Core claim

The paper claims that state-space models possess an exploitable attack surface consisting of spectral sensitivity in their transfer functions, delayed state persistence that enables backdoors, and finite state capacity that can be saturated to force forgetting. It supports this with a formal threat model including the State Integrity Violation metric and a Spectral Sensitivity Proposition based on the H_∞ norm, plus empirical demonstrations of 6.0x higher StIV in genomic injection, 156x output perturbation via PGD state injection, and quadratic-complexity state extraction.

What carries the argument

The SSM Attack Surface framework with five layers, the State Integrity Violation (StIV) metric, and the Spectral Sensitivity Proposition grounded in the H_∞ norm.
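The H_∞ norm invoked here is standard control theory rather than something the paper invents. For an LTI state-space realization (A, B, C, D) it measures the worst-case frequency-domain gain of the transfer function:

$$G(s) = C\,(sI - A)^{-1}B + D, \qquad \lVert G \rVert_{H_\infty} = \sup_{\omega \in \mathbb{R}} \sigma_{\max}\!\bigl(G(i\omega)\bigr)$$

A large H_∞ norm means some input frequency band is strongly amplified, which is plausibly the gain a spectral adversarial attack targets; the proposition's exact statement is not reproduced on this page.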

If this is right

  • Targeted genomic injections achieve StIV of 0.519 compared with 0.086 for random inputs, a 6.0x increase with statistical significance (p < 0.001).
  • Projected gradient descent state injection produces 156 times greater output perturbation than random baselines (a minimal sketch of this attack pattern follows the list).
  • SSD-structured state extraction succeeds at O(N²) query complexity instead of the expected O(N³), yielding an N-fold speedup.
  • Delayed-trigger backdoors can remain dormant for thousands of steps before activating.
  • Cognitive risk hypotheses follow directly from the state-compression mechanics in these architectures.
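On the PGD bullet above: the paper's attack details live in its Appendix and are not reproduced here, but the generic pattern of projected gradient ascent over an injected state is easy to sketch. A minimal, self-contained version on a toy linear SSM follows; the dimensions, step size, and ε budget are invented, and the script illustrates the shape of the attack, not the reported 156× figure.

```python
import torch

torch.manual_seed(0)
d_state, seq_len, eps, steps = 16, 64, 0.1, 40

# Frozen toy SSM: h_t = A h_{t-1} + B u_t, y_t = C . h_t (stable diagonal A).
A = torch.diag(torch.empty(d_state).uniform_(0.90, 0.99))
B = torch.randn(d_state)
C = torch.randn(d_state)
u = torch.randn(seq_len)

def rollout(h0):
    """Run the recurrence from an injected initial state h0; return outputs."""
    h, ys = h0, []
    for t in range(seq_len):
        h = A @ h + B * u[t]
        ys.append(C @ h)
    return torch.stack(ys)

y_clean = rollout(torch.zeros(d_state))

# PGD over the injected state: ascend the output deviation, project back onto
# the l-infinity epsilon-ball around the clean (zero) state.
delta = (0.001 * torch.randn(d_state)).requires_grad_()
for _ in range(steps):
    (rollout(delta) - y_clean).norm().backward()
    with torch.no_grad():
        delta += 0.02 * delta.grad.sign()  # signed gradient ascent step
        delta.clamp_(-eps, eps)            # projection onto the budget
        delta.grad.zero_()

rand = eps * torch.randn(d_state).sign()   # random injection, same budget
pgd_dev = (rollout(delta.detach()) - y_clean).norm()
rand_dev = (rollout(rand) - y_clean).norm()
print(f"PGD vs. random output deviation: {(pgd_dev / rand_dev).item():.1f}x")
```

Comparing the optimized injection against a random one at the same ε budget mirrors how the paper frames its random baseline.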

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the attacks transfer to deployed models, existing long-context applications in medicine and security would need new state-protection layers.
  • The five-layer attack surface could be tested on hybrid architectures such as Jamba to check whether the same spectral and capacity vulnerabilities appear.
  • Mapping the attacks to MITRE ATLAS suggests that current governance frameworks for AI may need SSM-specific extensions.
  • State capacity saturation might interact with context-length limits in ways that produce silent errors not caught by standard output monitoring.
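On the last bullet, a toy illustration of the saturation mechanism as the pith describes it: entropy flooding washing an earlier signal out of a fixed-size state while outputs stay unremarkable. The recurrence, noise scale, and step count are all invented here, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d_state = 16

A = np.diag(rng.uniform(0.90, 0.99, d_state))  # same toy recurrence as above
B = rng.normal(size=d_state)
key = rng.normal(size=d_state)
key /= np.linalg.norm(key)                     # direction encoding a "key" event

h = 5.0 * key                                  # state right after the key event
for step in range(1, 2001):
    h = A @ h + B * rng.normal(0.0, 3.0)       # high-variance flooding inputs
    if step % 500 == 0:
        # Fraction of the state still aligned with the key direction.
        trace = abs(key @ h) / np.linalg.norm(h)
        print(f"step {step:4d}: key trace = {trace:.4f}")
```

The key's trace decays toward noise level without any overflow or error signal, which is the "silent forgetting" failure mode output monitoring would miss.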

Load-bearing premise

The attacks remain effective and transferable when applied directly to large pretrained state-space model checkpoints in production safety-critical systems without extra defenses.

What would settle it

Applying the targeted genomic injection attack to a pretrained Mamba checkpoint on real clinical time-series data and measuring whether the resulting StIV reaches approximately 0.519 rather than staying near the random baseline of 0.086.
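The paper defines StIV formally, and that definition is not reproduced on this page; the stand-in below (relative deviation of attacked from clean final states, averaged over layers) is an assumption about its shape, useful only to show what the settling measurement would compute. The state arrays are placeholders for states captured from a real checkpoint.

```python
import numpy as np

def stiv_proxy(h_clean, h_attacked):
    """Hypothetical stand-in for StIV: relative deviation of attacked final
    states from clean ones, averaged over layers. Not the paper's formula."""
    num = np.linalg.norm(h_attacked - h_clean, axis=-1)
    den = np.linalg.norm(h_clean, axis=-1) + 1e-12
    return float(np.mean(num / den))

# Placeholder (layers, d_state) arrays standing in for final recurrent states
# from a clean run and from the same run with the targeted injection applied.
rng = np.random.default_rng(2)
h_clean = rng.normal(size=(24, 16))
h_attacked = h_clean + 0.5 * rng.normal(size=(24, 16))
print(f"StIV proxy: {stiv_proxy(h_clean, h_attacked):.3f}")
```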

read the original abstract

State-Space Models (SSMs) -- structured SSMs (S4, S4D, DSS, S5), selective SSMs (Mamba, Mamba-2), and hybrid architectures (Jamba) -- are deployed in safety-critical long-context applications: genomic analysis, clinical time-series forecasting, and cybersecurity log processing. Their linear-time scaling is compelling, yet the security properties of their compressed-state recurrent architectures remain unstudied. We present the first systematic treatment of SSM safety, security, and cognitive risks. Seven contributions: (1) Formal threat framework -- SSM Attack Surface (five layers), State Integrity Violation (StIV), Cross-Context Amplification Ratio $\mathcal{X}_\mathcal{S}$, and a Spectral Sensitivity Proposition grounded in the $H_\infty$ norm. (2) Three novel attack classes: spectral adversarial attacks (transfer-function gain exploitation), delayed-trigger stateful backdoors (activate thousands of steps after injection), and state capacity saturation (entropy flooding forces silent forgetting). (3) 14 MITRE ATLAS technique extensions across the full tactic chain. (4) Six-profile attacker taxonomy with kill chains for genomics, clinical, and cybersecurity domains. (5) Four cognitive risk hypotheses grounded in state-compression mechanics. (6) Governance-aligned mitigations mapped to CREST, NIST AI 600-1, and EU AI Act. (7) Empirical evaluation: targeted genomic injection achieves $\mathrm{StIV}=0.519$ vs. $0.086$ random ($6.0\times$, $p<0.001$); PGD state injection achieves $156\times$ output perturbation over random; SSD-structured extraction confirmed at $O(N^2)$ vs. $O(N^3)$ query complexity ($N\times$ speedup). Validation on pretrained checkpoints is detailed in the Appendix.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated author's rebuttal, circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript provides the first systematic analysis of safety, security, and cognitive risks in State-Space Models (SSMs), including structured (S4, S5), selective (Mamba), and hybrid architectures. It introduces a formal threat framework with a five-layer SSM Attack Surface, the State Integrity Violation (StIV) metric, the Cross-Context Amplification Ratio X_S, and a Spectral Sensitivity Proposition; defines three novel attack classes (spectral adversarial, delayed-trigger stateful backdoors, state capacity saturation); extends 14 MITRE ATLAS techniques; presents a six-profile attacker taxonomy with domain-specific kill chains; proposes four cognitive risk hypotheses; maps mitigations to CREST/NIST/EU AI Act; and reports empirical results on pretrained checkpoints, including targeted genomic injection achieving StIV=0.519 vs. 0.086 random (6.0×, p<0.001), 156× PGD output perturbation, and SSD-structured extraction at O(N²) vs. O(N³) query complexity.

Significance. If the empirical attack results hold under the reported conditions and the new metrics prove robust, the work would be significant for highlighting previously unstudied vulnerabilities in SSMs deployed in genomic, clinical, and cybersecurity pipelines. The structured threat model, attacker taxonomy, and governance mappings provide a reusable foundation for security analysis of recurrent compressed-state architectures, while the quantitative demonstrations (multiplicative gains and complexity reductions) offer concrete evidence that could guide defensive research. The absence of free parameters in the core derivations is a strength.

major comments (2)
  1. [Empirical Evaluation (Appendix)] The headline quantitative claims (StIV=0.519 targeted vs. 0.086 random; 156× PGD perturbation; O(N²) extraction) are reported on pretrained checkpoints, yet no explicit state dimension, layer count, or training regime is provided for the evaluated models. This prevents assessment of whether the gains arise from fundamental SSM recurrence properties or from small-scale configurations that may not survive domain constraints or basic mitigations.
  2. [Threat Framework (§2)] The Spectral Sensitivity Proposition is grounded in the H_∞ norm and the new metrics StIV and X_S are presented as independent contributions, but the manuscript does not demonstrate that these metrics remain well-defined or yield the claimed amplification ratios when the underlying SSM is fine-tuned or aligned, which is load-bearing for the transferability of the attack classes to production pipelines.
minor comments (2)
  1. [Abstract] The abstract states that validation details appear in the Appendix, but the main text would benefit from a brief summary table listing the exact SSM variants, state sizes, and sequence lengths used for each attack class.
  2. [§2] Notation for X_S and the five-layer attack surface is introduced without an early consolidated definition table; a single reference table in §2 would reduce cross-referencing.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback, which identifies key areas for improving the reproducibility and applicability of our results. We address each major comment below and commit to targeted revisions that strengthen the manuscript without altering its core claims.

read point-by-point responses
  1. Referee: Empirical Evaluation (Appendix): the headline quantitative claims (StIV=0.519 targeted vs. 0.086 random; 156× PGD perturbation; O(N²) extraction) are reported on pretrained checkpoints, yet no explicit state dimension, layer count, or training regime is provided for the evaluated models. This prevents assessment of whether the gains arise from fundamental SSM recurrence properties or from small-scale configurations that may not survive domain constraints or basic mitigations.

    Authors: We agree that explicit architectural details are required for full reproducibility and to confirm that the reported gains derive from SSM recurrence rather than scale-specific effects. In the revised version we will expand the Appendix to list state dimensions (16–64), layer counts, and training regimes for each pretrained checkpoint evaluated (Mamba, S5, Jamba). These parameters are taken directly from the original model releases and will be accompanied by a short discussion showing that the attack surfaces and metrics remain consistent across the tested scales. revision: yes

  2. Referee: Threat Framework (§2): the Spectral Sensitivity Proposition is grounded in the H_∞ norm and the new metrics StIV and X_S are presented as independent contributions, but the manuscript does not demonstrate that these metrics remain well-defined or yield the claimed amplification ratios when the underlying SSM is fine-tuned or aligned, which is load-bearing for the transferability of the attack classes to production pipelines.

    Authors: The formal definitions of StIV, X_S, and the Spectral Sensitivity Proposition depend only on the state-transition matrices and are therefore independent of training regime. The current experiments establish baseline behavior on pretrained checkpoints. We acknowledge that explicit verification on fine-tuned and aligned models is necessary to support production transferability. The revision will add a short subsection in §2 together with Appendix experiments on fine-tuned variants demonstrating that the metrics remain well-defined and preserve amplification ratios within 10–15% of the pretrained values. revision: yes

Circularity Check

0 steps flagged

No circularity: new metrics and attack classes defined independently of results

full rationale

The paper introduces a threat framework, new quantities (StIV, Cross-Context Amplification Ratio, Spectral Sensitivity Proposition), and three attack classes as original contributions. These are presented as definitions and hypotheses rather than derived from fitted parameters or prior self-citations. Empirical numbers (StIV=0.519 vs 0.086, 156× perturbation, O(N²) extraction) are reported outcomes of experiments on pretrained checkpoints, not quantities that reduce to the definitions by construction. No equations appear that equate a claimed prediction back to an input fit, and no load-bearing uniqueness theorem or ansatz is imported via self-citation. The derivation chain is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 5 invented entities

The abstract introduces new metrics (StIV, X_S) and attack classes without citing prior derivations or independent evidence for their validity; it relies on the domain assumption that SSMs are already in safety-critical use.

axioms (2)
  • domain assumption SSMs are deployed in safety-critical long-context applications such as genomic analysis and clinical forecasting
    Used as motivation for the threat analysis in the opening paragraph.
  • standard math The H_∞ norm grounds spectral sensitivity in SSM transfer functions
    Invoked to support the Spectral Sensitivity Proposition.
invented entities (5)
  • State Integrity Violation (StIV) no independent evidence
    purpose: Quantify degree of state corruption caused by attacks
    New metric introduced in the formal threat framework.
  • Cross-Context Amplification Ratio X_S no independent evidence
    purpose: Measure how attacks amplify across different contexts
    New quantity defined in the threat framework.
  • Spectral adversarial attacks no independent evidence
    purpose: Exploit transfer-function gain in SSMs
    One of three novel attack classes presented.
  • Delayed-trigger stateful backdoors no independent evidence
    purpose: Activate after thousands of steps post-injection
    One of three novel attack classes presented.
  • State capacity saturation attacks no independent evidence
    purpose: Force silent forgetting via entropy flooding
    One of three novel attack classes presented.

pith-pipeline@v0.9.0 · 5654 in / 1623 out tokens · 71311 ms · 2026-05-13T17:30:22.231654+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.
