pith.

arxiv: 2408.01129 · v8 · submitted 2024-08-02 · 💻 cs.LG · cs.AI

A Survey of Mamba

Pith reviewed 2026-05-23 22:06 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords Mamba · State Space Models · Transformers · Foundation Models · Sequence Modeling · Deep Learning · Scalability · Attention Mechanisms

The pith

Mamba matches Transformers with near-linear sequence scaling

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper surveys the rapid development of Mamba-based models as an alternative to Transformers in deep learning. It reviews the basics of Mamba-1 and Mamba-2, then examines architecture designs, methods to adapt Mamba to different data types, and its applications in various domains. The survey also identifies limitations and outlines future research directions. A reader would care because the survey organizes the growing literature on an architecture that promises to overcome the quadratic complexity of attention mechanisms. This consolidation helps readers understand Mamba's potential for building more efficient foundation models.

Core claim

Mamba, drawing inspiration from classical state space models, has emerged as a promising alternative for building foundation models, delivering modeling abilities comparable to those of Transformers while preserving near-linear scalability with respect to sequence length, as evidenced by the growing number of studies reporting strong performance across diverse domains.

What carries the argument

The selective state space model mechanism in Mamba: its step size and input/output projections depend on the current input, and it processes a sequence with cost that grows linearly in sequence length.
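A minimal sketch of the selective recurrence for a single channel, written as a plain sequential loop. It assumes a diagonal, negative-real state matrix `A`, zero-order-hold discretization `exp(Δ_t A)` for `A`, and the simplified step `Δ_t B_t` for `B`, roughly following Gu and Dao (2023). The scalar projections `w_delta`, `b_delta`, `W_B`, `W_C` are hypothetical stand-ins: in Mamba the step size and `B_t`, `C_t` come from learned linear projections of the full multi-channel input, and the scan runs in a fused, hardware-aware GPU kernel rather than a Python loop. The point here is only that each step costs O(N) work, so a length-L sequence costs O(L·N).

```python
import numpy as np


def softplus(z):
    # Numerically stable softplus; keeps the step size delta strictly positive.
    return np.logaddexp(0.0, z)


def selective_scan(x, A, w_delta, b_delta, W_B, W_C):
    """Sequential selective SSM over a single channel (illustrative only).

    x:       (L,) input sequence for one channel.
    A:       (N,) diagonal of a negative-real state matrix.
    w_delta, b_delta: hypothetical scalars mapping x_t to the step size.
    W_B, W_C: hypothetical (N,) vectors mapping x_t to B_t and C_t.
    Returns y: (L,) outputs.
    """
    L, N = x.shape[0], A.shape[0]
    h = np.zeros(N)                                  # recurrent state, fixed size N
    y = np.empty(L)
    for t in range(L):
        delta = softplus(w_delta * x[t] + b_delta)   # input-dependent step size
        B_t = W_B * x[t]                             # input-dependent input vector
        C_t = W_C * x[t]                             # input-dependent output vector
        A_bar = np.exp(delta * A)                    # zero-order hold for diagonal A
        B_bar = delta * B_t                          # simplified discretization of B
        h = A_bar * h + B_bar * x[t]                 # O(N) state update
        y[t] = C_t @ h                               # O(N) readout
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N = 4
    A = -np.arange(1, N + 1, dtype=float)
    x = rng.standard_normal(16)
    y = selective_scan(x, A, 1.0, 0.0, rng.standard_normal(N), rng.standard_normal(N))
    print(y.shape)  # (16,)
```

Because `h` has a fixed size, memory does not grow with `L`. What makes the model "selective" rather than a linear time-invariant SSM is that `delta`, `B_t`, and `C_t` change with every input.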

If this is right

  • Mamba models can achieve more efficient inference on long sequences than Transformers.
  • Adaptation techniques allow Mamba to excel on non-text data such as images and audio.
  • Applications in multiple domains demonstrate Mamba's versatility beyond language modeling.
  • The identified limitations suggest specific areas for architectural improvements in future Mamba variants.
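To make the first point concrete, here is a back-of-the-envelope sketch of the per-sequence memory held during autoregressive decoding: a Transformer's key-value cache grows linearly with the number of tokens processed, while an SSM layer carries a fixed-size recurrent state. The formulas count elements only (no bytes, no grouped-query attention, and no short-convolution buffer), and all model sizes below are hypothetical.

```python
def kv_cache_elements(seq_len: int, n_layers: int, d_model: int) -> int:
    # Each attention layer caches one key and one value vector per past token.
    return 2 * n_layers * seq_len * d_model


def ssm_state_elements(n_layers: int, d_inner: int, d_state: int) -> int:
    # Each SSM layer carries a (d_inner x d_state) state, independent of seq_len.
    return n_layers * d_inner * d_state


# Hypothetical model sizes, chosen only for illustration.
for seq_len in (1_000, 10_000, 100_000):
    kv = kv_cache_elements(seq_len, n_layers=24, d_model=1024)
    ssm = ssm_state_elements(n_layers=24, d_inner=2048, d_state=16)
    print(f"L={seq_len:>7}: KV cache {kv:>13,} elements, SSM state {ssm:,} elements")
```

With these assumed sizes the KV cache (49,152,000 elements) already exceeds the SSM state (786,432 elements) at 1,000 tokens and grows tenfold with each tenfold increase in length, while the SSM state stays constant.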

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Exploring combinations of Mamba with other architectures could yield hybrid models with enhanced capabilities.
  • The survey's overview may inspire theoretical analyses of why state space models perform well in practice.
  • Future surveys could track Mamba's progress beyond August 2024 to update the understanding of its potential.
  • Developers might prioritize Mamba for resource-constrained environments handling long contexts.

Load-bearing premise

The body of Mamba-related papers published by August 2024 is already large and representative enough for a systematic consolidation to provide a comprehensive understanding of the architecture's potential.

What would settle it

Large-scale experiments showing that Mamba fails to match Transformer performance on standard benchmarks, or that it scales worse, would undermine the survey's central narrative.

Figures

Figures reproduced from arXiv: 2408.01129 by Haohao Qu, Hui Liu, Liangbo Ning, Qing Li, Rui An, Tyler Derr, Wenqi Fan, Xin Xu.

Figure 1: Examples of the applications of Mamba-based models for different downstream tasks.
Figure 2: An illustration of representative model architectures, namely Recurrent Neural Network (RNN), Transformer, and State Space …
Figure 3: Overview of the Selective State Space Model with hardware-aware state expansions. The selective mechanism introduces …
Figure 4: The block architectures of Mamba-1 and Mamba-2.
Figure 5: Representative examples of improved Mamba models based on the perspective of block design: (a) Integration methods …
Figure 6: Recently developed scanning methods in Mamba-based models: Flatten Scans (a-c) involve flattening the model input into …
Figure 7: Representative strategies exist for adapting Mamba to diverse types of data. (a-e) The Mamba architecture, imbued with …
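Figure 6 refers to flatten scans, which turn a 2D input into a 1D token sequence before it enters a Mamba block. A minimal sketch of two such orders, assuming an H×W grid of patches indexed as (row, column): a plain raster scan, and a zigzag ("snake") scan that reverses every other row so consecutive tokens stay spatially adjacent. The function names are illustrative and not taken from any specific model; the survey's figure covers further variants.

```python
from typing import List, Tuple


def raster_scan(h: int, w: int) -> List[Tuple[int, int]]:
    # Row by row, left to right in every row.
    return [(r, c) for r in range(h) for c in range(w)]


def zigzag_scan(h: int, w: int) -> List[Tuple[int, int]]:
    # Row by row, alternating direction so the path never jumps across the grid.
    order: List[Tuple[int, int]] = []
    for r in range(h):
        cols = range(w) if r % 2 == 0 else range(w - 1, -1, -1)
        order.extend((r, c) for c in cols)
    return order


def flatten(grid, order):
    # Reorder a 2D list of patch embeddings into a 1D token sequence.
    return [grid[r][c] for r, c in order]


grid = [[3 * r + c for c in range(3)] for r in range(2)]
print(flatten(grid, raster_scan(2, 3)))  # [0, 1, 2, 3, 4, 5]
print(flatten(grid, zigzag_scan(2, 3)))  # [0, 1, 2, 5, 4, 3]
```

The raster order places the last patch of one row next to the first patch of the following row, which are far apart in the image; the zigzag order avoids that jump at the cost of alternating the reading direction.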
Original abstract

As one of the most representative DL techniques, Transformer architecture has empowered numerous advanced models, especially the large language models (LLMs) that comprise billions of parameters, becoming a cornerstone in deep learning. Despite the impressive achievements, Transformers still face inherent limitations, particularly the time-consuming inference resulting from the quadratic computation complexity of attention calculation. Recently, a novel architecture named Mamba, drawing inspiration from classical state space models (SSMs), has emerged as a promising alternative for building foundation models, delivering comparable modeling abilities to Transformers while preserving near-linear scalability concerning sequence length. This has sparked an increasing number of studies actively exploring Mamba's potential to achieve impressive performance across diverse domains. Given such rapid evolution, there is a critical need for a systematic review that consolidates existing Mamba-empowered models, offering a comprehensive understanding of this emerging model architecture. In this survey, we therefore conduct an in-depth investigation of recent Mamba-associated studies, covering three main aspects: the advancements of Mamba-based models, the techniques of adapting Mamba to diverse data, and the applications where Mamba can excel. Specifically, we first review the foundational knowledge of various representative deep learning models and the details of Mamba-1&2 as preliminaries. Then, to showcase the significance of Mamba for AI, we comprehensively review the related studies focusing on Mamba models' architecture design, data adaptability, and applications. Finally, we present a discussion of current limitations and explore various promising research directions to provide deeper insights for future investigations.
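The abstract contrasts attention's quadratic cost with Mamba's near-linear scaling, and the survey's preliminaries cover Mamba-2, whose structured state space duality (Dao and Gu, 2024) shows that a selective SSM with a scalar decay per step can be computed either as a linear-time recurrence or as a quadratic, attention-like matrix product. The sketch below uses random, hypothetical inputs for a single head and channel and checks numerically that the two forms agree; the decay `a_t` stands in for `exp(Δ_t A)` with scalar `A`. It illustrates the duality only and is not the Mamba-2 chunked algorithm.

```python
import numpy as np


def ssd_recurrent(x, a, B, C):
    # Linear-time form: h_t = a_t * h_{t-1} + B_t * x_t,  y_t = C_t . h_t
    L, N = B.shape
    h = np.zeros(N)
    y = np.empty(L)
    for t in range(L):
        h = a[t] * h + B[t] * x[t]
        y[t] = C[t] @ h
    return y


def ssd_quadratic(x, a, B, C):
    # Quadratic, attention-like form: y = M x, where the lower-triangular matrix
    # M[t, s] = (C_t . B_s) * prod_{k=s+1..t} a_k for s <= t, and 0 otherwise.
    L = x.shape[0]
    cum = np.cumsum(np.log(a))                       # requires 0 < a_t
    mask = np.tril(np.ones((L, L), dtype=bool))
    diff = np.where(mask, cum[:, None] - cum[None, :], -np.inf)
    decay = np.exp(diff)                             # product of a_k; 0 above diagonal
    M = (C @ B.T) * decay                            # L x L "score" matrix
    return M @ x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, N = 8, 4
    x = rng.standard_normal(L)
    a = rng.uniform(0.5, 1.0, size=L)                # stands in for exp(delta_t * A)
    B = rng.standard_normal((L, N))
    C = rng.standard_normal((L, N))
    print(np.allclose(ssd_recurrent(x, a, B, C), ssd_quadratic(x, a, B, C)))  # True
```

The recurrent form touches each token once (O(L·N) work), while the matrix form materializes an L×L matrix (O(L²) work), which is the attention-like view. Mamba-2's algorithm combines the two, using the matrix form within chunks of the sequence and the recurrence across chunks.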

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript is a literature survey on the Mamba architecture (inspired by state-space models) as an alternative to Transformers. It first reviews preliminaries on representative deep learning models and the details of Mamba-1 and Mamba-2, then surveys Mamba-based model architectures, techniques for adapting Mamba to diverse data modalities, and applications across domains, before discussing current limitations and promising research directions.

Significance. If the coverage is representative, the survey provides a timely consolidation of the rapidly growing Mamba literature (post-2023), which could help researchers identify patterns in architecture variants, data adaptations, and application successes. The explicit three-part structure (preliminaries, models/data/applications, limitations) and grounding in prior empirical claims about near-linear scaling are strengths for a survey in this fast-moving area.

major comments (2)
  1. [Abstract, §1] Abstract and §1: the claim that the survey conducts a 'systematic review' and 'in-depth investigation' is not supported by any description of search strategy, inclusion/exclusion criteria, or database sources; without these the representativeness of the consolidated studies cannot be assessed.
  2. [§3 (architecture/data/applications review)] The weakest assumption noted in the reader report (that the August 2024 corpus is already large and representative) is not addressed; the survey should include a quantitative summary (e.g., number of papers per category, publication timeline) to substantiate that the body of work merits consolidation.
minor comments (2)
  1. [Preliminaries section] Notation for Mamba-1 vs. Mamba-2 parameters and selective SSM equations should be introduced once in the preliminaries and used consistently thereafter to avoid reader confusion when comparing variants.
  2. [Tables/figures in §3] Figure captions and table headers listing surveyed models should include publication year and venue for quick reference; several entries currently omit these.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive recommendation for minor revision. We address the two major comments below and will update the manuscript accordingly to improve transparency and substantiation of the survey's scope.

Point-by-point responses
  1. Referee: [Abstract, §1] Abstract and §1: the claim that the survey conducts a 'systematic review' and 'in-depth investigation' is not supported by any description of search strategy, inclusion/exclusion criteria, or database sources; without these the representativeness of the consolidated studies cannot be assessed.

    Authors: We agree that the abstract and §1 would be strengthened by greater methodological transparency. The survey was compiled via ongoing literature tracking on arXiv and related venues up to the August 2024 cutoff, but no formal search protocol was described. In revision we will add a short 'Literature Search Methodology' paragraph (or subsection) in §1 that states the primary sources (arXiv, Google Scholar), core keywords (Mamba, state-space model, selective SSM, etc.), and high-level inclusion criteria (peer-reviewed or preprint works proposing Mamba variants or applications). If space constraints arise we will also soften the phrasing from 'systematic review' to 'comprehensive survey' while retaining the claim of in-depth coverage. revision: yes

  2. Referee: [§3 (architecture/data/applications review)] The weakest assumption noted in the reader report (that the August 2024 corpus is already large and representative) is not addressed; the survey should include a quantitative summary (e.g., number of papers per category, publication timeline) to substantiate that the body of work merits consolidation.

    Authors: We concur that a quantitative overview would better justify the decision to consolidate the literature. The current text notes rapid growth qualitatively but provides no counts or timeline. In the revised manuscript we will insert a new table (or figure) early in §3 that reports: (i) total papers reviewed, (ii) breakdown by the three main categories (architecture variants, modality adaptations, domain applications), and (iii) a simple publication-year histogram or cumulative count showing the post-2023 surge. This addition will directly address the representativeness concern while remaining concise. revision: yes

Circularity Check

0 steps flagged

No significant circularity: literature survey with no derivations

full rationale

This manuscript is a survey paper that consolidates existing literature on Mamba models without presenting any original derivations, predictions, fitted parameters, or modeling inferences. Its claims about Mamba's capabilities are explicitly attributed to prior publications as background rather than derived internally. No equations, self-citations, or ansatzes function as load-bearing steps that reduce to the paper's own inputs by construction. The structure is self-contained as a review, with no circularity patterns applicable.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey paper. It contains no original mathematical derivations, fitted parameters, or postulated entities; the content is a synthesis of prior published work on Mamba.

pith-pipeline@v0.9.0 · 5813 in / 1070 out tokens · 27149 ms · 2026-05-23T22:06:52.881466+00:00 · methodology


Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. mKG-RAG: Leveraging Multimodal Knowledge Graphs in Retrieval-Augmented Generation for Knowledge-intensive VQA

    cs.CV 2025-08 unverdicted novelty 7.0

    mKG-RAG constructs multimodal KGs via MLLM-driven extraction and vision-text matching then applies dual-stage query-aware retrieval to achieve new state-of-the-art results on knowledge-based VQA.

  2. DeMa: Dual-Path Delay-Aware Mamba for Efficient Multivariate Time Series Analysis

    cs.LG 2026-01 unverdicted novelty 6.0

    DeMa is a dual-path delay-aware Mamba architecture that decomposes MTS into intra-series temporal and inter-series variate paths to achieve SOTA performance with linear complexity on forecasting, imputation, anomaly d...

  3. Predicting one-year clinical instability and mortality in heart failure patients using sequence modeling

    cs.LG 2025-11 unverdicted novelty 4.0

    Sequence models on EHR data from a Swedish heart failure cohort achieve AUPRCs of 0.555 to 0.854 for one-year instability and mortality predictions and support four care pathways.

  4. When control meets large language models: From words to dynamics

    eess.SY 2026-02 unverdicted novelty 3.0

    The paper proposes a bidirectional continuum between LLMs and control systems, covering LLM-assisted controller design, control-based LLM steering, and state-space modeling of LLMs.

  5. Advancing Intelligent Sequence Modeling: Evolution, Trade-offs, and Applications of State-Space Architectures from S4 to Mamba

    cs.LG 2025-03 unverdicted

    A survey tracing the evolution of state-space models like S4 and Mamba, their efficiency trade-offs, and applications in NLP, vision, and other domains.

Reference graph

Works this paper leans on

247 extracted references · 247 canonical work pages · cited by 5 Pith papers · 10 internal anchors

  1. [1]

    Ossama Abdel-Hamid, Abdel-rahman Mohamed, Hui Jiang, Li Deng, Gerald Penn, and Dong Yu. 2014. Convolutional neural networks for speech recognition. IEEE/ACM Transactions on audio, speech, and language processing 22, 10 (2014), 1533–1545

  2. [2]

    Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. Gpt-4 technical report. arXiv preprint arXiv:2303.08774 (2023)

  3. [3]

    Md Atik Ahamed and Qiang Cheng. 2024. Timemachine: A time series is worth 4 mambas for long-term forecasting. arXiv preprint arXiv:2403.09898 (2024)

  4. [4]

    Md Atik Ahamed and Qiang Cheng. 2024. TSCMamba: Mamba Meets Multi-View Learning for Time Series Classification. arXiv preprint arXiv:2406.04419 (2024)

  5. [5]

    Quentin Anthony, Yury Tokpanov, Paolo Glorioso, and Beren Millidge. 2024. BlackMamba: Mixture of Experts for State-Space Models. arXiv preprint arXiv:2402.01771 (2024)

  6. [6]

    Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, and Cordelia Schmid. 2021. Vivit: A video vision transformer. In Proceedings of the IEEE/CVF international conference on computer vision. 6836–6846

  7. [7]

    Zhongxin Bai and Xiao-Lei Zhang. 2021. Speaker recognition based on deep learning: An overview. Neural Networks 140 (2021), 65–99

  8. [8]

    Malyaban Bal and Abhronil Sengupta. 2024. Rethinking Spiking Neural Networks as State Space Models. arXiv preprint arXiv:2406.02923 (2024)

  9. [9]

    Ali Behrouz and Farnoosh Hashemi. 2024. Graph Mamba: Towards Learning on Graphs with State Space Models. arXiv preprint arXiv:2402.08678 (2024)

  10. [10]

    Ali Behrouz, Michele Santacatterina, and Ramin Zabih. 2024. Mambamixer: Efficient selective state space models with dual token and channel selection. arXiv preprint arXiv:2403.19888 (2024)

  11. [11]

    Saurabhchand Bhati, Yuan Gong, Leonid Karlinsky, Hilde Kuehne, Rogerio Feris, and James Glass. 2024. DASS: Distilled Audio State Space Models Are Stronger and More Duration-Scalable Learners. arXiv preprint arXiv:2407.04082 (2024)

  12. [12]

    Raunaq Bhirangi, Chenyu Wang, Venkatesh Pattabiraman, Carmel Majidi, Abhinav Gupta, Tess Hellebrekers, and Lerrel Pinto. 2024. Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling. arXiv preprint arXiv:2402.10211 (2024)

  13. [13]

    Rishi Bommasani, Drew A Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, et al. 2021. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 (2021)

  14. [14]

    Florian Le Bronnec, Song Duong, Mathieu Ravaut, Alexandre Allauzen, Nancy F Chen, Vincent Guigue, Alberto Lumbreras, Laure Soulier, and Patrick Gallinari. 2024. LOCOST: State-Space Models for Long Document Abstractive Summarization. arXiv preprint arXiv:2401.17919 (2024)

  15. [15]

    Jiahang Cao, Qiang Zhang, Ziqing Wang, Jiaxu Wang, Hao Cheng, Yecheng Shao, Wen Zhao, Gang Han, Yijie Guo, and Renjing Xu. 2024. Mamba as Decision Maker: Exploring Multi-scale Sequence Modeling in Offline Reinforcement Learning. arXiv preprint arXiv:2406.02013 (2024)

  16. [16]

    Yang Cao and Wei Zhang. 2024. Mamba4KT: An Efficient and Effective Mamba-based Knowledge Tracing Model. arXiv preprint arXiv:2405.16542 (2024)

  17. [17]

    Rong Chao, Wen-Huang Cheng, Moreno La Quatra, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Szu-Wei Fu, and Yu Tsao. 2024. An Investigation of Incorporating Mamba for Speech Enhancement. arXiv preprint arXiv:2405.06573 (2024)

  18. [18]

    Soumyabrata Chaudhuri and Saumik Bhattacharya. 2024. Simba: Mamba augmented U-ShiftGCN for Skeletal Action Recognition in Videos. arXiv preprint arXiv:2404.07645 (2024)

  19. [19]

    Chi-Sheng Chen, Guan-Ying Chen, Dong Zhou, Di Jiang, and Dai-Shi Chen. 2024. Res-VMamba: Fine-Grained Food Category Visual Classification Using Selective State Space Models with Deep Residual Learning. arXiv preprint arXiv:2402.15761 (2024)

  20. [20]

    Deli Chen, Yankai Lin, Wei Li, Peng Li, Jie Zhou, and Xu Sun. 2020. Measuring and relieving the over-smoothing problem for graph neural networks from the topological view. In Proceedings of the AAAI conference on artificial intelligence, Vol. 34. 3438–3445. Manuscript submitted to ACM 32 Qu et al

  21. [21]

    Hongruixuan Chen, Jian Song, Chengxi Han, Junshi Xia, and Naoto Yokoya. 2024. Changemamba: Remote sensing change detection with spatio-temporal state space model. arXiv preprint arXiv:2404.03425 (2024)

  22. [22]

    Jiawei Chen, Hongyu Lin, Xianpei Han, and Le Sun. 2024. Benchmarking large language models in retrieval-augmented generation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 17754–17762

  23. [23]

    Keyan Chen, Bowen Chen, Chenyang Liu, Wenyuan Li, Zhengxia Zou, and Zhenwei Shi. 2024. Rsmamba: Remote sensing image classification with state space model. arXiv preprint arXiv:2403.19654 (2024)

  24. [24]

    Tianxiang Chen, Zhentao Tan, Tao Gong, Qi Chu, Yue Wu, Bin Liu, Jieping Ye, and Nenghai Yu. 2024. Mim-istd: Mamba-in-mamba for efficient infrared small target detection. arXiv preprint arXiv:2403.02148 (2024)

  25. [25]

    Xiao Chen, Wenqi Fan, Jingfan Chen, Haochen Liu, Zitao Liu, Zhaoxiang Zhang, and Qing Li. 2023. Fairly adaptive negative sampling for recommendations. In Proceedings of the ACM Web Conference 2023. 3723–3733

  26. [26]

    Ying Chen, Jiajing Xie, Yuxiang Lin, Yuhang Song, Wenxian Yang, and Rongshan Yu. 2024. Survmamba: State space model with multi-grained multi-modal interaction for survival prediction. arXiv preprint arXiv:2404.08027 (2024)

  27. [27]

    Yujie Chen, Jiangyan Yi, Jun Xue, Chenglong Wang, Xiaohui Zhang, Shunbo Dong, Siding Zeng, Jianhua Tao, Lv Zhao, and Cunhang Fan. 2024. RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection. arXiv preprint arXiv:2406.06086 (2024)

  28. [28]

    Tri Dao and Albert Gu. 2024. Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality. In International Conference on Machine Learning (ICML)

  29. [29]

    Rui Deng and Tianpei Gu. 2024. CU-Mamba: Selective State Space Models with Channel Learning for Image Restoration. arXiv preprint arXiv:2404.11778 (2024)

  30. [30]

    Yujuan Ding, Wenqi Fan, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, and Qing Li. 2024. A survey on rag meets llms: Towards retrieval-augmented large language models. arXiv preprint arXiv:2405.06211 (2024)

  31. [31]

    Rares Dolga, Kai Biegun, Jake Cunningham, and David Barber. 2024. RotRNN: Modelling Long Sequences with Rotations. arXiv preprint arXiv:2407.07239 (2024)

  32. [32]

    Wenhao Dong, Haodong Zhu, Shaohui Lin, Xiaoyan Luo, Yunhang Shen, Xuhui Liu, Juan Zhang, Guodong Guo, and Baochang Zhang. 2024. Fusion-mamba for cross-modality object detection. arXiv preprint arXiv:2404.09146 (2024)

  33. [33]

    Xin Luna Dong, Seungwhan Moon, Yifan Ethan Xu, Kshitiz Malik, and Zhou Yu. 2023. Towards next-generation intelligent assistants leveraging llm techniques. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 5792–5793

  34. [34]

    Filip Karlo Došilović, Mario Brčić, and Nikica Hlupić. 2018. Explainable artificial intelligence: A survey. In 2018 41st International convention on information and communication technology, electronics and microelectronics (MIPRO). IEEE, 0210–0215

  35. [35]

    Haruka Ezoe and Kazuhiro Sato. 2024. Learning method for S4 with Diagonal State Space Layers using Balanced Truncation. arXiv preprint arXiv:2402.15993 (2024)

  36. [36]

    Lili Fan, Junhao Wang, Yuanmeng Chang, Yuke Li, Yutong Wang, and Dongpu Cao. 2024. 4D mmWave radar for autonomous driving perception: a comprehensive survey. IEEE Transactions on Intelligent Vehicles (2024)

  37. [37]

    Wenqi Fan, Tyler Derr, Yao Ma, Jianping Wang, Jiliang Tang, and Qing Li. 2019. Deep Adversarial Social Recommendation. In 28th International Joint Conference on Artificial Intelligence (IJCAI-19). International Joint Conferences on Artificial Intelligence, 1351–1357

  38. [38]

    Wenqi Fan, Qing Li, and Min Cheng. 2018. Deep modeling of social relations for recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32

  39. [39]

    Wenqi Fan, Xiaorui Liu, Wei Jin, Xiangyu Zhao, Jiliang Tang, and Qing Li. 2022. Graph Trend Filtering Networks for Recommendation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 112–121

  40. [40]

    Wenqi Fan, Yao Ma, Qing Li, Yuan He, Eric Zhao, Jiliang Tang, and Dawei Yin. 2019. Graph neural networks for social recommendation. In The world wide web conference. 417–426

  41. [41]

    Wenqi Fan, Yao Ma, Qing Li, Jianping Wang, Guoyong Cai, Jiliang Tang, and Dawei Yin. 2020. A graph neural network framework for social recommendations. IEEE Transactions on Knowledge and Data Engineering 34, 5 (2020), 2033–2047

  42. [42]

    Wenqi Fan, Yao Ma, Dawei Yin, Jianping Wang, Jiliang Tang, and Qing Li. 2019. Deep social collaborative filtering. In Proceedings of the 13th ACM Conference on Recommender Systems. 305–313

  43. [43]

    Wenqi Fan, Shijie Wang, Jiani Huang, Zhikai Chen, Yu Song, Wenzhuo Tang, Haitao Mao, Hui Liu, Xiaorui Liu, Dawei Yin, et al. 2024. Graph machine learning in the era of large language models (llms). arXiv preprint arXiv:2404.14928 (2024)

  44. [44]

    Wenqi Fan, Xiangyu Zhao, Qing Li, Tyler Derr, Yao Ma, Hui Liu, Jianping Wang, and Jiliang Tang. 2023. Adversarial Attacks for Black-Box Recommender Systems Via Copying Transferable Cross-Domain User Profiles. IEEE Transactions on Knowledge and Data Engineering (2023)

  45. [45]

    William Fedus, Barret Zoph, and Noam Shazeer. 2022. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research 23, 120 (2022), 1–39

  46. [46]

    Zhengcong Fei, Mingyuan Fan, Changqian Yu, and Junshi Huang. 2024. Scalable Diffusion Models with State Space Backbone. arXiv preprint arXiv:2402.05608 (2024)

  47. [47]

    Daniel Y Fu, Elliot L Epstein, Eric Nguyen, Armin W Thomas, Michael Zhang, Tri Dao, Atri Rudra, and Christopher Ré. 2023. Simple hardware-efficient long convolutions for sequence modeling. In International Conference on Machine Learning. PMLR, 10373–10391

  48. [48]

    Guanyiman Fu, Fengchao Xiong, Jianfeng Lu, and Jun Zhou. 2024. Ssumamba: Spatial-spectral selective state space model for hyperspectral image denoising. IEEE Transactions on Geoscience and Remote Sensing (2024)

  49. [49]

    Linjie Fu, Xia Li, Xiuding Cai, Yingkai Wang, Xueyao Wang, Yali Shen, and Yu Yao. 2024. MD-Dose: A Diffusion Model based on the Mamba for Radiotherapy Dose Prediction. arXiv preprint arXiv:2403.08479 (2024)

  50. [50]

    Peng Gao, Shijie Geng, Renrui Zhang, Teli Ma, Rongyao Fang, Yongfeng Zhang, Hongsheng Li, and Yu Qiao. 2024. Clip-adapter: Better vision-language models with feature adapters. International Journal of Computer Vision 132, 2 (2024), 581–595

  51. [51]

    Ruisheng Gao, Zeyu Xiao, and Zhiwei Xiong. 2024. Mamba-based Light Field Super-Resolution with Efficient Subspace Scanning. arXiv preprint arXiv:2406.16083 (2024)

  52. [52]

    Yu Gao, Jiancheng Huang, Xiaopeng Sun, Zequn Jie, Yujie Zhong, and Lin Ma. 2024. Matten: Video Generation with Mamba-Attention. arXiv preprint arXiv:2405.03025 (2024)

  53. [53]

    Negar Golestani and Mahta Moghaddam. 2020. Human activity recognition using magnetic induction-based motion signals and deep recurrent neural networks. Nature communications 11, 1 (2020), 1551

  54. [54]

    Haifan Gong, Luoyao Kang, Yitao Wang, Xiang Wan, and Haofeng Li. 2024. nnmamba: 3d biomedical image segmentation, classification and landmark detection with state space model. arXiv preprint arXiv:2402.03526 (2024)

  55. [55]

    Alex Graves. 2012. Long short-term memory. Supervised sequence labelling with recurrent neural networks (2012), 37–45

  56. [56]

    Albert Gu and Tri Dao. 2023. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752 (2023)

  57. [57]

    Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, and Christopher Ré. 2020. Hippo: Recurrent memory with optimal polynomial projections. Advances in neural information processing systems 33 (2020), 1474–1487

  58. [58]

    Albert Gu, Karan Goel, Ankit Gupta, and Christopher Ré. 2022. On the parameterization and initialization of diagonal state space models. Advances in Neural Information Processing Systems 35 (2022), 35971–35983

  59. [59]

    Albert Gu, Karan Goel, and Christopher Ré. 2021. Efficiently modeling long sequences with structured state spaces. arXiv preprint arXiv:2111.00396 (2021)

  60. [60]

    Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, and Christopher Ré. 2021. Combining recurrent, convolutional, and continuous-time models with linear state space layers. Advances in neural information processing systems 34 (2021), 572–585

  61. [61]

    Yanchen Guan, Haicheng Liao, Zhenning Li, Jia Hu, Runze Yuan, Yunjian Li, Guohui Zhang, and Chengzhong Xu. 2024. World models for autonomous driving: An initial survey. IEEE Transactions on Intelligent Vehicles (2024)

  62. [62]

    Jeff Guo and Philippe Schwaller. 2024. Saturn: Sample-efficient Generative Molecular Design using Memory Manipulation. arXiv preprint arXiv:2405.17066 (2024)

  63. [63]

    Yulan Guo, Hanyun Wang, Qingyong Hu, Hao Liu, Li Liu, and Mohammed Bennamoun. 2020. Deep learning for 3d point clouds: A survey. IEEE transactions on pattern analysis and machine intelligence 43, 12 (2020), 4338–4364

  64. [64]

    Xu Han, Yuan Tang, Zhaoxuan Wang, and Xianzhi Li. 2024. Mamba3d: Enhancing local features for 3d point cloud analysis via state space model. arXiv preprint arXiv:2404.14966 (2024)

  65. [65]

    Mark Harris, Shubhabrata Sengupta, and John D Owens. 2007. Parallel prefix sum (scan) with CUDA. GPU gems 3, 39 (2007), 851–876

  66. [66]

    Ali Hatamizadeh and Jan Kautz. 2024. MambaVision: A Hybrid Mamba-Transformer Vision Backbone. arXiv preprint arXiv:2407.08083 (2024)

  67. [67]

    Haoyang He, Yuhu Bai, Jiangning Zhang, Qingdong He, Hongxu Chen, Zhenye Gan, Chengjie Wang, Xiangtai Li, Guanzhong Tian, and Lei Xie. 2024. Mambaad: Exploring state space models for multi-class unsupervised anomaly detection. arXiv preprint arXiv:2404.06564 (2024)

  69. [69]

    Wei He, Kai Han, Yehui Tang, Chengcheng Wang, Yujie Yang, Tianyu Guo, and Yunhe Wang. 2024. Densemamba: State space models with dense hidden connection for efficient large language models. arXiv preprint arXiv:2403.00818 (2024)

  70. [70]

    Xuanhua He, Ke Cao, Keyu Yan, Rui Li, Chengjun Xie, Jie Zhang, and Man Zhou. 2024. Pan-Mamba: Effective pan-sharpening with State Space Model. arXiv preprint arXiv:2402.12192 (2024)

  71. [71]

    Michiel Hermans and Benjamin Schrauwen. 2013. Training and analysing deep recurrent neural networks. Advances in neural information processing systems 26 (2013)

  72. [72]

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in neural information processing systems 33 (2020), 6840–6851

  73. [73]

    Alireza Hosseini, Amirhossein Kazerouni, Saeed Akhavan, Michael Brudno, and Babak Taati. 2024. SUM: Saliency Unification through Mamba for Visual Attention Modeling. arXiv preprint arXiv:2406.17815 (2024)

  74. [74]

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021. Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685 (2021)

  75. [75]

    Hao Hu and Guo-Jun Qi. 2017. State-frequency memory recurrent neural networks. In International Conference on Machine Learning. PMLR, 1568–1577

  76. [76]

    Lijie Hu, Yixin Liu, Ninghao Liu, Mengdi Huai, Lichao Sun, and Di Wang. 2023. Seat: stable and explainable attention. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. 12907–12915

  77. [77]

    Vincent Tao Hu, Stefan Andreas Baumann, Ming Gui, Olga Grebenkova, Pingchuan Ma, Johannes Fischer, and Bjorn Ommer. 2024. Zigma: Zigzag mamba diffusion model. arXiv preprint arXiv:2403.13802 (2024)

  78. [78]

    Chensen Huang, Guibo Zhu, Xuepeng Wang, Yifei Luo, Guojing Ge, Haoran Chen, Dong Yi, and Jinqiao Wang. 2024. Recurrent Context Compression: Efficiently Expanding the Context Window of LLM. arXiv preprint arXiv:2406.06110 (2024)

  79. [79]

    Kexin Huang, Cao Xiao, Lucas M Glass, Marinka Zitnik, and Jimeng Sun. 2020. SkipGNN: predicting molecular interactions with skip-graph networks. Scientific reports 10, 1 (2020), 21092

  80. [80]

    Ling Huang, Anthony D Joseph, Blaine Nelson, Benjamin IP Rubinstein, and J Doug Tygar. 2011. Adversarial machine learning. In Proceedings of the 4th ACM workshop on Security and artificial intelligence. 43–58

Showing first 80 references.