TCP-SSM conditions stable poles on visual tokens to explicitly control memory decay and oscillation in SSMs, cutting computation by up to 44% while matching or exceeding accuracy on classification, segmentation, and detection.
Hungry hungry hippos: Towards language modeling with state space models
9 Pith papers cite this work. Polarity classification is still indexing.
verdicts: 9 unverdicted
representative citing papers
Structured Recurrent Mixers enable algebraic switching between parallel training and recurrent inference representations, delivering higher efficiency, information capacity, and throughput than other linear-complexity models (the parallel/recurrent equivalence is sketched after this list).
ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.
No model can simultaneously achieve efficiency, compactness, and recall capacity that scales with sequence length; any two of the three imply a strict O(poly(d)/log V) bound on the number of recallable facts (a back-of-the-envelope version of the bound follows this list).
CLIMB generates controllable longitudinal brain MRI images from baseline scans using a Mamba-based latent diffusion model and Gaussian-aligned autoencoder, reporting SSIM 0.9433 on the ADNI dataset of 6306 scans.
RetNet is a new sequence modeling architecture that delivers parallel training, constant-time inference, and competitive language modeling performance as a potential replacement for Transformers.
Kaczmarz Linear Attention replaces the empirical step-size coefficient in Gated DeltaNet with a key-norm-normalized step size derived from the online regression objective, yielding lower perplexity and better needle-in-a-haystack recall (the step size is sketched after this list).
Manifold-constrained multi-stream mixing plus per-stream adapters improves SSM language model validation loss from 6.3507 to 6.1353 and perplexity from 572.91 to 461.88 on WikiText-2.
Toeplitz MLP Mixers replace attention with masked Toeplitz multiplications for sub-quadratic complexity, retaining more sequence information and performing better on copying and in-context tasks (the masked-Toeplitz mixing is sketched after this list).
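Several of the summaries above (Structured Recurrent Mixers, RetNet) hinge on the same algebraic fact: a decayed linear-attention layer has an equivalent parallel form for training and recurrent form for inference. A minimal NumPy sketch of that equivalence follows, assuming a single unnormalized head with scalar decay gamma; the function names and shapes are illustrative, not either paper's implementation.

```python
import numpy as np

def parallel_form(Q, K, V, gamma):
    """Parallel (training) form: O = (Q K^T * D) V with causal decay mask D[n, m] = gamma**(n-m) for n >= m."""
    T = Q.shape[0]
    n, m = np.arange(T)[:, None], np.arange(T)[None, :]
    D = np.where(n >= m, gamma ** (n - m), 0.0)
    return (Q @ K.T * D) @ V

def recurrent_form(Q, K, V, gamma):
    """Recurrent (inference) form: S_t = gamma * S_{t-1} + k_t^T v_t, then o_t = q_t S_t."""
    S = np.zeros((K.shape[1], V.shape[1]))
    outs = []
    for q, k, v in zip(Q, K, V):
        S = gamma * S + np.outer(k, v)
        outs.append(q @ S)
    return np.stack(outs)

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((6, 4)) for _ in range(3))
# Both forms compute the same outputs; only their cost profiles differ.
assert np.allclose(parallel_form(Q, K, V, 0.9), recurrent_form(Q, K, V, 0.9))
```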
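The O(poly(d)/log V) recall bound quoted for the impossibility-triangle paper can be motivated by a simple counting argument. The version below is a back-of-the-envelope sketch under assumed quantities (a compact state of d^2 entries at b bits each, facts that each pin down a token from a vocabulary of size V), not the paper's formal statement.

```latex
% Assumption: a compact, efficient model carries a fixed-size state of d^2 entries
% stored at b bits each; recalling one fact means reproducing a token out of a
% vocabulary of size V, which costs about \log_2 V bits of state.
\[
  \underbrace{d^2\,b}_{\text{bits of state}} \;\ge\; \underbrace{N\,\log_2 V}_{\text{bits needed for } N \text{ facts}}
  \quad\Longrightarrow\quad
  N \;\le\; \frac{d^2\,b}{\log_2 V} \;=\; O\!\left(\frac{\operatorname{poly}(d)}{\log V}\right),
\]
% i.e. the number of recallable facts is capped independently of sequence length.
```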
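For the Kaczmarz Linear Attention entry, the key idea is that the delta-rule update in Gated DeltaNet is an online regression step, and the Kaczmarz solution to that regression normalizes the step size by the squared key norm. The sketch below illustrates this on a plain NumPy state; `delta_rule_step`, `kaczmarz_beta`, `eta`, and the epsilon are assumed names and constants, not the paper's code, and gating is omitted.

```python
import numpy as np

def delta_rule_step(S, k, v, beta):
    """One delta-rule (online regression) update of the state S (d_k x d_v):
    move the current readout S^T k toward the target v with step size beta."""
    pred = S.T @ k
    return S + beta * np.outer(k, v - pred)

def kaczmarz_beta(k, eta=1.0):
    """Key-norm-normalized step size (Kaczmarz step for the row defined by key k)."""
    return eta / (np.dot(k, k) + 1e-8)

# With eta = 1, a single step makes the state reproduce v exactly for this key:
rng = np.random.default_rng(0)
k, v = rng.standard_normal(8), rng.standard_normal(4)
S = rng.standard_normal((8, 4))
S_new = delta_rule_step(S, k, v, kaczmarz_beta(k))
assert np.allclose(S_new.T @ k, v, atol=1e-5)
```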
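The masked Toeplitz multiplication in the Toeplitz MLP Mixer summary is, for a causal mask, a learned causal convolution over the token axis, which is where the sub-quadratic cost comes from. The sketch below checks an FFT-based O(T log T) version against the dense masked-Toeplitz matrix; the function name and the FFT route are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def causal_toeplitz_mix(x, t):
    """Mix tokens with a masked (lower-triangular) Toeplitz matrix M[i, j] = t[i - j] for j <= i,
    computed as a causal convolution via FFT instead of materializing the T x T matrix."""
    T = x.shape[0]
    n = 2 * T  # zero-pad so circular convolution equals causal linear convolution
    kern = np.fft.rfft(np.concatenate([t, np.zeros(n - T)]))
    sig = np.fft.rfft(np.concatenate([x, np.zeros((n - T, x.shape[1]))]), axis=0)
    return np.fft.irfft(sig * kern[:, None], n=n, axis=0)[:T]

# Dense (quadratic) masked-Toeplitz form, used only as a correctness check:
rng = np.random.default_rng(0)
x, t = rng.standard_normal((16, 3)), rng.standard_normal(16)
i, j = np.arange(16)[:, None], np.arange(16)[None, :]
dense = np.where(i >= j, t[np.clip(i - j, 0, None)], 0.0) @ x
assert np.allclose(causal_toeplitz_mix(x, t), dense, atol=1e-8)
```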
citing papers explorer
- Structured Recurrent Mixers for Massively Parallelized Sequence Generation
- The Impossibility Triangle of Long-Context Modeling
- Retentive Network: A Successor to Transformer for Large Language Models