Efficiently Modeling Long Sequences with Structured State Spaces

Albert Gu, Karan Goel, Christopher Ré · 2021 · cs.LG · arXiv 2111.00396

Canonical reference. 77% of citing Pith papers cite this work as background.

141 Pith papers citing it

Background: 77% of classified citations

abstract

A central goal of sequence modeling is designing a single principled model that can address sequence data across a range of modalities and tasks, particularly on long-range dependencies. Although conventional models including RNNs, CNNs, and Transformers have specialized variants for capturing long dependencies, they still struggle to scale to very long sequences of $10000$ or more steps. A promising recent approach proposed modeling sequences by simulating the fundamental state space model (SSM) $x'(t) = Ax(t) + Bu(t),\ y(t) = Cx(t) + Du(t)$, and showed that for appropriate choices of the state matrix $A$, this system could handle long-range dependencies mathematically and empirically. However, this method has prohibitive computation and memory requirements, rendering it infeasible as a general sequence modeling solution. We propose the Structured State Space sequence model (S4) based on a new parameterization for the SSM, and show that it can be computed much more efficiently than prior approaches while preserving their theoretical strengths. Our technique involves conditioning $A$ with a low-rank correction, allowing it to be diagonalized stably and reducing the SSM to the well-studied computation of a Cauchy kernel. S4 achieves strong empirical results across a diverse range of established benchmarks, including (i) 91% accuracy on sequential CIFAR-10 with no data augmentation or auxiliary losses, on par with a larger 2-D ResNet, (ii) substantially closing the gap to Transformers on image and language modeling tasks, while performing generation $60\times$ faster, and (iii) SoTA on every task from the Long Range Arena benchmark, including solving the challenging Path-X task of length 16k that all prior work fails on, while being as efficient as all competitors.
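A discretized SSM can be run either as a recurrence (step by step, as in fast generation) or as a causal convolution with kernel $K_\ell = C\bar{A}^\ell\bar{B}$ (in parallel, as in training). The minimal sketch below illustrates that equivalence under simplifying assumptions that are ours, not the paper's: the state matrix is complex diagonal (S4 itself uses a diagonal-plus-low-rank $A$), discretization uses the bilinear transform with a hypothetical step `dt`, and the output keeps the real part of $Cx$. All function names and parameter values are hypothetical.

```python
# Sketch, not S4 reference code: a diagonal SSM x' = diag(lam) x + b u,
# y = Re(c . x) + d u, discretized with the bilinear transform and run two ways.


def discretize_bilinear(lam, b, dt):
    """Elementwise bilinear discretization of a diagonal continuous-time SSM."""
    lam_bar = [(1 + dt * l / 2) / (1 - dt * l / 2) for l in lam]
    b_bar = [dt * bi / (1 - dt * l / 2) for l, bi in zip(lam, b)]
    return lam_bar, b_bar


def run_recurrent(lam_bar, b_bar, c, d, u):
    """Step x_k = A_bar x_{k-1} + B_bar u_k, y_k = Re(C x_k) + D u_k, from x = 0."""
    x = [0j] * len(lam_bar)
    ys = []
    for uk in u:
        x = [a * xi + bi * uk for a, xi, bi in zip(lam_bar, x, b_bar)]
        ys.append(sum(ci * xi for ci, xi in zip(c, x)).real + d * uk)
    return ys


def ssm_kernel(lam_bar, b_bar, c, length):
    """Convolution kernel K_l = Re(C A_bar^l B_bar), by direct powering."""
    return [
        sum(ci * a**l * bi for ci, a, bi in zip(c, lam_bar, b_bar)).real
        for l in range(length)
    ]


def run_convolutional(kernel, d, u):
    """Causal convolution y_k = sum_{j=0}^{k} K_j u_{k-j} + D u_k."""
    return [
        sum(kernel[j] * u[k - j] for j in range(k + 1)) + d * u[k]
        for k in range(len(u))
    ]


# Illustrative parameters: Re(lam) < 0 keeps |lam_bar| < 1 (a stable system).
lam = [-0.5 + 2.0j, -1.0 + 0.5j, -2.0 + 0j]
b = [1.0, 1.0, 1.0]
c = [0.3 - 0.1j, 0.5 + 0.2j, -0.4 + 0j]
d = 0.1
u = [1.0, 0.0, -0.5, 2.0, 0.3, 0.0, 1.5, -1.0]

lam_bar, b_bar = discretize_bilinear(lam, b, dt=0.1)
y_rec = run_recurrent(lam_bar, b_bar, c, d, u)
y_conv = run_convolutional(ssm_kernel(lam_bar, b_bar, c, len(u)), d, u)
max_gap = max(abs(p - q) for p, q in zip(y_rec, y_conv))
print(f"max |recurrent - convolutional| = {max_gap:.2e}")
```

The kernel here is built by naive powering. For a general dense $N \times N$ state matrix that costs on the order of $N^2 L$ operations, which is the kind of expense the abstract says S4's structured parameterization is designed to avoid.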

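The low-rank correction in the abstract pays off because the resolvent $(zI - A)^{-1}$ of a diagonal-plus-low-rank matrix can be applied with only diagonal operations and a small correction term (the Woodbury identity). The sketch below assumes the rank-1 case $A = \Lambda - pq^*$ with diagonal $\Lambda$ and uses the Sherman-Morrison special case; helper names and values are hypothetical, and it illustrates the identity only, not S4's full kernel algorithm.

```python
# Sketch (hypothetical helper names): apply (z I - A)^{-1} for a rank-1
# diagonal-plus-low-rank state matrix A = diag(lam) - p q^*, in O(N).


def resolvent_dplr(z, lam, p, q, x):
    """Solve (z I - diag(lam) + p q^*) y = x via the Sherman-Morrison formula.

    Here q^* v means sum_n conj(q_n) v_n. Only diagonal work and two inner
    products are needed, never a dense N x N solve.
    """
    r = [1 / (z - l) for l in lam]  # diagonal of (z I - diag(lam))^{-1}
    rx = [ri * xi for ri, xi in zip(r, x)]
    rp = [ri * pi for ri, pi in zip(r, p)]
    q_rx = sum(qi.conjugate() * v for qi, v in zip(q, rx))
    q_rp = sum(qi.conjugate() * v for qi, v in zip(q, rp))
    scale = q_rx / (1 + q_rp)
    return [a - scale * bb for a, bb in zip(rx, rp)]


def apply_shifted(z, lam, p, q, y):
    """Direct product (z I - diag(lam) + p q^*) y, used to check the solve."""
    q_y = sum(qi.conjugate() * yi for qi, yi in zip(q, y))
    return [(z - l) * yi + pi * q_y for l, yi, pi in zip(lam, y, p)]


lam = [-0.5 + 1.0j, -0.5 - 1.0j, -1.5 + 0j, -3.0 + 2.0j]
p = [0.7 + 0j, 0.2 - 0.3j, 1.0 + 0j, -0.4 + 0.1j]
q = [0.5 + 0.1j, -0.2 + 0j, 0.3 + 0.3j, 0.9 + 0j]
x = [1.0 + 0j, -2.0 + 0j, 0.5 + 0.5j, 0.0 + 1.0j]
z = 0.3 + 0.8j

y = resolvent_dplr(z, lam, p, q, x)
residual = max(abs(a - b) for a, b in zip(apply_shifted(z, lam, p, q, y), x))
print(f"residual = {residual:.2e}")
```

Because the correction involves only inner products with $p$ and $q$, each application stays linear in the state size $N$; a higher-rank correction would replace the scalar $1 + q^*M^{-1}p$ with a small matrix.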

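The reduction to a Cauchy kernel mentioned in the abstract can be illustrated for the purely diagonal case. This sketch is our simplification: it starts from an already-discretized diagonal state matrix (hypothetical entries `lam_bar`, each of modulus below 1) and omits the low-rank term and bilinear-transform bookkeeping that S4 handles. It evaluates the kernel's truncated generating function at the $L$-th roots of unity as a Cauchy matrix-vector product, recovers $K_\ell = C\bar{A}^\ell\bar{B}$ with an inverse DFT, and checks the result against direct powering.

```python
import cmath


def cauchy(v, nodes, poles):
    """Cauchy matrix-vector product: out_j = sum_n v_n / (nodes_j - poles_n)."""
    return [sum(vn / (w - pn) for vn, pn in zip(v, poles)) for w in nodes]


def kernel_via_cauchy(lam_bar, b_bar, c, length):
    """Recover K_l = C A_bar^l B_bar (l < length) for diagonal A_bar, |lam_bar| < 1.

    With z_k = exp(-2*pi*i*k/L) we have z_k^L = 1, so the truncated generating
    function is K_hat(z_k) = sum_n c_n (1 - a_n^L) b_n / (1 - a_n z_k).
    Writing 1 / (1 - a z) = z^{-1} / (z^{-1} - a) turns this into a Cauchy
    kernel with nodes z_k^{-1} and poles a_n.
    """
    v = [cn * (1 - a**length) * bn for cn, a, bn in zip(c, lam_bar, b_bar)]
    z = [cmath.exp(-2j * cmath.pi * k / length) for k in range(length)]
    k_hat = [s / zk for s, zk in zip(cauchy(v, [1 / zk for zk in z], lam_bar), z)]
    # Naive inverse DFT, O(L^2); an FFT would be used in practice.
    return [
        sum(kh * zk ** (-l) for kh, zk in zip(k_hat, z)) / length
        for l in range(length)
    ]


def kernel_direct(lam_bar, b_bar, c, length):
    """Reference: K_l = sum_n c_n a_n^l b_n by direct powering."""
    return [
        sum(cn * a**l * bn for cn, a, bn in zip(c, lam_bar, b_bar))
        for l in range(length)
    ]


lam_bar = [0.9 * cmath.exp(0.4j), 0.7 * cmath.exp(-1.1j), 0.5 + 0j]
b_bar = [0.2 + 0.1j, 0.3 + 0j, -0.1 + 0.4j]
c = [1.0 + 0j, 0.5 - 0.5j, 2.0 + 0j]
L = 16

k_fast = kernel_via_cauchy(lam_bar, b_bar, c, L)
k_ref = kernel_direct(lam_bar, b_bar, c, L)
max_err = max(abs(p - q) for p, q in zip(k_fast, k_ref))
print(f"max |Cauchy - direct| = {max_err:.2e}")
```

The Cauchy product is written naively here; the abstract's point is that this is a well-studied primitive, so faster or more stable routines for it can be substituted without changing the surrounding logic.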
citation-role summary

background: 20 · method: 5 · baseline: 1

citation-polarity summary

background: 20 · use method: 5 · baseline: 1

authors

Albert Gu, Christopher Ré, Karan Goel

representative citing papers

Rotation Equivariant Mamba for Vision Tasks

cs.CV · 2026-03-10 · unverdicted · novelty 8.0

EQ-VMamba adds rotation-equivariant cross-scan and group Mamba blocks to enforce end-to-end rotation equivariance, yielding better rotation robustness, competitive accuracy, and roughly 50% fewer parameters than non-equivariant baselines across classification, segmentation, and super-resolution.

Test-Time Training with KV Binding Is Secretly Linear Attention

cs.LG · 2026-02-24 · conditional · novelty 8.0

Test-time training with KV binding reduces to learned linear attention.

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

cs.CL · 2023-09-28 · unverdicted · novelty 8.0

Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.

MASS: Motion-Aligned Selective Scan for Refinement in Flow-Based Video Frame Interpolation

cs.CV · 2026-06-26 · unverdicted · novelty 7.0

MASS reformulates SSM-based feature scanning in flow-based VFI to follow dynamic motion trajectories via learnable path integration and velocity-aware sampling, claiming SOTA on challenging large-displacement cases.

Between Amnesia and Chaos: A Memory Stability Expressivity Trilemma for Trainable Dissipative Oscillator Networks

cs.LG · 2026-06-07 · unverdicted · novelty 7.0

Trainable dissipative oscillator networks exhibit a trilemma in which damping governs memory horizon, gradient stability, and Lyapunov exponent, with learned substrates outperforming frozen ones only at short horizons before the advantage closes near eleven steps.

MOSAIC: A Workload-Driven Simulation and Design-Space Exploration Framework for Heterogeneous NPUs

cs.AR · 2026-06-03 · unverdicted · novelty 7.0

MOSAIC is a simulation and DSE framework for heterogeneous NPUs that finds designs achieving 46.91% mean iso-area energy savings over homogeneous baselines on 20 workloads.

Learning Long Range Spatio-Temporal Representations over Continuous Time Dynamic Graphs with State Space Models

cs.LG · 2026-06-03 · unverdicted · novelty 7.0

CTDG-SSM introduces CTT-HiPPO, a Laplacian-polynomial projection of HiPPO, to create a parameter-efficient state-space formulation for continuous-time dynamic graphs that captures long-range spatio-temporal patterns.

AURA: Action-Gated Memory for Robot Policies at Constant VRAM

cs.AI · 2026-06-01 · unverdicted · novelty 7.0

AURA-Mem uses an action-gated recurrent memory trained on closed-loop action error to deliver constant 4,224-byte state and 5-9x fewer writes than baselines while matching base policy success on LIBERO-Long.

Trading Complexity for Expressivity Through Structured Generalized Linear Token Mixing

cs.LG · 2026-05-29 · unverdicted · novelty 7.0

Presents a structured generalized linear token mixing framework that extends recurrence equations to multiple past states, enabling new patterns with provable complexity-expressivity trade-offs for causal generation.

UWM-JEPA: Predictive World Models That Imagine in Belief Space

cs.LG · 2026-05-25 · unverdicted · novelty 7.0

UWM-JEPA uses a density-matrix latent and unitary predictor in JEPA to preserve joint-state spectrum during blind rollouts, achieving 0.77 accuracy on a five-step hidden-velocity task versus 0.53 for an LSTM baseline.

Multi-view Consistent 3D Gaussian Head Avatars 'without' Multi-view Generation

cs.CV · 2026-05-24 · unverdicted · novelty 7.0

MVCHead uses a hierarchical state space model with bi-directional scans and an SE(3) critic to enforce 3D consistency in Gaussian avatars trained only on 2D images.

Exact expression for maximum Lyapunov exponent during transients in computationally powerful dynamical networks

nlin.CD · 2026-05-20 · unverdicted · novelty 7.0

Exact analytical expression for the time-dependent maximum Lyapunov exponent during transients in a network supporting dynamics-based computation.

Social-Mamba: Socially-Aware Trajectory Forecasting with State-Space Models

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

Social-Mamba introduces a Cycle Mamba block and social triplet factorization to achieve state-of-the-art trajectory forecasting accuracy with linear-time social interaction modeling on five benchmarks.

A Novel Schur-Decomposition-Based Weight Projection Method for Stable State-Space Neural-Network Architectures

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

A real Schur decomposition projection maps the state matrix of discrete-time state-space layers onto its nearest stable counterpart, delivering accuracy comparable to prior stable identification methods with fewer weights.

QLAM: A Quantum Long-Attention Memory Approach to Long-Sequence Token Modeling

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

QLAM extends state-space models with quantum superposition in the hidden state for linear-time long-sequence modeling and reports consistent gains over RNN and transformer baselines on sequential image tasks.

Parallel Scan Recurrent Neural Quantum States for Scalable Variational Monte Carlo

cond-mat.str-el · 2026-05-13 · conditional · novelty 7.0

PSR-NQS makes recurrent neural quantum states scalable for variational Monte Carlo by using parallel scan recurrence, reaching accurate results on 52x52 two-dimensional lattices.

Selection, Not Fusion: Radar-Modulated State Space Models for Radar-Camera Depth Estimation

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

Radar-Modulated Selection perturbs only the step size Δ and readout C parameters inside Mamba's selective scan with radar data while keeping other components image-only, yielding state-of-the-art depth estimation on nuScenes with up to 34% MAE reduction.

TCP-SSM: Efficient Vision State Space Models with Token-Conditioned Poles

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

TCP-SSM conditions stable poles on visual tokens to explicitly control memory decay and oscillation in SSMs, cutting computation up to 44% while matching or exceeding accuracy on classification, segmentation, and detection.

TIDES: Implicit Time-Awareness in Selective State Space Models

cs.LG · 2026-05-10 · unverdicted · novelty 7.0

TIDES reconciles selective SSM expressivity with continuous-time physical discretization by moving input dependence onto the state matrix, enabling native irregular time series handling and achieving SOTA on UEA and Physiome-ODE benchmarks.

PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization

cs.LG · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

PairAlign learns compact variable-length token sequences for audio via self-alignment on paired content-preserving views, achieving 55% fewer archive tokens than VQ while preserving edit-distance retrieval at 12.71 tokens/s.

Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement

cs.CV · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

NOVA represents world states as INR weights for decoder-free rendering, compactness, and unsupervised disentanglement of background, foreground, and motion in video world models.

How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

In linear recurrent models, infinite-width signal propagation remains accurate only for depths t much smaller than sqrt(width n), with a critical regime at t ~ c sqrt(n) where finite-width effects emerge and dominate for larger t.

The Predictive-Causal Gap: An Impossibility Theorem and Large-Scale Neural Evidence

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

Predictive representation learning structurally favors encoding slower or less noisy environment modes over causal system modes, as shown by an impossibility theorem for linear-Gaussian dynamics and large-scale neural experiments.

FLUID: Continuous-Time Hyperconnected Sparse Transformer for Sink-Free Learning

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

FLUID is a continuous-time transformer using Liquid Attention Networks to model attention as stable ODE solutions that interpolate between discrete SDPA and CT-RNNs, with an explicit sink gate and liquid hyper-connections for better information flow.

citing papers explorer

Showing 50 of 141 citing papers.

Rotation Equivariant Mamba for Vision Tasks cs.CV · 2026-03-10 · unverdicted · none · ref 11
EQ-VMamba adds rotation-equivariant cross-scan and group Mamba blocks to enforce end-to-end rotation equivariance, yielding better rotation robustness, competitive accuracy, and roughly 50% fewer parameters than non-equivariant baselines across classification, segmentation, and super-resolution.
Test-Time Training with KV Binding Is Secretly Linear Attention cs.LG · 2026-02-24 · conditional · none · ref 4
Test-time training with KV binding reduces to learned linear attention.
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution cs.CL · 2023-09-28 · unverdicted · none · ref 124
Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.
MASS: Motion-Aligned Selective Scan for Refinement in Flow-Based Video Frame Interpolation cs.CV · 2026-06-26 · unverdicted · none · ref 29
MASS reformulates SSM-based feature scanning in flow-based VFI to follow dynamic motion trajectories via learnable path integration and velocity-aware sampling, claiming SOTA on challenging large-displacement cases.
Between Amnesia and Chaos: A Memory Stability Expressivity Trilemma for Trainable Dissipative Oscillator Networks cs.LG · 2026-06-07 · unverdicted · none · ref 4
Trainable dissipative oscillator networks exhibit a trilemma in which damping governs memory horizon, gradient stability, and Lyapunov exponent, with learned substrates outperforming frozen ones only at short horizons before the advantage closes near eleven steps.
MOSAIC: A Workload-Driven Simulation and Design-Space Exploration Framework for Heterogeneous NPUs cs.AR · 2026-06-03 · unverdicted · none · ref 11
MOSAIC is a simulation and DSE framework for heterogeneous NPUs that finds designs achieving 46.91% mean iso-area energy savings over homogeneous baselines on 20 workloads.
Learning Long Range Spatio-Temporal Representations over Continuous Time Dynamic Graphs with State Space Models cs.LG · 2026-06-03 · unverdicted · none · ref 3
CTDG-SSM introduces CTT-HiPPO, a Laplacian-polynomial projection of HiPPO, to create a parameter-efficient state-space formulation for continuous-time dynamic graphs that captures long-range spatio-temporal patterns.
AURA: Action-Gated Memory for Robot Policies at Constant VRAM cs.AI · 2026-06-01 · unverdicted · none · ref 23
AURA-Mem uses an action-gated recurrent memory trained on closed-loop action error to deliver constant 4,224-byte state and 5-9x fewer writes than baselines while matching base policy success on LIBERO-Long.
Trading Complexity for Expressivity Through Structured Generalized Linear Token Mixing cs.LG · 2026-05-29 · unverdicted · none · ref 3
Presents a structured generalized linear token mixing framework that extends recurrence equations to multiple past states, enabling new patterns with provable complexity-expressivity trade-offs for causal generation.
UWM-JEPA: Predictive World Models That Imagine in Belief Space cs.LG · 2026-05-25 · unverdicted · none · ref 37
UWM-JEPA uses a density-matrix latent and unitary predictor in JEPA to preserve joint-state spectrum during blind rollouts, achieving 0.77 accuracy on a five-step hidden-velocity task versus 0.53 for an LSTM baseline.
Multi-view Consistent 3D Gaussian Head Avatars 'without' Multi-view Generation cs.CV · 2026-05-24 · unverdicted · none · ref 28
MVCHead uses a hierarchical state space model with bi-directional scans and an SE(3) critic to enforce 3D consistency in Gaussian avatars trained only on 2D images.
Exact expression for maximum Lyapunov exponent during transients in computationally powerful dynamical networks nlin.CD · 2026-05-20 · unverdicted · none · ref 12
Exact analytical expression for the time-dependent maximum Lyapunov exponent during transients in a network supporting dynamics-based computation.
Social-Mamba: Socially-Aware Trajectory Forecasting with State-Space Models cs.CV · 2026-05-14 · unverdicted · none · ref 15
Social-Mamba introduces a Cycle Mamba block and social triplet factorization to achieve state-of-the-art trajectory forecasting accuracy with linear-time social interaction modeling on five benchmarks.
A Novel Schur-Decomposition-Based Weight Projection Method for Stable State-Space Neural-Network Architectures cs.LG · 2026-05-14 · unverdicted · none · ref 2
A real Schur decomposition projection maps the state matrix of discrete-time state-space layers onto its nearest stable counterpart, delivering accuracy comparable to prior stable identification methods with fewer weights.
QLAM: A Quantum Long-Attention Memory Approach to Long-Sequence Token Modeling cs.LG · 2026-05-13 · unverdicted · none · ref 9
QLAM extends state-space models with quantum superposition in the hidden state for linear-time long-sequence modeling and reports consistent gains over RNN and transformer baselines on sequential image tasks.
Parallel Scan Recurrent Neural Quantum States for Scalable Variational Monte Carlo cond-mat.str-el · 2026-05-13 · conditional · none · ref 28
PSR-NQS makes recurrent neural quantum states scalable for variational Monte Carlo by using parallel scan recurrence, reaching accurate results on 52x52 two-dimensional lattices.
Selection, Not Fusion: Radar-Modulated State Space Models for Radar-Camera Depth Estimation cs.CV · 2026-05-12 · unverdicted · none · ref 4
Radar-Modulated Selection perturbs only the step size Δ and readout C parameters inside Mamba's selective scan with radar data while keeping other components image-only, yielding state-of-the-art depth estimation on nuScenes with up to 34% MAE reduction.
TCP-SSM: Efficient Vision State Space Models with Token-Conditioned Poles cs.CV · 2026-05-12 · unverdicted · none · ref 13
TCP-SSM conditions stable poles on visual tokens to explicitly control memory decay and oscillation in SSMs, cutting computation up to 44% while matching or exceeding accuracy on classification, segmentation, and detection.
TIDES: Implicit Time-Awareness in Selective State Space Models cs.LG · 2026-05-10 · unverdicted · none · ref 24
TIDES reconciles selective SSM expressivity with continuous-time physical discretization by moving input dependence onto the state matrix, enabling native irregular time series handling and achieving SOTA on UEA and Physiome-ODE benchmarks.
PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization cs.LG · 2026-05-07 · unverdicted · none · ref 22 · 2 links
PairAlign learns compact variable-length token sequences for audio via self-alignment on paired content-preserving views, achieving 55% fewer archive tokens than VQ while preserving edit-distance retrieval at 12.71 tokens/s.
Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement cs.CV · 2026-05-07 · unverdicted · none · ref 8 · 2 links
NOVA represents world states as INR weights for decoder-free rendering, compactness, and unsupervised disentanglement of background, foreground, and motion in video world models.
How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences cs.LG · 2026-05-06 · unverdicted · none · ref 18
In linear recurrent models, infinite-width signal propagation remains accurate only for depths t much smaller than sqrt(width n), with a critical regime at t ~ c sqrt(n) where finite-width effects emerge and dominate for larger t.
The Predictive-Causal Gap: An Impossibility Theorem and Large-Scale Neural Evidence cs.LG · 2026-05-06 · unverdicted · none · ref 3
Predictive representation learning structurally favors encoding slower or less noisy environment modes over causal system modes, as shown by an impossibility theorem for linear-Gaussian dynamics and large-scale neural experiments.
FLUID: Continuous-Time Hyperconnected Sparse Transformer for Sink-Free Learning cs.LG · 2026-05-06 · unverdicted · none · ref 32
FLUID is a continuous-time transformer using Liquid Attention Networks to model attention as stable ODE solutions that interpolate between discrete SDPA and CT-RNNs, with an explicit sink gate and liquid hyper-connections for better information flow.
Rethink MAE with Linear Time-Invariant Dynamics cs.CV · 2026-04-29 · unverdicted · none · ref 6
Token order in frozen visual representations is exploitable via SSM-based LTI probes, revealing pre-training-dependent heterogeneity that fixed pooling misses.
Mamba Sequence Modeling meets Model Predictive Control math.OC · 2026-04-15 · unverdicted · none · ref 12
Mamba-MPC stabilizes and tracks references on SISO and MIMO systems in simulation and hardware while outperforming LSTM-MPC with faster computation.
RSGMamba: Reliability-Aware Self-Gated State Space Model for Multimodal Semantic Segmentation cs.CV · 2026-04-14 · unverdicted · none · ref 29
RSGMamba introduces a reliability-aware self-gated Mamba block for dynamic cross-modal feature selection in semantic segmentation, delivering state-of-the-art mIoU on RGB-D and RGB-T benchmarks with 48.6M parameters.
Is Flow Matching Just Trajectory Replay for Sequential Data? stat.ML · 2026-02-09 · unverdicted · none · ref 35
Flow matching on time series targets a closed-form nonparametric velocity field that is a similarity-weighted mixture of observed transition velocities, making neural models approximations to an ideal memory-augmented dynamical system sampler.
Hidden State Poisoning Attacks against Mamba-based Language Models cs.CL · 2026-01-05 · unverdicted · none · ref 7
Short input phrases can irreversibly overwrite hidden states in Mamba models, impairing information retrieval on a new benchmark while leaving pure Transformer models unaffected.
Kinetic-Mamba: Mamba-Assisted Predictions of Stiff Chemical Kinetics cs.LG · 2025-12-16 · unverdicted · none · ref 23
Mamba-based neural operators predict stiff chemical kinetics evolution with high fidelity from initial states on Syngas and GRI-Mech 3.0 mechanisms.
L2RU: a Structured State Space Model with prescribed L2-bound eess.SY · 2025-03-31 · unverdicted · none · ref 9
L2RU parametrizes SSMs to enforce a prescribed L2-gain bound for guaranteed input-output stability and robustness in all parameter regimes.
Mamba-Based Graph Convolutional Networks: Tackling Over-smoothing with Selective State Space cs.LG · 2025-01-26 · unverdicted · none · ref 10
MbaGCN combines message aggregation, selective state space transitions, and node state prediction to create a more scalable deep graph convolutional network.
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models cs.LG · 2024-02-29 · unverdicted · none · ref 11
Griffin hybrid model matches Llama-2 performance while trained on over 6 times fewer tokens and offers lower inference latency with higher throughput.
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model cs.CV · 2024-01-17 · conditional · none · ref 21
Vim is a bidirectional Mamba vision backbone that outperforms DeiT in accuracy on standard tasks while being substantially faster and more memory-efficient for high-resolution images.
Topological Neural Dynamics: A Neuron-wise Framework for Sequence Modeling cs.LG · 2026-06-19 · unverdicted · none · ref 29 · 2 links
TND models sequences via independent neuron dynamics on a directed graph and reports over three times more consecutive catches than strong baselines on a Pong behavior-cloning task.
ITNet: A Learnable Integral Transform That Subsumes Convolution, Attention, and Recurrence cs.AI · 2026-06-17 · unverdicted · none · ref 37
ITNet frames convolution, attention, and recurrence as special cases of one learnable integral transform with an MLP kernel and shows a single shared operator plus modality encoders matches specialized models on ImageNet-1K, GLUE, ModelNet40, VQA v2, and NLVR2.
Architecture-Aware Reinforcement Learning Makes Sliding-Window Attention Competitive in Math Reasoning cs.AI · 2026-06-10 · unverdicted · none · ref 47
Reinforcement learning after SFT conversion narrows the performance gap between sliding-window attention and full self-attention on math reasoning benchmarks while preserving linear complexity.
Free Parametrization of L_2-Bounded Structured State-Space Controllers for Nonlinear Control with Stability Guarantees eess.SY · 2026-06-09 · unverdicted · none · ref 19
A new free parametrization of L2-bounded LTI systems creates L2RU SSM layers that enforce stability by design, allowing unconstrained nonlinear controller optimization with guarantees via small-gain theorem.
End-to-End Context Compression at Scale cs.CL · 2026-06-08 · unverdicted · none · ref 29
LCLMs are scaled 0.6B-encoder 4B-decoder compressors pre-trained on over 350B tokens that improve the Pareto frontier for general-task performance, compression speed, and peak memory in long-context language model inference.
Vision-Language Guided Hyperspectral Object Tracking via Semantics Fusion and Contextual Template Updating cs.CV · 2026-06-08 · unverdicted · none · ref 50
VLHTrack integrates LLM-guided band selection and Mamba-based dynamic template updating to outperform prior methods on HOT2023 and HOT2024 hyperspectral tracking benchmarks.
Chiaroscuro Attention: Spending Compute in the Dark cs.CL · 2026-06-06 · unverdicted · none · ref 20
CHIAR-Former routes tokens via spectral entropy to DCT mixing or attention, yielding 35-40% FLOP savings at 400M parameters with modest perplexity increase on WikiText-103.
Pretraining Recurrent Networks without Recurrence cs.LG · 2026-06-04 · unverdicted · none · ref 40
SMT reduces RNN training to supervised learning on memory transitions (m_t, x_{t+1}) to m_{t+1} obtained from a Transformer encoder, enabling time-parallel training with O(1) gradient paths.
Mamba-Assisted Non-Markovian Closure for Reduced-Order Modeling cs.LG · 2026-06-03 · unverdicted · none · ref 13
Mamba-Assisted Closure (MAC) trains a Mamba sequence model on resolved trajectories to predict non-Markovian closures and couples it with reduced-order equations, outperforming Markovian, GRU, and Wilks baselines on Burgers' and Lorenz '96 systems.
Physically-Constrained Mamba-SDE for Remaining Useful Life Prediction under Irregular Observations cs.AI · 2026-06-01 · unverdicted · none · ref 13
PC-MambaSDE combines Mamba with physics-constrained SDE for RUL prediction under irregular observations, with theoretical stability guarantees and empirical outperformance on benchmarks.
Blurry Window Attention cs.LG · 2026-05-31 · unverdicted · none · ref 4
Blurry Window Attention stores a frequency window and reconstructs blurry KV history via Dirichlet kernel interpolation, achieving 8x better state efficiency than sliding window attention on the MQAR synthetic task.
Neuro-Inspired Inverse Learning for Planning and Control cs.AI · 2026-05-22 · unverdicted · none · ref 81
The Inverter framework formalizes inverse learning to generate coherent multi-step trajectories, outperforming offline RL and diffusion baselines on D4RL maze tasks by 24% on average with 10-100x less inference time while also matching GRAPE fidelity on single-qubit gates at >1000x speed.
TGSD: Topology-Guided State-Space Diffusion Framework for EEG Spatial Super-Resolution eess.SP · 2026-05-22 · unverdicted · none · ref 31
TGSD combines a Hierarchical Spatial Prior Encoder with conditional state-space diffusion to achieve EEG spatial super-resolution, outperforming baselines on reconstruction fidelity and classification on SEED and PhysioNet datasets.
Deformba: Vision State Space Model with Adaptive State Fusion cs.CV · 2026-05-20 · unverdicted · none · ref 4
Deformba introduces context-adaptive state fusion to vision SSMs for better spatial augmentation and cross-stream interactions, showing strong results on 2D classification/detection/segmentation and 3D BEV perception benchmarks.
GeoMamba: A Geometry-driven MambaVision Framework and Dataset for Fine-grained Optical-SAR Object Retrieval cs.CV · 2026-05-19 · unverdicted · none · ref 33
GeoMamba with Geometric Feature Injection and Geometric Consistency Constraint modules achieves 63.3% mAP and 77.0% Rank-1 on the new FGOS-as dataset for unaligned optical-SAR fine-grained retrieval.
Phasor Memory Networks: Stable Backpropagation Through Time for Scalable Explicit Memory cs.LG · 2026-05-13 · unverdicted · none · ref 9
PMNet uses unitary phasor dynamics and hierarchical anchors to make explicit memory stable for long sequences, matching a 3x larger Mamba model on long-context robustness with a 119M parameter network.