Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
49 Pith papers cite this work.
abstract
While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale. We show that these families of models are actually quite closely related, and develop a rich framework of theoretical connections between SSMs and variants of attention, connected through various decompositions of a well-studied class of structured semiseparable matrices. Our state space duality (SSD) framework allows us to design a new architecture (Mamba-2) whose core layer is a refinement of Mamba's selective SSM that is 2-8X faster, while continuing to be competitive with Transformers on language modeling.
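The duality the abstract describes can be checked numerically. Below is a minimal sketch (not the paper's reference code; all names and shapes are illustrative) showing that a scalar-gated selective SSM computed as a linear-time recurrence produces exactly the output of multiplying by a lower-triangular 1-semiseparable matrix, the attention-like "dual" form.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 6, 4                        # sequence length, state size

# Per-step scalar decay a_t (SSD's scalar-identity A_t) and projections
a = rng.uniform(0.5, 1.0, T)       # a_t in (0, 1]
B = rng.normal(size=(T, N))        # input projection B_t
C = rng.normal(size=(T, N))        # readout C_t
x = rng.normal(size=T)             # one scalar input channel

# Recurrent (linear-time) form: h_t = a_t h_{t-1} + B_t x_t,  y_t = C_t . h_t
h = np.zeros(N)
y_rec = np.zeros(T)
for t in range(T):
    h = a[t] * h + B[t] * x[t]
    y_rec[t] = C[t] @ h

# Dual (quadratic, attention-like) form: y = M x with
# M[i, j] = (prod_{k=j+1..i} a_k) * C_i . B_j for j <= i  (1-semiseparable)
M = np.zeros((T, T))
for i in range(T):
    for j in range(i + 1):
        M[i, j] = np.prod(a[j + 1 : i + 1]) * (C[i] @ B[j])
y_dual = M @ x

assert np.allclose(y_rec, y_dual)   # both sides of the duality agree
```

The triangular matrix M plays the role of an attention matrix; SSD's efficient algorithms come from block-decomposing M rather than materializing it as above.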
citing papers explorer
-
Selection, Not Fusion: Radar-Modulated State Space Models for Radar-Camera Depth Estimation
Radar-Modulated Selection perturbs only the step size Δ and readout C parameters inside Mamba's selective scan with radar data while keeping other components image-only, yielding state-of-the-art depth estimation on nuScenes with up to 34% MAE reduction.
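As a reading aid, here is a minimal sketch of the scheme as the summary describes it, assuming a diagonal selective scan; the variable names, shapes, and the 0.1 modulation scale are illustrative assumptions, not the paper's code. Radar features perturb only the step size Δ and the readout C, while A and B stay image-only.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N = 8, 16                               # tokens, state dimension

img = rng.normal(size=(T, N))              # image-derived token features
radar = rng.normal(size=(T, N))            # radar-derived token features

# Image-only parameters of the selective scan
A = -np.exp(rng.normal(size=N))            # stable diagonal state matrix
B = rng.normal(size=(T, N))                # image-only input projection
delta_img = np.log1p(np.exp(img))          # softplus: image-only step size
C_img = rng.normal(size=(T, N))            # image-only readout

# Radar modulates ONLY the step size and the readout (small perturbations)
delta = delta_img * (1.0 + 0.1 * np.tanh(radar))
C = C_img + 0.1 * radar

# Diagonal selective scan with zero-order-hold-style discretization
h = np.zeros(N)
y = np.zeros(T)
for t in range(T):
    A_bar = np.exp(delta[t] * A)           # per-step discrete decay
    B_bar = delta[t] * B[t]                # input scaling by the step size
    h = A_bar * h + B_bar * img[t]
    y[t] = C[t] @ h
```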
-
TCP-SSM: Efficient Vision State Space Models with Token-Conditioned Poles
TCP-SSM conditions stable poles on visual tokens to explicitly control memory decay and oscillation in SSMs, cutting computation by up to 44% while matching or exceeding accuracy on classification, segmentation, and detection.
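A toy sketch of token-conditioned poles, under assumptions (diagonal complex state, sigmoid-bounded magnitude): the pole magnitude sets memory decay and its phase sets oscillation, with |pole| < 1 guaranteeing stability. Nothing here is TCP-SSM's actual parameterization.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N = 32, 8                                  # tokens, complex state size

x = rng.normal(size=(T, N))
W_r = rng.normal(size=(N, N))
W_theta = rng.normal(size=(N, N))

h = np.zeros(N, dtype=complex)
y = np.zeros((T, N))
for t in range(T):
    r = 1.0 / (1.0 + np.exp(-(x[t] @ W_r)))   # |pole| in (0,1): memory decay
    theta = 0.1 * np.tanh(x[t] @ W_theta)     # pole phase: oscillation rate
    pole = r * np.exp(1j * theta)             # stable token-conditioned pole
    h = pole * h + x[t]
    y[t] = h.real
```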
-
TIDES: Implicit Time-Awareness in Selective State Space Models
TIDES reconciles selective SSM expressivity with continuous-time physical discretization by moving input dependence onto the state matrix, enabling native irregular time series handling and achieving SOTA on UEA and Physiome-ODE benchmarks.
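A sketch of the discretization idea attributed to TIDES: once input dependence moves onto the state matrix A(x), the step size Δ is free to carry the true elapsed time between irregular samples. The parameterization and names below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 8
times = np.cumsum(rng.exponential(1.0, size=12))  # irregular timestamps
x = rng.normal(size=(12, N))
W_a = rng.normal(size=(N, N)) * 0.1

h = np.zeros(N)
prev_t = 0.0
for t_k, x_k in zip(times, x):
    dt = t_k - prev_t                  # Δ is the REAL elapsed time, not learned
    a = -np.exp(x_k @ W_a)             # input-dependent, stable diagonal A(x)
    h = np.exp(dt * a) * h + dt * x_k  # exact decay over the true time gap
    prev_t = t_k
```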
-
FRACTAL: SSM with Fractional Recurrent Architecture for Computational Temporal Analysis of Long Sequences
FRACTAL integrates fractional recurrent architecture into SSMs using a tunable singularity index to capture multi-scale temporal features, reporting 87.11% average on Long Range Arena and outperforming S5.
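For intuition, a fractional recurrence can be sketched with Grünwald–Letnikov weights, whose power-law decay gives multi-scale memory; the order α below stands in for the summary's "tunable singularity index" (an assumed correspondence, not FRACTAL's code).

```python
import numpy as np

def gl_weights(alpha, K):
    """Grünwald-Letnikov weights of the fractional difference (1 - L)^alpha."""
    w = np.empty(K)
    w[0] = 1.0
    for j in range(1, K):
        w[j] = w[j - 1] * (1.0 - (alpha + 1.0) / j)
    return w

alpha = 0.6                       # tunable fractional order
K, T = 64, 256                    # truncated history, sequence length
w = gl_weights(alpha, K)          # |w_j| decays like j^(-alpha-1): power law

rng = np.random.default_rng(4)
x = rng.normal(size=T)
h = np.zeros(T)
for t in range(T):
    # (1 - L)^alpha h = x  =>  h_t = x_t - sum_{j>=1} w_j h_{t-j}
    hist = sum(w[j] * h[t - j] for j in range(1, min(K, t + 1)))
    h[t] = x[t] - hist
```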
-
Star Elastic: Many-in-One Reasoning LLMs with Efficient Budget Control
Star Elastic trains N nested submodels in a single post-training job on a parent reasoning LLM, supporting elastic budget control that matches or exceeds independent baselines while cutting training compute by up to 360x.
-
PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization
PairAlign learns compact audio token sequences via self-alignment of paired content views using an autoregressive decoder, achieving strong cross-view consistency and edit-distance preservation while reducing token count by 55% on TIMIT.
-
Rethink MAE with Linear Time-Invariant Dynamics
SSM-based LTI probes can exploit token order in frozen visual representations, revealing pre-training-dependent heterogeneity that fixed pooling misses.
-
Sparse Prefix Caching for Hybrid and Recurrent LLM Serving
Sparse prefix caching uses dynamic programming to place checkpoints optimally under prefix-overlap distributions, improving the Pareto frontier for recurrent and hybrid LLM serving on shared-prefix data.
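The placement problem admits a simple dynamic program under one modeling assumption: a request with prefix overlap L reuses the largest checkpoint position ≤ L, so expected reuse telescopes to Σ (c_i − c_{i−1})·P(L ≥ c_i). The O(K·T²) sketch below is my reconstruction, not the paper's algorithm.

```python
import numpy as np

def place_checkpoints(p_overlap, K):
    """Pick K checkpoint positions maximizing expected reusable prefix tokens.

    p_overlap[L] = probability a request shares exactly L prefix tokens.
    E[reuse] = sum_i (c_i - c_{i-1}) * P(L >= c_i) by telescoping.
    """
    T = len(p_overlap) - 1
    tail = np.cumsum(p_overlap[::-1])[::-1]        # tail[c] = P(L >= c)
    NEG = float("-inf")
    dp = np.full((K + 1, T + 1), NEG)
    dp[0, 0] = 0.0
    prev = np.zeros((K + 1, T + 1), dtype=int)
    for k in range(1, K + 1):
        for j in range(1, T + 1):                  # j = position of k-th ckpt
            for i in range(j):                     # i = previous checkpoint
                val = dp[k - 1, i] + (j - i) * tail[j]
                if val > dp[k, j]:
                    dp[k, j], prev[k, j] = val, i
    j = int(np.argmax(dp[K]))
    best, picks = dp[K, j], []
    for k in range(K, 0, -1):
        picks.append(j)
        j = prev[k, j]
    return best, picks[::-1]

# Example: overlaps concentrated at 4 and 12 tokens
p = np.zeros(17); p[4], p[12] = 0.6, 0.4
print(place_checkpoints(p, K=2))                   # picks [4, 12], value 7.2
```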
-
The UNDO Flip-Flop: A Controlled Probe for Reversible Semantic State Management in State Space Model
Mamba-2 models fail to learn reversible state retrieval in the UNDO Flip-Flop task, defaulting to a toggle heuristic and achieving only 41% accuracy under adversarial conditions.
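A sketch of what such a probe can look like: WRITE pushes a value onto a slot's history, UNDO pops it, READ is labeled with the true current value. The exact task format here is an assumption; the point is that a toggle heuristic breaks as soon as UNDOs stack on the same slot.

```python
import random

def undo_flipflop(length, n_slots=4, seed=0):
    """Generate one UNDO flip-flop sequence (illustrative format).

    A toggle heuristic (treating UNDO as 'flip the bit') fails whenever two
    UNDOs stack, because the true semantics is reverting to an earlier write.
    """
    rng = random.Random(seed)
    stacks = {s: [0] for s in range(n_slots)}   # per-slot value history
    seq = []
    for _ in range(length):
        s = rng.randrange(n_slots)
        op = rng.choice(["WRITE", "UNDO", "READ"])
        if op == "WRITE":
            v = rng.randint(0, 1)
            stacks[s].append(v)
            seq.append((op, s, v))
        elif op == "UNDO":
            if len(stacks[s]) > 1:
                stacks[s].pop()                 # revert most recent write
            seq.append((op, s, None))
        else:
            seq.append((op, s, stacks[s][-1]))  # label = true current value
    return seq
```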
-
S0 Tuning: Zero-Overhead Adaptation of Hybrid Recurrent-Attention Models
S0 tuning optimizes the initial recurrent states of hybrid models, outperforming LoRA with zero inference cost on HumanEval and showing partial cross-domain transfer.
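A minimal sketch of the mechanism on a frozen toy recurrent layer: only the initial state S0 is trainable, so inference-time compute and memory are unchanged. The layer and objective are stand-ins, not the paper's setup.

```python
import torch

torch.manual_seed(0)
D = 16
# Frozen toy recurrent layer: h_t = tanh(W h_{t-1} + U x_t)
W = torch.randn(D, D) * 0.1
U = torch.randn(D, D) * 0.1
head = torch.randn(D, 1) * 0.1

# The ONLY trainable parameter: the initial recurrent state S0
s0 = torch.zeros(D, requires_grad=True)
opt = torch.optim.Adam([s0], lr=1e-2)

x = torch.randn(32, 8, D)           # (batch, time, dim) adaptation data
y = torch.randn(32, 1)              # toy regression targets

for _ in range(200):
    h = s0.expand(32, D)            # broadcast tuned S0 across the batch
    for t in range(8):
        h = torch.tanh(h @ W.T + x[:, t] @ U.T)
    loss = ((h @ head) - y).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Inference uses the tuned s0 in place of zeros: no extra cost per token.
```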
-
The Randomness Floor: Measuring Intrinsic Non-Randomness in Language Model Token Distributions
Language models have an intrinsic randomness floor: transformers show ~0.30 entropic deviation from uniform on neutral prompts, accounting for 88-93% of observed non-randomness, while state-space models exhibit twice the deviation and strong temperature sensitivity.
-
Freeze Deep, Train Shallow: Interpretable Layer Allocation for Continued Pre-Training
Freezing deep layers and training shallow layers during continued pre-training of LLMs outperforms full fine-tuning and the opposite allocation on C-Eval and CMMLU, guided by a new layer-sensitivity diagnostic.
-
MambaNetBurst: Direct Byte-level Network Traffic Classification without Tokenization or Pretraining
A compact Mamba-2 model performs end-to-end byte-level network traffic classification without tokenization or pre-training and remains competitive with substantially larger pre-trained systems.
-
RT-Transformer: The Transformer Block as a Spherical State Estimator
Transformer components arise as the natural solution to precision-weighted directional state estimation on the hypersphere.
-
Structured Recurrent Mixers for Massively Parallelized Sequence Generation
Structured Recurrent Mixers enable algebraic switching between parallel training and recurrent inference representations, delivering higher efficiency, information capacity, and throughput than other linear-complexity models.
-
Echo: KV-Cache-Free Associative Recall with Spectral Koopman Operators
Spectral Koopman operators let SSMs achieve 100% accuracy on long-gap multi-query associative recall with fixed memory, where pure Mamba fails.
-
Cubit: Token Mixer with Kernel Ridge Regression
Cubit replaces Transformer attention with Kernel Ridge Regression token mixing and shows potential gains on longer sequences.
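For reference, kernel ridge regression token mixing can be sketched in a few lines: each output token is the KRR prediction over all tokens, with the token features themselves as regression targets. The kernel choice and regularization below are assumptions, not Cubit's configuration.

```python
import torch

def krr_mix(x, lam=1e-2, gamma=1.0):
    """KRR token mixing (sketch): outputs are ridge predictions over tokens."""
    # RBF kernel between tokens: K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    K = torch.exp(-gamma * torch.cdist(x, x).pow(2))
    T = x.shape[0]
    # Ridge solution: Y = K (K + lam I)^{-1} X, replacing softmax attention
    alpha = torch.linalg.solve(K + lam * torch.eye(T), x)
    return K @ alpha

x = torch.randn(16, 32)             # (tokens, dim)
y = krr_mix(x)                      # mixed tokens, same shape
```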
-
The Impossibility Triangle of Long-Context Modeling
No model can achieve efficiency, compactness, and recall capacity scaling with sequence length at once, as any two imply a strict bound of O(poly(d)/log V) on recallable facts.
-
Long-Context Aware Upcycling: A New Frontier for Hybrid LLM Scaling
HyLo upcycles Transformer LLMs into hybrids with MLA and Mamba2/Gated DeltaNet blocks via staged training and distillation, extending context to 2M tokens and outperforming prior upcycled hybrids on long-context benchmarks.
-
NAKUL-Med: Spectral-Graph State Space Models with Dynamics Kernels for Medical Signals
NAKUL achieves 91.7% accuracy on motor imagery EEG with 28% fewer parameters than EEG-Conformer by using dynamic kernel generation, spectral context modeling, and graph-guided spatial attention.
-
HubRouter: A Pluggable Sub-Quadratic Routing Primitive for Hybrid Sequence Models
HubRouter is a sub-quadratic routing primitive using learned hubs that replaces attention layers in hybrid models while delivering competitive perplexity and large throughput gains.
-
HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering
HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.
-
M²GRPO: Mamba-based Multi-Agent Group Relative Policy Optimization for Biomimetic Underwater Robots Pursuit
M²GRPO uses a Mamba-based policy and normalized group-relative advantages under CTDE to achieve higher pursuit success and capture efficiency than MAPPO and recurrent baselines in simulations and pool tests.
-
Sonata: A Hybrid World Model for Inertial Kinematics under Clinical Data Scarcity
Sonata is a small hybrid world model pre-trained to predict future IMU states that outperforms autoregressive baselines on clinical discrimination, fall-risk prediction, and cross-cohort transfer while fitting on-device wearables.
-
MambaBack: Bridging Local Features and Global Contexts in Whole Slide Image Analysis
MambaBack is a hybrid Mamba-CNN model with Hilbert sampling and chunked inference that reports better performance than seven prior methods on five whole-slide image datasets.
-
Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware modeling.
-
Parcae: Scaling Laws For Stable Looped Language Models
Parcae stabilizes looped LLMs via spectral-norm constraints on injection parameters, enabling power-law scaling in training FLOPs and saturating exponential scaling in test-time depth that improves quality over fixed-depth baselines under fixed parameter budgets.
-
Scal3R: Scalable Test-Time Training for Large-Scale 3D Reconstruction
Scal3R achieves better accuracy and consistency in large-scale 3D scene reconstruction by maintaining a compressed global context through test-time adaptation of lightweight neural networks on long video sequences.
-
Optimal Decay Spectra for Linear Recurrences
PoST reparameterizes decay spectra in linear recurrences with geometric log-spacing and position-adaptive scaling to achieve O(exp(-cN/log t)) decay, improving zero-shot language modeling and long-context retrieval across Mamba-2, RWKV-7 and similar models.
-
In-Place Test-Time Training
In-Place TTT adapts LLM MLP projection matrices at test time with a next-token-aligned objective and chunk-wise updates, enabling better long-context performance as a drop-in enhancement.
-
Phase-Associative Memory: Sequence Modeling in Complex Hilbert Space
PAM, a complex-valued associative memory model, exhibits steeper power-law scaling in loss and perplexity than a matched real-valued baseline when trained on WikiText-103 from 5M to 100M parameters.
-
Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing
Stochastic training with random cross-layer KV attention enables depth-wise cache sharing in transformers, cutting memory footprint while preserving or improving performance.
-
Attention to Mamba: A Recipe for Cross-Architecture Distillation
A two-stage distillation recipe converts a Pythia-1B Transformer into a Mamba model that preserves performance with perplexity 14.11 versus the teacher's 13.86.
-
Computer Architecture's AlphaZero Moment: Automated Discovery in an Encircled World
Automated architectural discovery engines can outperform human design teams by exploring massive design spaces and compressing development cycles from months to weeks.
-
Kimi Linear: An Expressive, Efficient Attention Architecture
Kimi Linear hybridizes linear attention with a new KDA module to beat full attention on tasks while slashing KV cache by 75% and speeding decoding up to 6x.
-
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
MiniMax-M1 is a 456B parameter hybrid-attention MoE model trained with CISPO RL that achieves performance comparable or superior to DeepSeek-R1 and Qwen3-235B on reasoning and software engineering tasks while training in three weeks on 512 GPUs.
-
Mela: Test-Time Memory Consolidation based on Transformation Hypothesis
Mela is a Transformer variant with a dual-frequency Hierarchical Memory Module and MemStack that performs test-time memory consolidation, outperforming baselines on long contexts.
-
Kaczmarz Linear Attention
Kaczmarz Linear Attention replaces the empirical coefficient in Gated DeltaNet with a key-norm-normalized step size derived from the online regression objective, yielding lower perplexity and better needle-in-haystack performance.
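The update is the classic Kaczmarz projection applied to the fast-weight regression S k ≈ v, which pins the step size to 1/||k||² instead of a learned scalar. A self-contained sketch (single head, no gating; variable names are mine) follows.

```python
import torch

def kaczmarz_delta_step(S, k, v, eps=1e-6):
    """One Kaczmarz-style delta-rule update on fast-weight state S (d_v x d_k).

    Kaczmarz projects onto the hyperplane {S : S k = v} with step 1/||k||^2;
    this replaces the empirical beta coefficient of Gated DeltaNet.
    """
    beta = 1.0 / (k.dot(k) + eps)           # key-norm-normalized step size
    err = v - S @ k                         # online regression residual
    return S + beta * torch.outer(err, k)   # rank-1 projection update

d_k, d_v = 8, 8
S = torch.zeros(d_v, d_k)
k, v = torch.randn(d_k), torch.randn(d_v)
S = kaczmarz_delta_step(S, k, v)
print(torch.allclose(S @ k, v, atol=1e-4))  # exact recall of the new pair
```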
-
Irminsul: MLA-Native Position-Independent Caching for Agentic LLM Serving
Irminsul recovers up to 83% of prompt tokens above exact-prefix matching and delivers 63% prefill energy savings per cache hit on MLA-MoE models by content-hashing CDC chunks and applying closed-form kr correction.
-
Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models
Toeplitz MLP Mixers replace attention with masked Toeplitz multiplications for sub-quadratic complexity while retaining more sequence information and outperforming on copying and in-context tasks.
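A masked-Toeplitz mixer is easy to sketch: one length-T kernel defines the entire causal T×T mixing matrix, so parameters are O(T) and the product is a causal convolution (hence FFT-amenable, sub-quadratic). The dense materialization below is for clarity only; names are assumptions.

```python
import torch

def toeplitz_mix(x, kernel):
    """Causal masked-Toeplitz token mixing: y_i = sum_{j<=i} kernel[i-j] x_j."""
    T = x.shape[0]
    idx = torch.arange(T).unsqueeze(1) - torch.arange(T).unsqueeze(0)
    # Toeplitz matrix M[i, j] = kernel[i - j], masked to be causal
    M = torch.where(idx >= 0, kernel[idx.clamp(min=0)], torch.zeros(()))
    return M @ x

T, D = 16, 32
x = torch.randn(T, D)
kernel = torch.randn(T) / T          # the learned Toeplitz coefficients
y = toeplitz_mix(x, kernel)          # mixed tokens, same shape as x
```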
-
Reasoning Primitives in Hybrid and Non-Hybrid LLMs
Reasoning augmentation extends the difficulty range for both architectures, but hybrid models stay robust longer than transformers as sequential dependence increases in state-based recall tasks.
-
LayerTracer: A Joint Task-Particle and Vulnerable-Layer Analysis framework for Arbitrary Large Language Model Architectures
LayerTracer defines task particles as the first layer where target token probability rises sharply and vulnerable layers via maximum JS divergence after masking, showing task particles in deep layers and greater robustness in larger models.
-
FG$^2$-GDN: Enhancing Long-Context Gated Delta Networks with Doubly Fine-Grained Control
FG²-GDN replaces the scalar beta in the delta update with a channel-wise vector and decouples key/value scaling to improve recall over prior GDN and KDA models.
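A sketch of the doubly fine-grained control as the summary states it: the scalar β of the gated delta rule becomes a per-channel vector, so each value channel corrects at its own rate. The gating and normalization details below are assumptions.

```python
import torch

def fg_delta_step(S, k, v, beta_vec, g):
    """Gated delta-rule update with a channel-wise beta vector (sketch).

    Standard GDN uses one scalar beta per step; here beta_vec (d_v,) scales
    the correction per value channel, and g in (0, 1) is the decay gate.
    """
    err = v - S @ k                                # per-channel residual
    return g * S + torch.outer(beta_vec * err, k)  # channel-wise correction

d_k, d_v = 8, 8
S = torch.zeros(d_v, d_k)
k = torch.randn(d_k); k = k / k.norm()             # unit key for stability
v = torch.randn(d_v)
beta_vec = torch.sigmoid(torch.randn(d_v))         # learned, per channel
S = fg_delta_step(S, k, v, beta_vec, g=0.95)
```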
-
Sessa: Selective State Space Attention
Sessa integrates attention within recurrent paths to achieve power-law memory tails and flexible non-decaying selective retrieval, outperforming baselines on long-context tasks.
-
Hypergraph-State Collaborative Reasoning for Multi-Object Tracking
HyperSSM integrates hypergraphs and state space models to let correlated objects mutually refine motion estimates, stabilizing trajectories under noise and occlusion for state-of-the-art multi-object tracking.
-
COREY: Entropy-Guided Runtime Chunk Scheduling for Selective Scan Kernels
COREY maps activation entropy to chunk sizes for SSM kernels, achieving 3.9-4.4x kernel-level speedups over baselines and matching static-oracle latency while preserving exact output equivalence, though scheduling overhead prevents end-to-end gains.
-
CARE-ECG: Causal Agent-based Reasoning for Explainable and Counterfactual ECG Interpretation
CARE-ECG unifies ECG representation learning, causal graph-based diagnosis, and counterfactual assessment in an agentic LLM pipeline to improve accuracy and explanation faithfulness.
-
Efficient Spatial-Temporal Focal Adapter with SSM for Temporal Action Detection
A new adapter module combining boundary-aware state space modeling with spatial processing boosts localization and robustness in temporal action detection.
-
The Hyperscale Lottery: How State-Space Models Have Sacrificed Edge Efficiency
Mamba-3 architectural changes optimized for hyperscale GPUs cause 28% higher edge latency at 880M parameters and 48% at 15M parameters compared to earlier versions.