Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Canonical reference. 80% of citing Pith papers cite this work as background.
abstract
Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers' computational inefficiency on long sequences, but they have not performed as well as attention on important modalities such as language. We identify that a key weakness of such models is their inability to perform content-based reasoning, and make several improvements. First, simply letting the SSM parameters be functions of the input addresses their weakness with discrete modalities, allowing the model to selectively propagate or forget information along the sequence length dimension depending on the current token. Second, even though this change prevents the use of efficient convolutions, we design a hardware-aware parallel algorithm in recurrent mode. We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even MLP blocks (Mamba). Mamba enjoys fast inference (5$\times$ higher throughput than Transformers) and linear scaling in sequence length, and its performance improves on real data up to million-length sequences. As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics. On language modeling, our Mamba-3B model outperforms Transformers of the same size and matches Transformers twice its size, both in pretraining and downstream evaluation.
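The selection mechanism the abstract describes — letting the SSM parameters Δ, B, and C be functions of the current token — can be illustrated with a minimal NumPy sketch. The shapes, the softplus parameterization of Δ, and the simplified discretization below are illustrative assumptions for exposition, not the paper's exact formulation or its hardware-aware kernel.

```python
import numpy as np

rng = np.random.default_rng(0)

def softplus(z):
    return np.log1p(np.exp(z))

def selective_ssm_scan(x, A, W_delta, W_B, W_C):
    """Naive sequential selective-SSM scan (illustrative sketch).

    x: (L, d) input sequence.  A: (d, n) diagonal state matrix (negative
    entries for stability).  W_delta: (d, d); W_B, W_C: (n, d) projections
    that make the step size Delta and the B/C matrices token-dependent --
    the 'selection' that lets the model propagate or forget per token.
    """
    L, d = x.shape
    n = A.shape[1]
    h = np.zeros((d, n))          # one n-dim hidden state per channel
    y = np.zeros((L, d))
    for t in range(L):
        delta = softplus(x[t] @ W_delta)[:, None]   # (d, 1) input-dependent step
        B = W_B @ x[t]                              # (n,) token-dependent input map
        C = W_C @ x[t]                              # (n,) token-dependent readout
        Abar = np.exp(delta * A)                    # discretized decay, in (0, 1)
        h = Abar * h + (delta * B[None, :]) * x[t][:, None]
        y[t] = h @ C                                # per-channel readout
    return y

L, d, n = 6, 4, 3
x = rng.standard_normal((L, d))
A = -np.exp(rng.standard_normal((d, n)))   # negative poles for stability
y = selective_ssm_scan(x, A,
                       rng.standard_normal((d, d)),
                       rng.standard_normal((n, d)),
                       rng.standard_normal((n, d)))
print(y.shape)   # (6, 4)
```

The sequential loop is only for clarity: because the input-dependent parameters rule out the convolutional mode of earlier SSMs, the paper's contribution is computing exactly this recurrence with a hardware-aware parallel scan that avoids materializing the expanded state in slow memory.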
fields
cs.LG 63 · cs.CV 52 · cs.CL 24 · cs.AI 8 · eess.AS 3 · eess.IV 3 · eess.SP 3 · eess.SY 3 · q-bio.NC 3 · cs.CR 2

roles
background 5

representative citing papers
- Attention and LoRA regression losses induce Poincaré inequalities under mild regularization, so SGD-mimicking SDEs converge to minimizers with no assumptions on data or model size.
- A classifier trained only on transformer fine-tuning data detects an invariant memorization signature that transfers to Mamba, RWKV-4, and RecurrentGemma with AUCs of 0.963, 0.972, and 0.936.
- Transformer weight spectra exhibit transient compression waves that propagate layer-wise, persistent non-monotonic depth gradients in power-law exponents, and Q/K-V asymmetry, with the spectral exponent alpha predicting layer importance and enabling pruning gains of 1.1x-3.6x over Last-N baselines.
- RULER shows most long-context LMs drop sharply in performance on complex tasks as length and difficulty increase, with only half maintaining results at 32K tokens.
- QLAM extends state-space models with quantum superposition in the hidden state for linear-time long-sequence modeling and reports consistent gains over RNN and transformer baselines on sequential image tasks.
- PSR-NQS makes recurrent neural quantum states scalable for variational Monte Carlo by using parallel scan recurrence, reaching accurate results on 52x52 two-dimensional lattices.
- Chem-GMNet uses sphere-native embeddings, DualSKA attention, and SH-FFN layers to match or beat ChemBERTa-2 on MoleculeNet tasks with fewer parameters and sometimes no pretraining.
- SpikeProphecy decomposes spike-count forecasting performance into temporal fidelity, spatial pattern accuracy, and magnitude-invariant alignment, revealing reproducible brain-region predictability rankings and a sub-Poisson evaluation floor across seven model families on 105 Neuropixels sessions.
- Radar-Modulated Selection perturbs only the step size Δ and readout C parameters inside Mamba's selective scan with radar data while keeping other components image-only, yielding state-of-the-art depth estimation on nuScenes with up to 34% MAE reduction.
- TCP-SSM conditions stable poles on visual tokens to explicitly control memory decay and oscillation in SSMs, cutting computation up to 44% while matching or exceeding accuracy on classification, segmentation, and detection.
- VLA stabilizes linear attention by solving regularized least-squares updates with unit-length writes, yielding Jacobian spectral norm exactly 1 and 109x smaller state norms while improving multi-query recall accuracy over standard linear attention and DeltaNet.
- An online SAR focusing framework using state-space models processes raw data line-by-line with 70x lower latency and 130x lower memory than block-based DSP while supporting downstream tasks.
- TIDES reconciles selective SSM expressivity with continuous-time physical discretization by moving input dependence onto the state matrix, enabling native irregular time series handling and achieving SOTA on UEA and Physiome-ODE benchmarks.
- LoopUS converts pretrained LLMs into looped latent refinement models via block decomposition, selective gating, random deep supervision, and confidence-based early exiting to improve reasoning performance.
- Test-Time Speculation adapts draft models online via target-model verifications to sustain high acceptance lengths during long LLM generations.
- Prediction bottlenecks do not discover causal structure beyond what linear models, Lasso, and classical Granger/PCMCI methods achieve; intervention benefits are mostly sample-size confounds, leaving a standardized falsification benchmark as the main contribution.
- VORT assigns learnable fractional orders to tokens and approximates their power-law retention kernels via sum-of-exponentials for efficient long-range dependency modeling in transformers.
- VIMCAN combines Mamba for temporal efficiency and cross-attention for spatial fusion to reach 17.2 mm MPJPE on TotalCapture and 45.3 mm on 3DPW while running above 60 FPS.
- Star Elastic trains N nested submodels in a single post-training job on a parent reasoning LLM, supporting elastic budget control that matches or exceeds independent baselines while cutting training compute by up to 360x.
- Lighthouse Attention enables faster long-context pre-training via gradient-free symmetrical hierarchical compression of QKV while preserving causality, followed by a short full-attention recovery that yields lower loss than standard full-attention training.
- In linear recurrent models, infinite-width signal propagation remains accurate only for depths t much smaller than sqrt(width n), with a critical regime at t ~ c sqrt(n) where finite-width effects emerge and dominate for larger t.
- A framework quantifies DNN complexity via tensor operations, links 40 years of breakthroughs to complexity increases, and releases a dataset of 3000+ unexplored high-complexity architectures.
- World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.
citing papers explorer
-
Selection, Not Fusion: Radar-Modulated State Space Models for Radar-Camera Depth Estimation
Radar-Modulated Selection perturbs only the step size Δ and readout C parameters inside Mamba's selective scan with radar data while keeping other components image-only, yielding state-of-the-art depth estimation on nuScenes with up to 34% MAE reduction.
-
TCP-SSM: Efficient Vision State Space Models with Token-Conditioned Poles
TCP-SSM conditions stable poles on visual tokens to explicitly control memory decay and oscillation in SSMs, cutting computation up to 44% while matching or exceeding accuracy on classification, segmentation, and detection.
-
VIMCAN: Visual-Inertial 3D Human Pose Estimation with Hybrid Mamba-Cross-Attention Network
VIMCAN combines Mamba for temporal efficiency and cross-attention for spatial fusion to reach 17.2 mm MPJPE on TotalCapture and 45.3 mm on 3DPW while running above 60 FPS.
-
Rethink MAE with Linear Time-Invariant Dynamics
Token order in frozen visual representations is exploitable via SSM-based LTI probes, revealing pre-training-dependent heterogeneity that fixed pooling misses.
-
V-Nutri: Dish-Level Nutrition Estimation from Egocentric Cooking Videos
V-Nutri fuses final-dish features with cooking-process keyframes from egocentric videos to improve dish-level calorie and macronutrient estimation over single-image baselines.
-
Beyond Reconstruction: Reconstruction-to-Vector Diffusion for Hyperspectral Anomaly Detection
R2VD redefines reconstruction as the origin for residual-guided vector diffusion across PPE, GMP, RSM, and VDI stages to achieve superior anomaly detectability and background suppression on eight datasets.
-
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Vim is a bidirectional Mamba vision backbone that outperforms DeiT in accuracy on standard tasks while being substantially faster and more memory-efficient for high-resolution images.
-
GraphLeap: Decoupling Graph Construction and Convolution for Vision GNN Acceleration on FPGA
GraphLeap decouples per-layer graph construction from feature updates in Vision GNNs by using previous-layer features for the current graph, enabling pipelined FPGA acceleration with up to 95.7× CPU speedup after fine-tuning.
-
LiquidTAD: Efficient Temporal Action Detection via Parallel Liquid-Inspired Temporal Relaxation
LiquidTAD distills liquid neural dynamics into a vectorized parallel temporal operator and hierarchical decay sharing to achieve efficient action detection with substantially reduced model size and computation.
-
DGSSM: Diffusion guided state-space models for multimodal salient object detection
DGSSM formulates multimodal salient object detection as a progressive denoising process using diffusion-guided Mamba models, achieving better boundary accuracy and outperforming prior methods on 13 benchmarks.
-
Elastic Attention Cores for Scalable Vision Transformers
VECA learns effective visual representations using core-periphery attention where patches interact exclusively via a resolution-invariant set of learned core embeddings, achieving linear O(N) complexity while maintaining competitive performance.
-
Polygon-mamba: Retinal vessel segmentation using polygon scanning mamba and space-frequency collaborative attention
Polygon-Mamba achieves F1 scores of 0.8283, 0.8282, and 0.8251 on DRIVE, STARE, and CHASE_DB1 by combining polygon scanning Mamba with space-frequency collaborative attention to better detect small retinal vessels.
-
DynGhost: Temporally-Modelled Transformer for Dynamic Ghost Imaging with Quantum Detectors
DynGhost improves dynamic ghost imaging reconstruction by using a transformer with alternating spatial-temporal attention and quantum-aware training on simulated single-photon detector data.
-
GEM: Generating LiDAR World Model via Deformable Mamba
GEM is a new LiDAR world model using deformable Mamba that disentangles dynamic and static features to generate high-fidelity simulations and achieve state-of-the-art results on autonomous driving benchmarks.
-
Detecting AI-Generated Videos with Spiking Neural Networks
MAST with spiking neural networks achieves 93.14% mean accuracy detecting AI-generated videos from 10 unseen generators by exploiting smoother pixel residuals and compact semantic trajectories.
-
A Novel Graph-Regulated Disentangling Mamba Model with Sparse Tokens for Enhanced Tree Species Classification from MODIS Time Series
A graph-regulated disentangling Mamba model with sparse tokens achieves 93.94% accuracy classifying tree species from MODIS time series in Alberta and outperforms twelve prior models.
-
GTF: Omnidirectional EPI Transformer for Light Field Super-Resolution
GTF is an omnidirectional EPI Transformer for light field super-resolution that models horizontal, vertical, 45-degree and 135-degree epipolar geometries, reaching 32.78 dB on benchmarks and top ranks in the NTIRE 2026 challenge.
-
PACE: Post-Causal Entropy Modeling for Learned LiDAR Point Cloud Compression
PACE achieves state-of-the-art LiDAR point cloud compression with over 90% lower decoding latency by using a non-causal backbone and a stage-scalable causal predictor.
-
SAMamba3D: adapting Segment Anything for generalizable 3D segmentation of multiphase pore-scale images
SAMamba3D adapts a frozen SAM encoder with Mamba volumetric context and cross-scale features to match or exceed 3D baselines on diverse sandstone and carbonate datasets while reducing case-specific retraining.
-
Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware modeling.
-
Event-Adaptive State Transition and Gated Fusion for RGB-Event Object Tracking
MambaTrack improves RGB-Event object tracking via event-adaptive state transitions in a Dynamic State Space Model and a Gated Projection Fusion module, reporting state-of-the-art results on FE108 and FELT datasets.
-
CloudMamba: An Uncertainty-Guided Dual-Scale Mamba Network for Cloud Detection in Remote Sensing Imagery
CloudMamba combines uncertainty-guided refinement with a dual-scale Mamba network to outperform prior methods on cloud segmentation accuracy while maintaining linear computational cost.
-
Physics-Aligned Spectral Mamba: Decoupling Semantics and Dynamics for Few-Shot Hyperspectral Target Detection
SpecMamba decouples stable semantic features from agile spectral adaptation via DCT-Mamba adapters, prior-guided tri-encoders, and self-supervised test-time mapping to improve few-shot hyperspectral target detection.
-
Edge-Efficient Image Restoration: Transformer Distillation into State-Space Models
Hybrid transformer-SSM networks found by multi-objective search run 1.17x to 3.4x faster on edge CPUs for image restoration tasks with competitive quality.
-
Foveated Reasoning: Stateful, Action-based Visual Focusing for Vision-Language Models
Foveated Reasoner integrates foveation as stateful actions inside the autoregressive decoding loop of vision-language models, trained via cold-start supervision then reinforcement learning to achieve higher accuracy at low token budgets.
-
TriTS: Time Series Forecasting from a Multimodal Perspective
TriTS projects time series into time, frequency, and vision modalities with Period-Aware Reshaping and MR-WM to achieve SOTA long-term forecasting at lower computational cost.
-
Hero-Mamba: Mamba-based Dual Domain Learning for Underwater Image Enhancement
Hero-Mamba combines parallel spatial-spectral Mamba processing and a background-light-guided ColorFusion block to enhance underwater images, reporting PSNR 25.802 and SSIM 0.913 on the LSUI benchmark.
-
CLIMB: Controllable Longitudinal Brain Image Generation using Mamba-based Latent Diffusion Model and Gaussian-aligned Autoencoder
CLIMB generates controllable longitudinal brain MRI images from baseline scans using a Mamba-based latent diffusion model and Gaussian-aligned autoencoder, reporting SSIM 0.9433 on the ADNI dataset of 6306 scans.
-
HAMSA: Scanning-Free Vision State Space Models via SpectralPulseNet
HAMSA achieves 85.7% ImageNet-1K top-1 accuracy as a spectral-domain SSM with 2.2x faster inference and lower memory than transformers or scanning-based SSMs.
-
MSR: Hybrid Field Modeling for CT-MRI Rigid-Deformable Registration of the Cervical Spine with an Annotated Dataset
MSR fuses per-vertebra rigid alignments with a gated Mamba-Swin deformable module to produce a hybrid deformation field for CT-MRI cervical spine registration, released together with the public R-D-Reg annotated dataset.
-
Selective Attention-Based Network for Robust Infrared Small Target Detection
SANet augments U-Net with a Dual-path Semantic-aware Module using pinwheel convolutions and CBAM, plus a Selective Attention Fusion Module for adaptive cross-scale feature fusion, to improve detection of sub-pixel infrared targets.
-
A Synergistic CNN-Transformer Network with Pooling Attention Fusion for Hyperspectral Image Classification
A new CNN-Transformer hybrid with twin-branch 3D/2D convolution, hybrid pooling attention, cascade spectral transformers, and cross-layer fusion reports higher accuracy than prior methods on standard hyperspectral datasets.
-
Breaking the Resource Wall: Geometry-Guided Sequence Modeling for Efficient Semantic Segmentation
DGM-Net reaches 82.3% mIoU on Cityscapes and 45.24% on ADE20K using directional geometric guidance inside a linear-complexity Mamba backbone, without heavy pretraining or large models.
-
NTIRE 2026 Challenge on Video Saliency Prediction: Methods and Results
The NTIRE 2026 Challenge released a public dataset of 2,000 videos with crowdsourced saliency maps and reported results from participating teams using standard quality metrics.
-
Hypergraph-State Collaborative Reasoning for Multi-Object Tracking
HyperSSM integrates hypergraphs and state space models to let correlated objects mutually refine motion estimates, stabilizing trajectories under noise and occlusion for state-of-the-art multi-object tracking.
-
Efficient Spatial-Temporal Focal Adapter with SSM for Temporal Action Detection
A new adapter module combining boundary-aware state space modeling with spatial processing boosts localization and robustness in temporal action detection.
-
HST-HGN: Heterogeneous Spatial-Temporal Hypergraph Networks with Bidirectional State Space Models for Global Fatigue Assessment
HST-HGN uses heterogeneous spatial-temporal hypergraph networks combined with bidirectional Mamba state space models to achieve state-of-the-art driver fatigue assessment from untrimmed videos while maintaining computational efficiency for real-time use.
-
MambaLiteUNet: Cross-Gated Adaptive Feature Fusion for Robust Skin Lesion Segmentation
MambaLiteUNet integrates Mamba into U-Net with adaptive fusion, local-global mixing, and cross-gated attention modules to reach 87.12% IoU and 93.09% Dice on skin lesion datasets while cutting parameters by 93.6%.
-
MambaKick: Early Penalty Direction Prediction from HAR Embeddings
MambaKick reuses pretrained HAR embeddings with Mamba temporal modeling to predict penalty kick direction, reaching 53.1% accuracy on three classes and 64.5% on two classes.
-
SSMamba: A Self-Supervised Hybrid State Space Model for Pathological Image Classification
SSMamba uses a two-stage self-supervised pretraining and fine-tuning pipeline with Mamba-based components to outperform prior pathological foundation models on ROI and WSI classification tasks.
-
Learning Coarse-to-Fine Osteoarthritis Representations under Noisy Hierarchical Labels
Dual-head training on hierarchical OA labels yields backbone-dependent gains in KL metrics, more ordered latent severity axes, and better saliency alignment with cartilage for some 3D backbones.
-
ConvVitMamba: Efficient Multiscale Convolution, Transformer, and Mamba-Based Sequence modelling for Hyperspectral Image Classification
ConvVitMamba integrates multiscale convolution, transformer encoding, and Mamba-based refinement with PCA to outperform prior CNN, ViT, and Mamba methods in accuracy, size, and speed on four HSI benchmark datasets.
-
Beyond ZOH: Advanced Discretization Strategies for Vision Mamba
Bilinear discretization improves Vision Mamba accuracy over zero-order hold on classification, segmentation, and detection benchmarks with only modest extra training cost.
-
A Hybrid Architecture for Benign-Malignant Classification of Mammography ROIs
Hybrid EfficientNetV2-M and Vision Mamba architecture achieves strong binary classification performance on abnormality-centered mammography ROIs from CBIS-DDSM.
-
The First Challenge on Remote Sensing Infrared Image Super-Resolution at NTIRE 2026: Benchmark Results and Method Overview
The NTIRE 2026 challenge establishes a benchmark for x4 super-resolution of remote sensing infrared images, with 13 teams submitting valid methods evaluated on a dedicated dataset.
-
The Fourth Challenge on Image Super-Resolution (×4) at NTIRE 2026: Benchmark Results and Method Overview
The NTIRE 2026 ×4 super-resolution challenge benchmarks 31 teams on bicubic-downsampled images using PSNR for the restoration track and perceptual scores for the realism track.
-
Evolution of Video Generative Foundations
This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.
-
The Eleventh NTIRE 2026 Efficient Super-Resolution Challenge Report
The NTIRE 2026 report benchmarks 15 valid submissions that maintain ~26.9 dB PSNR on DIV2K_LSDIR while reducing runtime, parameters and FLOPs.
-
Mixture-of-Experts in Remote Sensing: A Survey
A survey providing the first systematic overview of Mixture-of-Experts models applied to remote sensing tasks, covering principles, architectures, applications, and future trends.
-
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
The paper reviews the background, technology, applications, limitations, and future directions of OpenAI's Sora text-to-video generative model based on public information.