hub
U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation
38 Pith papers cite this work.
abstract
Convolutional Neural Networks (CNNs) and Transformers have been the most popular architectures for biomedical image segmentation, but both of them have limited ability to handle long-range dependencies because of inherent locality or computational complexity. To address this challenge, we introduce U-Mamba, a general-purpose network for biomedical image segmentation. Inspired by the State Space Sequence Models (SSMs), a new family of deep sequence models known for their strong capability in handling long sequences, we design a hybrid CNN-SSM block that integrates the local feature extraction power of convolutional layers with the abilities of SSMs for capturing the long-range dependency. Moreover, U-Mamba enjoys a self-configuring mechanism, allowing it to automatically adapt to various datasets without manual intervention. We conduct extensive experiments on four diverse tasks, including the 3D abdominal organ segmentation in CT and MR images, instrument segmentation in endoscopy images, and cell segmentation in microscopy images. The results reveal that U-Mamba outperforms state-of-the-art CNN-based and Transformer-based segmentation networks across all tasks. This opens new avenues for efficient long-range dependency modeling in biomedical image analysis. The code, models, and data are publicly available at https://wanglab.ai/u-mamba.html.
citing papers explorer
-
DyABD: The Abdominal Muscle Segmentation in Dynamic MRI Benchmark
DyABD is the first benchmark dataset for abdominal muscle segmentation in dynamic MRIs featuring exercise-induced anatomical changes and pre/post-surgery scans, where existing models achieve an average Dice score of 0.82.
-
RAM-W600: A Multi-Task Wrist Dataset and Benchmark for Rheumatoid Arthritis
Introduces RAM-W600, the first public multi-task dataset of wrist conventional radiographs with instance segmentation annotations and Sharp/van der Heijde bone erosion scores for rheumatoid arthritis research.
-
RAM-H1200: A Unified Evaluation and Dataset on Hand Radiographs for Rheumatoid Arthritis
RAM-H1200 introduces a public dataset of 1,200 hand X-rays with whole-hand bone segmentation, pixel-level bone erosion masks, and joint-level SvdH scores for both erosion and narrowing to enable unified RA analysis.
-
AG-TAL: Anatomically-Guided Topology-Aware Loss for Multiclass Segmentation of the Circle of Willis Using Large-Scale Multi-Center Datasets
AG-TAL loss improves multiclass Circle of Willis segmentation to 80.85% average Dice with 1-3% gains on small arteries across multi-center datasets by embedding anatomical priors into topology-aware terms.
-
Camyla: Scaling Autonomous Research in Medical Image Segmentation
Camyla autonomously generates research proposals, experiments, and manuscripts in medical image segmentation, outperforming baselines on 24 of 31 recent datasets while producing 40 human-reviewed papers.
-
Unsupervised Source-Free Ranking of Biomedical Segmentation Models Under Distribution Shift
Presents the first unsupervised source-free framework for ranking semantic and instance segmentation models via prediction consistency under perturbations, with rankings correlating to target-domain performance across 2D/3D biomedical tasks.
-
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Vim is a bidirectional Mamba vision backbone that outperforms DeiT in accuracy on standard tasks while being substantially faster and more memory-efficient for high-resolution images.
-
BiSegMamba: Efficient Bidirectional Tri-Oriented Mamba for 3D Medical Image Segmentation
BiSegMamba is a bidirectional tri-oriented Mamba architecture that improves performance and reduces FLOPs in 3D medical image segmentation across brain, cardiac, abdominal, and vascular tasks.
-
Speech-Guided Multimodal Learning for Vocal Tract Segmentation in Real-Time MRI
A multimodal training pipeline with phonological bounding-box priors and cross-modal contrastive alignment transfers speech supervision to single-modality rtMRI vocal tract segmentation and outperforms prior methods on two datasets.
-
MambaPanoptic: A Vision Mamba-based Structured State Space Framework for Panoptic Segmentation
MambaPanoptic is a fully Mamba-based panoptic segmentation model that uses MambaFPN for multi-scale features and a QuadMamba kernel generator to outperform PanopticDeepLab and PanopticFCN on Cityscapes and COCO while using fewer parameters than Mask2Former.
-
EmambaIR: Efficient Visual State Space Model for Event-guided Image Reconstruction
EmambaIR is a visual state space model with cross-modal top-k sparse attention and gated SSM components that outperforms prior CNN and ViT methods on event-guided deblurring, deraining, and HDR reconstruction while reducing memory and compute costs.
-
SAMamba3D: adapting Segment Anything for generalizable 3D segmentation of multiphase pore-scale images
SAMamba3D adapts a frozen SAM encoder with Mamba volumetric context and cross-scale features to match or exceed 3D baselines on diverse sandstone and carbonate datasets while reducing case-specific retraining.
-
CrossPan: A Comprehensive Benchmark for Cross-Sequence Pancreas MRI Segmentation and Generalization
CrossPan benchmark shows cross-sequence MRI domain shifts cause pancreas segmentation models to fail catastrophically, establishing sequence generalization as the primary barrier to clinical deployment over center variability or architecture choices.
-
CloudMamba: An Uncertainty-Guided Dual-Scale Mamba Network for Cloud Detection in Remote Sensing Imagery
CloudMamba combines uncertainty-guided refinement with a dual-scale Mamba network to outperform prior methods on cloud segmentation accuracy while maintaining linear computational cost.
-
Geometrical Cross-Attention and Nonvoid Voxelization for Efficient 3D Medical Image Segmentation
GCNV-Net achieves state-of-the-art accuracy on multiple 3D medical segmentation benchmarks while cutting FLOPs by 56% and inference latency by 68% through dynamic nonvoid voxelization and geometric attention.
-
CHEM: Estimating and Understanding Hallucinations in Deep Learning for Image Processing
The paper defines the Conformal Hallucination Estimation Metric (CHEM) that localizes hallucination-prone regions in image reconstruction models via multiscale representations and distribution-free conformal regression.
-
Differential-UMamba: Rethinking Tumor Segmentation Under Limited Data Scenarios
Diff-UMamba combines UNet with Mamba and adds signal differencing for noise reduction, yielding 1-3% segmentation gains on public medical datasets and 4-5% on a small internal lung cancer dataset under limited data conditions.
-
COMMA: Coordinate-aware Modulated Mamba Network for 3D Dispersed Vessel Segmentation
Presents COMMA, a coordinate-aware Mamba network for 3D vessel segmentation that uses global and local branches, along with a new 570-case labeled dataset.
-
Gated Linear Attention Transformers with Hardware-Efficient Training
Gated linear attention Transformers achieve competitive language modeling results with linear-time inference, superior length generalization, and higher training throughput than Mamba.
-
RadGenome-Anatomy: A Large-Scale Anatomy-Labeled Chest Radiograph Dataset via Physically Grounded Volumetric Projection
RadGenome-Anatomy is a large-scale chest radiograph dataset with anatomy labels obtained by projecting 3D CT masks into 2D radiographic space for 210 structures in 25,692 studies.
-
MHMamba: Multi-Head Mamba for 3D Brain Tumor Segmentation
MHMamba combines a U-Net with multi-head Mamba, channel calibration, and adaptive skip fusion to improve 3D brain tumor segmentation accuracy and small-lesion sensitivity on BraTS datasets while retaining linear complexity.
-
USEMA: a Scalable Efficient Mamba Like Attention for Medical Image Segmentation
USEMA is a hybrid UNet architecture merging CNNs with scalable Mamba-like attention (SEMA) that achieves better efficiency than transformers and superior segmentation accuracy than pure CNN or Mamba models across medical imaging modalities.
-
TopoMamba: Topology-Aware Scanning and Fusion for Segmenting Heterogeneous Medical Visual Media
TopoMamba improves medical image segmentation by combining topology-aware diagonal scans with standard cross-scans and an HSIC Gate for efficient fusion, yielding gains on thin and curved targets like the pancreas.
-
GroupKAN: Efficient Kolmogorov-Arnold Networks via Grouped Spline Modeling
GroupKAN reduces KAN parameter scaling via intra-group spline mappings, delivering 79.80% average IoU (+1.11% over U-KAN) at 47.6% of the parameters on BUSI, GlaS, and CVC datasets.
-
Dino U-Net: Exploiting High-Fidelity Dense Features from Foundation Models for Medical Image Segmentation
Dino U-Net combines a frozen DINOv3 backbone with an adapter and fidelity-aware projection module to achieve state-of-the-art medical image segmentation across seven public datasets.
-
FADPNet: Frequency-Aware Dual-Path Network for Face Super-Resolution
FADPNet decomposes facial features into low- and high-frequency components processed by dedicated Mamba and CNN modules to balance quality and efficiency in face super-resolution.
-
EventCrab: Harnessing Frame and Point Synergy for Event-based Action Recognition and Beyond
EventCrab integrates frame and point networks with a joint representation space, SCL, and Hilbert-scan EPE to improve event-based action recognition by 5-7% on two datasets.
-
SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge
SegSTRONG-C provides a new benchmark where top models reach 0.9394 DSC and 0.9301 NSD on corrupted surgical tool segmentation tests, showing conventional techniques help but calling for more innovative robustness methods.
-
3DMambaComplete: Exploring Structured State Space Model for Point Cloud Completion
3DMambaComplete applies the Mamba model to point cloud completion via hyperpoint generation, spatial spreading, and mesh deformation, claiming better results than prior methods on benchmarks.
-
EnergyMamba: An Uncertainty-Aware Graph-Enhanced Selective State Space Model for Energy Consumption Prediction
EnergyMamba improves energy consumption prediction accuracy by about 5% and uncertainty quantification by about 6% over 15 baselines on four real-world US datasets by combining graph-enhanced Mamba with adaptive sequential conformalized quantile regression.
-
CoRE: Concept-Reasoning Expansion for Continual Brain Lesion Segmentation
CoRE aligns image tokens to a hierarchical concept library to simulate clinical reasoning for expert routing and demand-based growth in continual brain lesion segmentation, achieving SOTA on 12 tasks.
-
Delving Aleatoric Uncertainty in Medical Image Segmentation via Vision Foundation Models
Vision foundation models quantify aleatoric uncertainty via feature diversity and singular value energy to enable uncertainty-aware data filtering and dynamic training optimization for improved medical image segmentation.
-
Enhancing Medical Image Segmentation via Heat Conduction Equation
A hybrid U-Mamba architecture with Heat Conduction Operators achieves a DSC of 0.8719 on the Abdomen CT dataset by simulating frequency-domain thermal diffusion.
-
Attention Is not Everything: Efficient Alternatives for Vision
A survey that taxonomizes non-Transformer vision models and evaluates their practical trade-offs across efficiency, scalability, and robustness.
-
A Survey of Mamba
The paper consolidates existing research on Mamba models, their architecture variants, adaptations to different data modalities, and applications across domains.
-
Advancing Intelligent Sequence Modeling: Evolution, Trade-offs, and Applications of State-Space Architectures from S4 to Mamba
A survey tracing the evolution of state-space models like S4 and Mamba, their efficiency trade-offs, and applications in NLP, vision, and other domains.
- SurgicalMamba: Dual-Path SSD with State Regramming for Online Surgical Phase Recognition
- Adaptable Segmentation Pipeline for Diverse Brain Tumors with Radiomic-Guided Subtyping and Lesion-Wise Model Ensemble