
arxiv: 2605.16464 · v1 · pith:6PTGMI5Wnew · submitted 2026-05-15 · 💻 cs.CV · cs.AI

MHMamba: Multi-Head Mamba for 3D Brain Tumor Segmentation

Pith reviewed 2026-05-20 18:18 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords 3D segmentation · brain tumor · Mamba · state space model · MRI · U-Net · multi-head architecture · BraTS

The pith

MHMamba splits MRI channels into parallel Mamba heads to raise accuracy and boundary precision in 3D brain tumor segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Brain tumors vary widely in shape and contrast across MRI modalities, so manual outlining is slow and operator-dependent. MHMamba places a multi-head state-space model inside a U-shaped network, dividing the channel dimension into several parallel SSM heads whose outputs are aggregated with residual links. This captures distant context at linear cost, while a channel-space calibration module and adaptive skip fusion align global semantics with local detail. The authors report that experiments on BraTS2021 and BraTS2023 show consistent gains in overall Dice scores, smoother tumor boundaries, and higher sensitivity to small enhancing regions, while preserving the linear complexity of Mamba-based modeling.
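To make the channel-splitting idea concrete, the sketch below flattens a small 3D volume into a voxel sequence, splits its channels into heads, runs an independent linear-time scan per head, concatenates the head outputs, and adds the input back as a residual. It is a minimal illustration under stated assumptions, not the authors' block: each head here uses a fixed scalar decay, whereas Mamba's selective scan makes its parameters input-dependent and adds projections, gating, and a convolution. The raster scan order and the names `flatten_volume`, `diagonal_scan`, and `multi_head_ssm` are hypothetical.

```python
from typing import List

Token = List[float]      # one voxel's feature vector with C channels
Sequence = List[Token]   # voxels in scan order (length L = D * H * W)


def flatten_volume(volume: List[List[List[Token]]]) -> Sequence:
    """Raster-scan a D x H x W grid of C-channel voxels into a 1D sequence (assumed order)."""
    return [voxel for plane in volume for row in plane for voxel in row]


def diagonal_scan(x: Sequence, decay: float, gain: float) -> Sequence:
    """Linear-time recurrence h_t = decay * h_{t-1} + gain * x_t, applied per channel.

    Each token is visited once, so the cost grows as O(L * C), not O(L^2).
    """
    state = [0.0] * len(x[0])
    out: Sequence = []
    for token in x:
        state = [decay * s + gain * v for s, v in zip(state, token)]
        out.append(state)
    return out


def multi_head_ssm(x: Sequence, decays: List[float], gains: List[float]) -> Sequence:
    """Split channels into len(decays) heads, scan each head, concatenate, add residual."""
    num_heads = len(decays)
    channels = len(x[0])
    if channels % num_heads != 0:
        raise ValueError("channel count must be divisible by the number of heads")
    width = channels // num_heads
    heads = []
    for h in range(num_heads):
        # Each head sees only its own contiguous slice of channels.
        sub = [token[h * width:(h + 1) * width] for token in x]
        heads.append(diagonal_scan(sub, decays[h], gains[h]))
    out: Sequence = []
    for t, token in enumerate(x):
        merged = [v for head in heads for v in head[t]]       # concatenate heads
        out.append([m + r for m, r in zip(merged, token)])     # residual aggregation
    return out


# Usage: a 2 x 1 x 2 volume with 4 channels, split into a short-memory head
# (decay 0.5) and a long-memory head (decay 0.9).
volume = [[[[1.0, 0.0, 1.0, 0.0], [0.0] * 4]],
          [[[0.0] * 4, [0.0] * 4]]]
seq = flatten_volume(volume)
out = multi_head_ssm(seq, decays=[0.5, 0.9], gains=[1.0, 1.0])
print(out[-1])  # channel 0 has decayed to 0.125, channel 2 only to about 0.729
```

Giving heads different decays is one way parallel heads could specialize in short versus long context; whether MHMamba's heads behave this way is not established by the summary.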

Core claim

By splitting the channel dimension into parallel state-space model heads, aggregating their outputs with residuals, and adding channel-space calibration plus adaptive skip fusion, MHMamba improves long-range representation and multimodal stability in 3D MRI volumes while retaining linear complexity and delivering measurable gains in accuracy, boundary smoothness, and small-lesion sensitivity on BraTS2021 and BraTS2023.

What carries the argument

Multi-head state-space model that splits the channel dimension into parallel SSM heads, aggregates them with residuals, then applies channel-space calibration and adaptive fusion at skip connections.
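The summary does not spell out the channel-space calibration module. The sketch below assumes a squeeze-and-excitation-style channel gate followed by a per-voxel spatial gate (a CBAM-like ordering), applied to the concatenated head outputs. The single affine-plus-sigmoid per channel, the function name `channel_space_calibration`, and its parameters are illustrative assumptions, not the paper's design.

```python
import math
from typing import List

Sequence = List[List[float]]  # [voxels][channels], e.g. concatenated head outputs


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def channel_space_calibration(x: Sequence, weights: List[float],
                              biases: List[float]) -> Sequence:
    """Hypothetical calibration: a per-channel gate, then a per-voxel spatial gate."""
    n_vox, n_ch = len(x), len(x[0])
    # Squeeze: average each channel over all voxels (global statistics).
    pooled = [sum(v[c] for v in x) / n_vox for c in range(n_ch)]
    # Excite: one affine map + sigmoid per channel (SE uses a small MLP here).
    ch_gate = [sigmoid(weights[c] * pooled[c] + biases[c]) for c in range(n_ch)]
    recalibrated = [[v[c] * ch_gate[c] for c in range(n_ch)] for v in x]
    # Spatial gate: sigmoid of each voxel's channel mean after recalibration.
    out: Sequence = []
    for v in recalibrated:
        s = sigmoid(sum(v) / n_ch)
        out.append([val * s for val in v])
    return out


# Usage: with zero weights and biases every channel gate is 0.5.
x = [[2.0, 0.0], [0.0, 0.0]]
y = channel_space_calibration(x, weights=[0.0, 0.0], biases=[0.0, 0.0])
print(y)  # first voxel is scaled by 0.5 and then by sigmoid(0.5)
```

Because both gates lie in (0, 1), calibration in this sketch can only rescale responses, never amplify them beyond the input; any net gain would have to come from the surrounding layers.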

If this is right

  • Long-range dependencies in full 3D MRI volumes can be modeled without the quadratic memory cost of self-attention.
  • Boundary consistency and detection of small enhancing tumor volumes improve through explicit alignment of global and local signals.
  • Multimodal training remains stable across heterogeneous contrasts because parallel heads reduce contextual incoherence.
  • Linear scaling is preserved, allowing practical processing of high-resolution 3D scans on standard clinical hardware.
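The linear-versus-quadratic contrast in the first and last bullets can be checked with simple arithmetic. The crop sizes and the stride-4 downsampling below are illustrative assumptions, not the paper's settings, and the counts are per attention head and per layer, ignoring channel width and SSM state size.

```python
def token_count(side: int, stride: int) -> int:
    """Tokens left after downsampling a cubic crop of `side` voxels by `stride` per axis."""
    return (side // stride) ** 3


def attention_entries(n_tokens: int) -> int:
    """Entries in one full self-attention score matrix: quadratic in sequence length."""
    return n_tokens * n_tokens


def scan_steps(n_tokens: int) -> int:
    """Recurrent updates in one scan pass over the sequence: linear in sequence length."""
    return n_tokens


for side in (64, 128):
    n = token_count(side, stride=4)
    gib = attention_entries(n) * 2 / 2**30  # fp16 = 2 bytes per score
    print(f"{side}^3 crop: {n} tokens, {attention_entries(n):,} attention scores "
          f"({gib:g} GiB in fp16), {scan_steps(n):,} scan steps")
```

Doubling the crop side multiplies the token count by 8: the scan work grows by the same factor, while the attention score matrix grows by 64, from 32 MiB to 2 GiB in fp16.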

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same parallel-head pattern could be tested on other 3D medical volumes such as CT organ segmentation where both wide context and fine edges are needed.
  • Varying the number of heads may expose an optimal width for different tumor grades or acquisition protocols.
  • Pairing MHMamba with self-supervised pretraining on unlabeled MRI could further lift performance in low-data hospital settings.

Load-bearing premise

That splitting channels into parallel SSM heads with residual aggregation plus calibration and adaptive fusion will reliably improve long-range modeling and stability without losing local features or introducing new instabilities in 3D MRI volumes.

What would settle it

An ablation on the same BraTS2021 or BraTS2023 data that removes the multi-head splitting. If tumor-core and enhancing-tumor Dice scores and boundary metrics show no drop, the claim that multi-head splitting drives the gains would not hold.
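Running that ablation requires region-wise scores. The sketch below builds the nested BraTS regions (whole tumor, tumor core, enhancing tumor) from a label map and computes Dice, sensitivity, and a 95th-percentile Hausdorff distance (HD95) as the boundary metric. Assumptions to check before reuse: the BraTS2021 label convention (1 = necrotic tumor core, 2 = edema, 4 = enhancing tumor; BraTS2023 glioma data use 3 for enhancing tumor, hence the `et_label` parameter); scoring an empty prediction against an empty target as Dice 1.0; 6-connected surface voxels; and pooling both directed surface-distance sets before taking the percentile, which is only one of several HD95 conventions. All function names are hypothetical, the brute-force distance search suits toy inputs only, and the labels in the usage lines are made up.

```python
import math
from typing import Dict, List, Set, Tuple

Coord = Tuple[int, int, int]


def brats_regions(et_label: int = 4) -> Dict[str, Set[int]]:
    """Nested BraTS evaluation regions (assumes 1 = necrotic core, 2 = edema)."""
    return {"WT": {1, 2, et_label}, "TC": {1, et_label}, "ET": {et_label}}


def region_mask(labels: List[int], region: Set[int]) -> List[bool]:
    return [lab in region for lab in labels]


def dice(pred: List[bool], target: List[bool]) -> float:
    inter = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * inter / total   # empty vs empty counts as 1.0


def sensitivity(pred: List[bool], target: List[bool]) -> float:
    tp = sum(p and t for p, t in zip(pred, target))
    positives = sum(target)
    return math.nan if positives == 0 else tp / positives


OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def surface(mask: Set[Coord]) -> Set[Coord]:
    """Foreground voxels with at least one 6-connected background neighbour."""
    return {p for p in mask
            if any((p[0] + dx, p[1] + dy, p[2] + dz) not in mask for dx, dy, dz in OFFSETS)}


def percentile(values: List[float], q: float) -> float:
    """Linear-interpolation percentile of a non-empty list."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def hd95(pred: Set[Coord], target: Set[Coord],
         spacing: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """95th percentile of pooled symmetric surface distances (brute force)."""
    sp, st = surface(pred), surface(target)
    if not sp or not st:
        raise ValueError("HD95 is undefined when either mask is empty")

    def dist(a: Coord, b: Coord) -> float:
        return math.sqrt(sum(((ai - bi) * s) ** 2 for ai, bi, s in zip(a, b, spacing)))

    d_pt = [min(dist(a, b) for b in st) for a in sp]   # pred surface -> target surface
    d_tp = [min(dist(b, a) for a in sp) for b in st]   # target surface -> pred surface
    return percentile(d_pt + d_tp, 95)


# Usage on a toy strip of BraTS2021-style labels (made-up values).
target = [0, 1, 2, 4, 4, 0]
pred = [0, 1, 2, 2, 4, 4]
for name, region in brats_regions().items():
    p, t = region_mask(pred, region), region_mask(target, region)
    print(name, round(dice(p, t), 3), round(sensitivity(p, t), 3))
print(hd95({(0, 0, 0)}, {(3, 0, 0)}))  # two single voxels 3 mm apart -> 3.0
```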

Figures

Figures reproduced from arXiv: 2605.16464 by Fan Zhang, Hanjun Tao, Hua Wang.

Figure 1. (a) shows the overall structure of the multi-head Mamba model, (b) shows the structure of the Multi-Head Mamba block in the …
Figure 2. (a) shows the specific structure of the Multi-head Mamba block, and (b) shows the architecture of the Mamba block. We adopt …
Figure 3. Qualitative visualization results of the BraTS2021 dataset. Key areas marked by yellow boxes demonstrate the significant …
Original abstract

Brain tumors exhibit high heterogeneity in morphology and multimodal contrast, making manual slice-by-slice delineation time-consuming and experience-dependent, thus necessitating efficient and stable automated segmentation methods. To address the limitations of CNNs in modeling long-range dependencies, and the heavy computational and memory overhead and inter-block contextual incoherence of Transformers in 3D MRI, this paper proposes Multi-Head Mamba (MHMamba). This method combines a U-shaped architecture with a multi-head state-space model (Mamba), splitting the channel dimension into parallel SSM heads and aggregating them with residuals. This enhances long-range representation and improves the stability of multimodal training while maintaining linear complexity. To further align statistics and enhance lesion response, we designed a channel-space calibration module for multi-head outputs and introduced an adaptive fusion mechanism at skip connections to dynamically connect global semantics with local details, thereby improving boundary consistency and the detection of small-volume lesions. We conducted experiments and ablations on BraTS2021 and BraTS2023. The results showed that MHMamba achieved stable and significant improvements in overall accuracy, boundary smoothness, and sensitivity to tumor core and small-volume enhancement areas, while preserving the linear-complexity advantage of Mamba-based modeling, thus verifying the effectiveness and versatility of the method.
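The abstract describes the skip-connection fusion only as adaptive. One common way to realize that is a learned gate that mixes encoder features (local detail) with decoder features (global semantics) per voxel and channel; the sketch below shows that pattern with scalar weights. It assumes both inputs already share resolution and channel width, and `adaptive_skip_fusion`, `w_enc`, `w_dec`, and `bias` are hypothetical names; a real network would compute the gate with learned convolutions, and the paper may use a different mechanism.

```python
import math
from typing import List

Sequence = List[List[float]]  # [voxels][channels]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def adaptive_skip_fusion(encoder: Sequence, decoder: Sequence,
                         w_enc: float, w_dec: float, bias: float) -> Sequence:
    """Hypothetical gated fusion of a skip connection.

    For every voxel and channel, a gate g in (0, 1) computed from both inputs
    decides the mix: g * encoder (local detail) + (1 - g) * decoder (global context).
    """
    if len(encoder) != len(decoder):
        raise ValueError("encoder and decoder features must cover the same voxels")
    fused: Sequence = []
    for e_vox, d_vox in zip(encoder, decoder):
        row = []
        for e, d in zip(e_vox, d_vox):
            g = sigmoid(w_enc * e + w_dec * d + bias)
            row.append(g * e + (1.0 - g) * d)
        fused.append(row)
    return fused


enc = [[2.0, 0.0]]   # sharp local response from the encoder
dec = [[0.0, 4.0]]   # smoother global response from the decoder
print(adaptive_skip_fusion(enc, dec, 0.0, 0.0, 0.0))   # neutral gate: plain average
print(adaptive_skip_fusion(enc, dec, 0.0, 0.0, 20.0))  # gate near 1: keeps encoder detail
```

Because the output is a convex combination of the two inputs, the gate can only choose how much of each path to keep; it cannot create boundary detail that neither path carries.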

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes MHMamba, a U-shaped architecture for 3D brain tumor segmentation that integrates multi-head state-space models (Mamba). It splits the channel dimension into parallel SSM heads, aggregates outputs with residuals, adds a channel-space calibration module, and introduces adaptive fusion at skip connections to improve long-range multimodal modeling while preserving linear complexity. Experiments and ablations on BraTS2021 and BraTS2023 are claimed to show stable improvements in overall accuracy, boundary smoothness, and sensitivity to tumor core and small-volume enhancing regions.

Significance. If the empirical claims hold with proper validation, the work could demonstrate a practical way to extend Mamba's efficiency advantages to heterogeneous 3D medical volumes, offering a lower-complexity alternative to Transformers for tasks requiring both global context and fine local contrast in multimodal MRI.

major comments (2)
  1. The central design choice of splitting the channel dimension into parallel SSM heads (described in the method) risks reducing per-head representational capacity for spatially localized high-frequency features in 3D BraTS volumes; the manuscript should supply a concrete analysis or ablation demonstrating that residual aggregation, channel-space calibration, and adaptive skip fusion compensate without loss of detail relative to single-head Mamba or standard CNN baselines.
  2. The abstract and results description assert 'stable and significant improvements' on BraTS2021 and BraTS2023 but supply no quantitative metrics, error bars, ablation tables, baseline comparisons, or statistical tests, so the data-to-claim link cannot be evaluated.
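The capacity concern in the first major comment can be put in numbers. Assuming each head applies its own dense channel-mixing matrix to its C/H channels (the summary does not say whether MHMamba's heads share projections), the per-head width and the total mixing budget both shrink as the head count H grows. The channel count of 128 and the function `head_budget` are illustrative.

```python
from typing import Tuple


def head_budget(channels: int, heads: int) -> Tuple[int, int]:
    """Per-head width and total dense channel-mixing weights when each head
    mixes only its own C/H channels (hypothetical per-head projections)."""
    if channels % heads != 0:
        raise ValueError("channels must be divisible by heads")
    width = channels // heads
    return width, heads * width * width   # H * (C/H)^2 = C^2 / H


for h in (1, 2, 4, 8):
    width, params = head_budget(128, h)
    print(f"H={h}: each head mixes {width} channels, {params:,} mixing weights")
```

Each doubling of H halves both the mixing budget and the number of channels any single channel can interact with inside its head, so cross-head interaction has to come from what follows the concatenation (residual path, calibration, skip fusion). That is what the requested single-head ablation would test.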
minor comments (2)
  1. Abstract: 'de lineation' appears to be a typo and should read 'delineation'.
  2. The phrase 'inter-block contextual incoherence' for Transformers would benefit from a brief clarification or citation to make the motivation more precise.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, providing clarifications on the design rationale and committing to revisions that strengthen the empirical presentation without altering the core contributions.

  1. Referee: The central design choice of splitting the channel dimension into parallel SSM heads (described in the method) risks reducing per-head representational capacity for spatially localized high-frequency features in 3D BraTS volumes; the manuscript should supply a concrete analysis or ablation demonstrating that residual aggregation, channel-space calibration, and adaptive skip fusion compensate without loss of detail relative to single-head Mamba or standard CNN baselines.

    Authors: We thank the referee for highlighting this potential concern. The multi-head splitting is motivated by enabling parallel specialization across channel subspaces for heterogeneous multimodal MRI features, while residual aggregation explicitly preserves the full input representation to each head. The channel-space calibration module then recalibrates statistics to enhance lesion-specific responses, and adaptive skip fusion dynamically balances global context with local high-frequency details. Our existing ablation studies (Section 4) compare multi-head configurations against single-head Mamba variants and show maintained or improved boundary and small-lesion metrics. To more directly address the request for concrete analysis on high-frequency feature preservation, we have added targeted visualizations of feature maps and an expanded ablation table in the revised manuscript demonstrating no loss of detail relative to baselines. revision: yes

  2. Referee: The abstract and results description assert 'stable and significant improvements' on BraTS2021 and BraTS2023 but supply no quantitative metrics, error bars, ablation tables, baseline comparisons, or statistical tests, so the data-to-claim link cannot be evaluated.

    Authors: We agree that the abstract and narrative summaries would benefit from more explicit quantitative support to strengthen the data-to-claim connection. While the experimental section and tables already present Dice, HD95, sensitivity metrics, ablation results, and baseline comparisons on BraTS2021 and BraTS2023, we acknowledge the high-level claims in the abstract and results overview lack specific numbers and statistical backing. In the revised manuscript we have updated the abstract with key quantitative metrics and improvement deltas, added error bars to performance figures, expanded the ablation tables with direct single-head and CNN comparisons, and included results from statistical significance tests to substantiate the reported improvements. revision: yes
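The rebuttal promises statistical significance tests without naming one. For two models scored on the same cases, a paired test on per-case Dice is a natural choice; the sketch below implements an exact sign-flip permutation test on the mean difference using only the standard library (the Wilcoxon signed-rank test is a common alternative). The function name and the scores in the usage lines are hypothetical, not results from the paper.

```python
import itertools
from typing import Sequence


def paired_sign_flip_pvalue(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact two-sided paired permutation test on the mean per-case difference.

    Under the null hypothesis the sign of each case's difference is arbitrary,
    so all 2^n sign patterns are enumerated (kept small here: n <= 20).
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two equally long, non-empty lists of paired scores")
    if len(a) > 20:
        raise ValueError("too many cases for exact enumeration; sample sign flips instead")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    extreme = 0
    for signs in itertools.product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, diffs))) / n
        if stat >= observed - 1e-12:   # tolerance for floating-point ties
            extreme += 1
    return extreme / 2 ** n


# Hypothetical per-case Dice scores for two models on the same five cases.
model_a = [0.85, 0.80, 0.90, 0.75, 0.88]
model_b = [x - 0.02 for x in model_a]
print(paired_sign_flip_pvalue(model_a, model_b))  # 2 / 32 = 0.0625
```

With five cases the smallest attainable two-sided p-value is 2/32 = 0.0625, so a significance claim needs enough validation cases for the test to have power; for larger sets, sample random sign patterns instead of enumerating them.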

Circularity Check

0 steps flagged

No circularity; claims rest on empirical results from external BraTS benchmarks


The paper proposes an architectural extension (multi-head SSM splitting, residual aggregation, channel calibration, adaptive skip fusion) inside a U-Net and validates it via experiments and ablations on the independent BraTS2021/BraTS2023 datasets. No equations, uniqueness theorems, or self-citations are invoked to derive the performance gains; the improvements are presented as observed outcomes rather than forced by construction or prior self-referential results. The derivation chain is therefore self-contained against external data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, the central claim rests on the effectiveness of the described architectural modifications to standard U-Net and Mamba components; no explicit free parameters, axioms, or new entities are stated.

pith-pipeline@v0.9.0 · 5755 in / 1218 out tokens · 81755 ms · 2026-05-20T18:18:15.687123+00:00
