pith. machine review for the scientific record.

arxiv: 2604.15729 · v2 · submitted 2026-04-17 · 💻 cs.CV · cs.AI

Recognition: unknown

MambaBack: Bridging Local Features and Global Contexts in Whole Slide Image Analysis

Chad Wong, Enhui Chai, Fei Xia, Sicheng Chen, Tianyi Zhang, Zeyu Liu

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 08:46 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords local analysis · global context · mamba · mambaback · mambaout

The pith

MambaBack is a hybrid Mamba-CNN model with Hilbert sampling and chunked inference that reports better performance than seven prior methods on five whole-slide image datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Whole slide images are enormous digital scans of tissue slides used to diagnose cancer. Standard AI approaches treat them as bags of small patches and struggle to capture both tiny cell details and the overall tissue layout. MambaBack tries to fix this by first rearranging the patches along a space-filling Hilbert curve so nearby patches stay close in the sequence. It then runs a simple gated CNN on the local patches to pick up fine cell structures and feeds the result into a bidirectional Mamba block that looks at the whole slide for larger patterns. During training the model processes chunks in parallel; at inference it streams the chunks to keep memory use low on edge devices. The authors test the model on five public pathology datasets and state that it beats seven recent competing methods.
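The Hilbert reordering step can be sketched with the classic d2xy index mapping; `hilbert_d2xy` and `hilbert_order` below are illustrative names for a minimal version, not the paper's actual sampling code.

```python
def hilbert_d2xy(order, d):
    """Map a 1D Hilbert index d to (x, y) on a 2^order x 2^order grid
    (standard iterative d2xy construction)."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # rotate/reflect the quadrant so the curve stays continuous
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_order(order, tile_coords):
    """Sort 2D tile coordinates so that neighbours in the returned 1D
    sequence are also neighbours on the slide grid."""
    n = 1 << order
    index_of = {hilbert_d2xy(order, d): d for d in range(n * n)}
    return sorted(tile_coords, key=lambda xy: index_of[xy])
```

The key property motivating this over row-major or Z-order flattening: consecutive Hilbert indices always map to grid-adjacent cells, so 2D locality survives the trip to 1D.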

Core claim

Experimental results on five datasets demonstrate that MambaBack outperforms seven state-of-the-art methods.

Load-bearing premise

That the combination of Hilbert sampling, 1D gated CNN local blocks, BiMamba2 global blocks, and asymmetric chunking is responsible for the observed gains rather than dataset-specific tuning or unablated baseline differences.

Figures

Figures reproduced from arXiv: 2604.15729 by Chad Wong, Enhui Chai, Fei Xia, Sicheng Chen, Tianyi Zhang, Zeyu Liu.

Figure 1
Figure 1. Overview of MambaBack. a. MIL pipeline with MambaBack. b. MambaBack structure. c. Heatmap visualization. d. Inference memory usage comparison. view at source ↗
Figure 2
Figure 2. Analysis of WSI tile distributions and ablation studies on key components. view at source ↗
read the original abstract

Whole Slide Image (WSI) analysis is pivotal in computational pathology, enabling cancer diagnosis by integrating morphological and architectural cues across magnifications. Multiple Instance Learning (MIL) serves as the standard framework for WSI analysis. Recently, Mamba has become a promising backbone for MIL, overtaking Transformers due to its efficiency and global context modeling capabilities originating from Natural Language Processing (NLP). However, existing Mamba-based MIL approaches face three critical challenges: (1) disruption of 2D spatial locality during 1D sequence flattening; (2) sub-optimal modeling of fine-grained local cellular structures; and (3) high memory peaks during inference on resource-constrained edge devices. Studies like MambaOut reveal that Mamba's SSM component is redundant for local feature extraction, where Gated CNNs suffice. Recognizing that WSI analysis demands both fine-grained local feature extraction akin to natural images, and global context modeling akin to NLP, we propose MambaBack, a novel hybrid architecture that harmonizes the strengths of Mamba and MambaOut. First, we propose the Hilbert sampling strategy to preserve the 2D spatial locality of tiles within 1D sequences, enhancing the model's spatial perception. Second, we design a hierarchical structure comprising a 1D Gated CNN block based on MambaOut to capture local cellular features, and a BiMamba2 block to aggregate global context, jointly enhancing multi-scale representation. Finally, we implement an asymmetric chunking design, allowing parallel processing during training and chunking-streaming accumulation during inference, minimizing peak memory usage for deployment. Experimental results on five datasets demonstrate that MambaBack outperforms seven state-of-the-art methods. Source code and datasets are publicly available.
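The abstract's "chunking-streaming accumulation" can be illustrated with a toy aggregator: fold tile features into a running sum one chunk at a time, so peak memory holds a single chunk rather than the whole slide. Mean pooling stands in here for the model's actual accumulation, and all names are hypothetical.

```python
import numpy as np

def streaming_mean_pool(tile_features, chunk_size=256):
    """Chunking-streaming accumulation sketch: peak memory scales with
    chunk_size, not with the (possibly huge) number of tiles per slide."""
    total, count = None, 0
    for start in range(0, len(tile_features), chunk_size):
        chunk = np.asarray(tile_features[start:start + chunk_size])  # (c, dim)
        s = chunk.sum(axis=0)
        total = s if total is None else total + s
        count += len(chunk)
    return total / count
```

Because summation is associative, the streamed result matches full-batch pooling exactly; for order-sensitive aggregators (such as a recurrent or SSM state), the chunks would instead be folded through a carried hidden state.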

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Circularity Check

0 steps flagged

No circularity: empirical architecture proposal validated on external benchmarks

full rationale

The paper is an empirical proposal of a hybrid Mamba-based MIL architecture for WSI analysis. It describes design choices (Hilbert sampling, 1D Gated-CNN blocks, BiMamba2 blocks, asymmetric chunking) motivated by prior observations and validates them via measured performance on five public datasets against seven external SOTA methods. No equations, first-principles derivations, or predictions are presented that reduce by construction to fitted inputs, self-citations, or renamed known results. All load-bearing claims are experimental outcomes on independent data, satisfying the criteria for a self-contained non-circular finding.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on standard deep-learning assumptions plus three design choices whose effectiveness is demonstrated empirically rather than derived from first principles.

free parameters (1)
  • chunk size and overlap parameters
    Chosen to balance training parallelism and inference memory; the values are not stated in the abstract but affect the reported memory and accuracy numbers.
axioms (2)
  • domain assumption Mamba SSM blocks provide efficient global context modeling for sequences
    Invoked when claiming BiMamba2 aggregates global context better than prior MIL backbones.
  • domain assumption Gated CNNs are sufficient for local feature extraction in images
    Taken from the cited MambaOut work and applied to 1D sequences of tiles.
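The second axiom, that gated CNNs suffice for local mixing, can be made concrete with a toy 1D gated convolution block in the spirit of MambaOut: a depthwise causal convolution on the value branch, gated elementwise by a sigmoid branch, with no SSM involved. Shapes and names are hypothetical; the paper's block will differ.

```python
import numpy as np

def gated_conv1d_block(x, w_gate, w_val, kernel):
    """Toy gated CNN token mixer on a (L, d) tile-feature sequence:
    causal 1D convolution over the sequence axis, modulated by a
    learned sigmoid gate -- local mixing without state-space machinery."""
    gate = 1.0 / (1.0 + np.exp(-(x @ w_gate)))   # (L, d) sigmoid gating branch
    val = x @ w_val                              # (L, d) value branch
    k = len(kernel)
    padded = np.pad(val, ((k - 1, 0), (0, 0)))   # causal pad along the sequence
    mixed = sum(kernel[i] * padded[i:i + len(x)] for i in range(k))
    return gate * mixed                          # elementwise gating
```

The receptive field is bounded by the kernel length, which is exactly why the architecture pairs such a block with a global BiMamba2 stage rather than using it alone.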

pith-pipeline@v0.9.0 · 5626 in / 1379 out tokens · 36233 ms · 2026-05-10T08:46:50.895457+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

30 extracted references · 5 canonical work pages · 3 internal anchors

  1. [1] Muhammad Khalid Khan Niazi, Anil V Parwani, and Metin N Gurcan. Digital pathology and artificial intelligence. The Lancet Oncology, 20(5):e253–e261, 2019.

  2. [2] Liron Pantanowitz, Paul N Valenstein, Andrew J Evans, Keith J Kaplan, John D Pfeifer, David C Wilbur, Laura C Collins, and Terence J Colgan. Review of the current state of whole slide imaging in pathology. Journal of Pathology Informatics, 2(1):36, 2011.

  3. [3] Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen AWM Van Der Laak, Bram Van Ginneken, and Clara I Sánchez. A survey on deep learning in medical image analysis. Medical Image Analysis, 42:60–88, 2017.

  4. [4] Chetan L Srinidhi, Ozan Ciga, and Anne L Martel. Deep neural network models for computational histopathology: A survey. Medical Image Analysis, 67:101813, 2021.

  5. [5] Gabriele Campanella, Matthew G Hanna, Luke Geneslaw, Allen Miraflor, Vitor Werneck Krauss Silva, Klaus J Busam, Edi Brogi, Victor E Reuter, David S Klimstra, and Thomas J Fuchs. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature Medicine, 25(8):1301–1309, 2019.

  6. [6] Maximilian Ilse, Jakub Tomczak, and Max Welling. Attention-based deep multiple instance learning. In International Conference on Machine Learning, pages 2127–2136. PMLR, 2018.

  7. [7] Richard J Chen, Tong Ding, Ming Y Lu, Drew FK Williamson, Guillaume Jaume, Andrew H Song, Bowen Chen, Andrew Zhang, Daniel Shao, Muhammad Shaban, et al. Towards a general-purpose foundation model for computational pathology. Nature Medicine, 30(3):850–862, 2024.

  8. [8] Ming Y Lu, Drew FK Williamson, Tiffany Y Chen, Richard J Chen, Matteo Barbieri, and Faisal Mahmood. Data-efficient and weakly supervised computational pathology on whole-slide images. Nature Biomedical Engineering, 5(6):555–570, 2021.

  9. [9] Hongrun Zhang, Yanda Meng, Yitian Zhao, Yihong Qiao, Xiaoyun Yang, Sarah E Coupland, and Yalin Zheng. DTFD-MIL: Double-tier feature distillation multiple instance learning for histopathology whole slide image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18802–18812, 2022.

  10. [10] Zhuchen Shao, Hao Bian, Yang Chen, Yifeng Wang, Jian Zhang, Xiangyang Ji, et al. TransMIL: Transformer based correlated multiple instance learning for whole slide image classification. Advances in Neural Information Processing Systems, 34:2136–2147, 2021.

  11. [11] Nan Ying, Yanli Lei, Tianyi Zhang, Shangqing Lyu, Sicheng Chen, Zeyu Liu, Yunlu Feng, Yu Zhao, and Guanglei Zhang. CPIA dataset: a large-scale comprehensive pathological image analysis dataset for self-supervised learning pre-training. Biomedical Signal Processing and Control, 110:108148, 2025.

  12. [12] Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. In First Conference on Language Modeling, 2024.

  13. [13] Shu Yang, Yihui Wang, and Hao Chen. MambaMIL: Enhancing long sequence modeling with sequence reordering in computational pathology. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 296–306. Springer, 2024.

  14. [14] Chongcong Jiang, Zhuo Zhao, Peixian Liang, Min Shi, Jun Han, Nian-Feng Tzeng, Guanghua Xiao, Danny Z Chen, and Hao Zheng. Exploring multi-scale local and global features in whole slide images using state space models. bioRxiv, pages 2026–01, 2026.

  15. [15] Hans Sagan. Space-Filling Curves. Springer Science & Business Media, 2012.

  16. [16] Tianyi Zhang, Zhiling Yan, Chunhui Li, Nan Ying, Yanli Lei, Shangqing Lyu, Yunlu Feng, Yu Zhao, and Guanglei Zhang. CellMix: A general instance relationship-based method for data augmentation toward pathology image classification. IEEE Transactions on Neural Networks and Learning Systems, 2025.

  17. [17] Weihao Yu and Xinchao Wang. MambaOut: Do we really need Mamba for vision? In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 4484–4496, 2025.

  18. [18] Lianghui Zhu, Bencheng Liao, Qian Zhang, Xinlong Wang, Wenyu Liu, and Xinggang Wang. Vision Mamba: Efficient visual representation learning with bidirectional state space model. arXiv preprint arXiv:2401.09417, 2024.

  19. [19] Tri Dao and Albert Gu. Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality. arXiv preprint arXiv:2405.21060, 2024.

  20. [20] Jiasi Chen and Xukan Ran. Deep learning with edge computing: A review. Proceedings of the IEEE, 107(8):1655–1674, 2019.

  21. [21] Kaiyuan Guo, Shulin Zeng, Jincheng Yu, Yu Wang, and Huazhong Yang. A survey of FPGA-based neural network accelerator. arXiv preprint arXiv:1712.08934, 2017.

  22. [22] Sicheng Chen, Tianyi Zhang, Dankai Liao, Dandan Li, Low Chang Han, Yanqin Jiang, Yueming Jin, and Shangqing Lyu. PathRWKV: Enabling whole slide prediction with recurrent-transformer. arXiv preprint arXiv:2503.03199, 2025.

  23. [23] Jiawen Yao, Xinliang Zhu, and Junzhou Huang. Deep multi-instance learning for survival prediction from whole slide images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 496–504. Springer, 2019.

  24. [24] Dankai Liao, Sicheng Chen, Nuwa Xi, Qiaochu Xue, Jieyu Li, Lingxuan Hou, Zeyu Liu, Chang Han Low, Yufeng Wu, Yiling Liu, et al. UnPuzzle: A unified framework for pathology image analysis. arXiv preprint arXiv:2503.03152, 2025.

  25. [25] Hanwen Xu, Naoto Usuyama, Jaspreet Bagga, Sheng Zhang, Rajesh Rao, Tristan Naumann, Cliff Wong, Zelalem Gero, Javier González, Yu Gu, et al. A whole-slide foundation model for digital pathology from real-world data. Nature, pages 1–8, 2024.

  26. [26] Babak Ehteshami Bejnordi, Mitko Veta, Paul Johannes Van Diest, Bram Van Ginneken, Nico Karssemeijer, Geert Litjens, Jeroen AWM Van Der Laak, Meyke Hermsen, Quirine F Manson, Maschenka Balkenhol, et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA, 318(22):2199–2210, 2017.

  27. [27] Peter Bandi, Oscar Geessink, Quirine Manson, Marcory Van Dijk, Maschenka Balkenhol, Meyke Hermsen, Babak Ehteshami Bejnordi, Byungjae Lee, Kyunghyun Paeng, Aoxiao Zhong, et al. From detection of individual metastases to classification of lymph node status at the patient level: the CAMELYON17 challenge. IEEE Transactions on Medical Imaging, 38(2):550–560, 2018.

  28. [28] Wouter Bulten, Kimmo Kartasalo, Po-Hsuan Cameron Chen, Peter Ström, Hans Pinckaers, Kunal Nagpal, Yuannan Cai, David F Steiner, Hester Van Boven, Robert Vink, et al. Artificial intelligence for diagnosis and Gleason grading of prostate cancer: the PANDA challenge. Nature Medicine, 28(1):154–163, 2022.

  29. [29] Antonio Colaprico, Tiago C Silva, Catharina Olsen, Luciano Garofano, Claudia Cava, Davide Garolini, Thais S Sabedot, Tathiane M Malta, Stefano M Pagnotta, Isabella Castiglioni, et al. TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Research, 44(8):e71–e71, 2016.

  30. [30] Jared L Katzman, Uri Shaham, Alexander Cloninger, Jonathan Bates, Tingting Jiang, and Yuval Kluger. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Medical Research Methodology, 18(1):24, 2018.