pith. machine review for the scientific record.

arxiv: 2604.15729 · v2 · submitted 2026-04-17 · 💻 cs.CV · cs.AI

Recognition: unknown

MambaBack: Bridging Local Features and Global Contexts in Whole Slide Image Analysis

Chad Wong, Enhui Chai, Fei Xia, Sicheng Chen, Tianyi Zhang, Zeyu Liu

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 08:46 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords local analysis · global context · mamba · mambaback · mambaout

The pith

MambaBack is a hybrid Mamba-CNN model with Hilbert sampling and chunked inference that reports better performance than seven prior methods on five whole-slide image datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Whole slide images are enormous digital scans of tissue slides used to diagnose cancer. Standard AI approaches treat them as bags of small patches and struggle to capture both tiny cell details and the overall tissue layout. MambaBack tries to fix this by first rearranging the patches along a space-filling Hilbert curve so nearby patches stay close in the sequence. It then runs a simple gated CNN on the local patches to pick up fine cell structures and feeds the result into a bidirectional Mamba block that looks at the whole slide for larger patterns. During training the model processes chunks in parallel; at inference it streams the chunks to keep memory use low on edge devices. The authors test the model on five public pathology datasets and state that it beats seven recent competing methods.
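The Hilbert reordering step can be sketched with the classic d2xy index mapping; `hilbert_d2xy` and `hilbert_order` below are illustrative names for a minimal version, not the paper's actual sampling code.

```python
def hilbert_d2xy(order, d):
    """Map a 1D Hilbert index d to (x, y) on a 2^order x 2^order grid
    (standard iterative d2xy construction)."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # rotate/reflect the quadrant so the curve stays continuous
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_order(order, tile_coords):
    """Sort 2D tile coordinates so that neighbours in the returned 1D
    sequence are also neighbours on the slide grid."""
    n = 1 << order
    index_of = {hilbert_d2xy(order, d): d for d in range(n * n)}
    return sorted(tile_coords, key=lambda xy: index_of[xy])
```

The key property motivating this over row-major or Z-order flattening: consecutive Hilbert indices always map to grid-adjacent cells, so 2D locality survives the trip to 1D.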

Core claim

Experimental results on five datasets demonstrate that MambaBack outperforms seven state-of-the-art methods.

Load-bearing premise

That the combination of Hilbert sampling, 1D gated CNN local blocks, BiMamba2 global blocks, and asymmetric chunking is responsible for the observed gains rather than dataset-specific tuning or unablated baseline differences.

Figures

Figures reproduced from arXiv: 2604.15729 by Chad Wong, Enhui Chai, Fei Xia, Sicheng Chen, Tianyi Zhang, Zeyu Liu.

Figure 1
Figure 1. Overview of MambaBack. a. MIL pipeline with MambaBack. b. MambaBack structure. c. Heatmap visualization. d. Inference memory usage comparison. view at source ↗
Figure 2
Figure 2. Analysis of WSI tile distributions and ablation studies on key components. view at source ↗
read the original abstract

Whole Slide Image (WSI) analysis is pivotal in computational pathology, enabling cancer diagnosis by integrating morphological and architectural cues across magnifications. Multiple Instance Learning (MIL) serves as the standard framework for WSI analysis. Recently, Mamba has become a promising backbone for MIL, overtaking Transformers due to its efficiency and global context modeling capabilities originating from Natural Language Processing (NLP). However, existing Mamba-based MIL approaches face three critical challenges: (1) disruption of 2D spatial locality during 1D sequence flattening; (2) sub-optimal modeling of fine-grained local cellular structures; and (3) high memory peaks during inference on resource-constrained edge devices. Studies like MambaOut reveal that Mamba's SSM component is redundant for local feature extraction, where Gated CNNs suffice. Recognizing that WSI analysis demands both fine-grained local feature extraction akin to natural images, and global context modeling akin to NLP, we propose MambaBack, a novel hybrid architecture that harmonizes the strengths of Mamba and MambaOut. First, we propose the Hilbert sampling strategy to preserve the 2D spatial locality of tiles within 1D sequences, enhancing the model's spatial perception. Second, we design a hierarchical structure comprising a 1D Gated CNN block based on MambaOut to capture local cellular features, and a BiMamba2 block to aggregate global context, jointly enhancing multi-scale representation. Finally, we implement an asymmetric chunking design, allowing parallel processing during training and chunking-streaming accumulation during inference, minimizing peak memory usage for deployment. Experimental results on five datasets demonstrate that MambaBack outperforms seven state-of-the-art methods. Source code and datasets are publicly available.
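The abstract's "chunking-streaming accumulation" can be illustrated with a toy aggregator: fold tile features into a running sum one chunk at a time, so peak memory holds a single chunk rather than the whole slide. Mean pooling stands in here for the model's actual accumulation, and all names are hypothetical.

```python
import numpy as np

def streaming_mean_pool(tile_features, chunk_size=256):
    """Chunking-streaming accumulation sketch: peak memory scales with
    chunk_size, not with the (possibly huge) number of tiles per slide."""
    total, count = None, 0
    for start in range(0, len(tile_features), chunk_size):
        chunk = np.asarray(tile_features[start:start + chunk_size])  # (c, dim)
        s = chunk.sum(axis=0)
        total = s if total is None else total + s
        count += len(chunk)
    return total / count
```

Because summation is associative, the streamed result matches full-batch pooling exactly; for order-sensitive aggregators (such as a recurrent or SSM state), the chunks would instead be folded through a carried hidden state.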

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Circularity Check

0 steps flagged

No circularity: empirical architecture proposal validated on external benchmarks

full rationale

The paper is an empirical proposal of a hybrid Mamba-based MIL architecture for WSI analysis. It describes design choices (Hilbert sampling, 1D Gated-CNN blocks, BiMamba2 blocks, asymmetric chunking) motivated by prior observations and validates them via measured performance on five public datasets against seven external SOTA methods. No equations, first-principles derivations, or predictions are presented that reduce by construction to fitted inputs, self-citations, or renamed known results. All load-bearing claims are experimental outcomes on independent data, satisfying the criteria for a self-contained non-circular finding.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on standard deep-learning assumptions plus three design choices whose effectiveness is demonstrated empirically rather than derived from first principles.

free parameters (1)
  • chunk size and overlap parameters
    Chosen to balance training parallelism and inference memory; the values are not stated in the abstract but affect the reported memory and accuracy numbers.
axioms (2)
  • domain assumption Mamba SSM blocks provide efficient global context modeling for sequences
    Invoked when claiming BiMamba2 aggregates global context better than prior MIL backbones.
  • domain assumption Gated CNNs are sufficient for local feature extraction in images
    Taken from the cited MambaOut work and applied to 1D sequences of tiles.
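The second axiom, that gated CNNs suffice for local mixing, can be made concrete with a toy 1D gated convolution block in the spirit of MambaOut: a depthwise causal convolution on the value branch, gated elementwise by a sigmoid branch, with no SSM involved. Shapes and names are hypothetical; the paper's block will differ.

```python
import numpy as np

def gated_conv1d_block(x, w_gate, w_val, kernel):
    """Toy gated CNN token mixer on a (L, d) tile-feature sequence:
    causal 1D convolution over the sequence axis, modulated by a
    learned sigmoid gate -- local mixing without state-space machinery."""
    gate = 1.0 / (1.0 + np.exp(-(x @ w_gate)))   # (L, d) sigmoid gating branch
    val = x @ w_val                              # (L, d) value branch
    k = len(kernel)
    padded = np.pad(val, ((k - 1, 0), (0, 0)))   # causal pad along the sequence
    mixed = sum(kernel[i] * padded[i:i + len(x)] for i in range(k))
    return gate * mixed                          # elementwise gating
```

The receptive field is bounded by the kernel length, which is exactly why the architecture pairs such a block with a global BiMamba2 stage rather than using it alone.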

pith-pipeline@v0.9.0 · 5626 in / 1379 out tokens · 36233 ms · 2026-05-10T08:46:50.895457+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

30 extracted references · 5 canonical work pages · 3 internal anchors

  1. [1] Muhammad Khalid Khan Niazi, Anil V Parwani, and Metin N Gurcan. Digital pathology and artificial intelligence. The Lancet Oncology, 20(5):e253–e261, 2019.

  2. [2] Liron Pantanowitz, Paul N Valenstein, Andrew J Evans, Keith J Kaplan, John D Pfeifer, David C Wilbur, Laura C Collins, and Terence J Colgan. Review of the current state of whole slide imaging in pathology. Journal of Pathology Informatics, 2(1):36, 2011.

  3. [3] Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen AWM Van Der Laak, Bram Van Ginneken, and Clara I Sánchez. A survey on deep learning in medical image analysis. Medical Image Analysis, 42:60–88, 2017.

  4. [4] Chetan L Srinidhi, Ozan Ciga, and Anne L Martel. Deep neural network models for computational histopathology: A survey. Medical Image Analysis, 67:101813, 2021.

  5. [5] Gabriele Campanella, Matthew G Hanna, Luke Geneslaw, Allen Miraflor, Vitor Werneck Krauss Silva, Klaus J Busam, Edi Brogi, Victor E Reuter, David S Klimstra, and Thomas J Fuchs. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature Medicine, 25(8):1301–1309, 2019.

  6. [6] Maximilian Ilse, Jakub Tomczak, and Max Welling. Attention-based deep multiple instance learning. In International Conference on Machine Learning, pages 2127–2136. PMLR, 2018.

  7. [7] Richard J Chen, Tong Ding, Ming Y Lu, Drew FK Williamson, Guillaume Jaume, Andrew H Song, Bowen Chen, Andrew Zhang, Daniel Shao, Muhammad Shaban, et al. Towards a general-purpose foundation model for computational pathology. Nature Medicine, 30(3):850–862, 2024.

  8. [8] Ming Y Lu, Drew FK Williamson, Tiffany Y Chen, Richard J Chen, Matteo Barbieri, and Faisal Mahmood. Data-efficient and weakly supervised computational pathology on whole-slide images. Nature Biomedical Engineering, 5(6):555–570, 2021.

  9. [9] Hongrun Zhang, Yanda Meng, Yitian Zhao, Yihong Qiao, Xiaoyun Yang, Sarah E Coupland, and Yalin Zheng. DTFD-MIL: Double-tier feature distillation multiple instance learning for histopathology whole slide image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18802–18812, 2022.

  10. [10] Zhuchen Shao, Hao Bian, Yang Chen, Yifeng Wang, Jian Zhang, Xiangyang Ji, et al. TransMIL: Transformer based correlated multiple instance learning for whole slide image classification. Advances in Neural Information Processing Systems, 34:2136–2147, 2021.

  11. [11] Nan Ying, Yanli Lei, Tianyi Zhang, Shangqing Lyu, Sicheng Chen, Zeyu Liu, Yunlu Feng, Yu Zhao, and Guanglei Zhang. CPIA dataset: a large-scale comprehensive pathological image analysis dataset for self-supervised learning pre-training. Biomedical Signal Processing and Control, 110:108148, 2025.

  12. [12] Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. In First Conference on Language Modeling, 2024.

  13. [13] Shu Yang, Yihui Wang, and Hao Chen. MambaMIL: Enhancing long sequence modeling with sequence reordering in computational pathology. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 296–306. Springer, 2024.

  14. [14] Chongcong Jiang, Zhuo Zhao, Peixian Liang, Min Shi, Jun Han, Nian-Feng Tzeng, Guanghua Xiao, Danny Z Chen, and Hao Zheng. Exploring multi-scale local and global features in whole slide images using state space models. bioRxiv, pages 2026–01, 2026.

  15. [15] Hans Sagan. Space-Filling Curves. Springer Science & Business Media, 2012.

  16. [16] Tianyi Zhang, Zhiling Yan, Chunhui Li, Nan Ying, Yanli Lei, Shangqing Lyu, Yunlu Feng, Yu Zhao, and Guanglei Zhang. CellMix: A general instance relationship-based method for data augmentation toward pathology image classification. IEEE Transactions on Neural Networks and Learning Systems, 2025.

  17. [17] Weihao Yu and Xinchao Wang. MambaOut: Do we really need Mamba for vision? In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 4484–4496, 2025.

  18. [18] Lianghui Zhu, Bencheng Liao, Qian Zhang, Xinlong Wang, Wenyu Liu, and Xinggang Wang. Vision Mamba: Efficient visual representation learning with bidirectional state space model. arXiv preprint arXiv:2401.09417, 2024.

  19. [19] Tri Dao and Albert Gu. Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality. arXiv preprint arXiv:2405.21060, 2024.

  20. [20] Jiasi Chen and Xukan Ran. Deep learning with edge computing: A review. Proceedings of the IEEE, 107(8):1655–1674, 2019.

  21. [21] Kaiyuan Guo, Shulin Zeng, Jincheng Yu, Yu Wang, and Huazhong Yang. A survey of FPGA-based neural network accelerator. arXiv preprint arXiv:1712.08934, 2017.

  22. [22] Sicheng Chen, Tianyi Zhang, Dankai Liao, Dandan Li, Low Chang Han, Yanqin Jiang, Yueming Jin, and Shangqing Lyu. PathRWKV: Enabling whole slide prediction with recurrent-transformer. arXiv preprint arXiv:2503.03199, 2025.

  23. [23] Jiawen Yao, Xinliang Zhu, and Junzhou Huang. Deep multi-instance learning for survival prediction from whole slide images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 496–504. Springer, 2019.

  24. [24] Dankai Liao, Sicheng Chen, Nuwa Xi, Qiaochu Xue, Jieyu Li, Lingxuan Hou, Zeyu Liu, Chang Han Low, Yufeng Wu, Yiling Liu, et al. UnPuzzle: A unified framework for pathology image analysis. arXiv preprint arXiv:2503.03152, 2025.

  25. [25] Hanwen Xu, Naoto Usuyama, Jaspreet Bagga, Sheng Zhang, Rajesh Rao, Tristan Naumann, Cliff Wong, Zelalem Gero, Javier González, Yu Gu, et al. A whole-slide foundation model for digital pathology from real-world data. Nature, pages 1–8, 2024.

  26. [26] Babak Ehteshami Bejnordi, Mitko Veta, Paul Johannes Van Diest, Bram Van Ginneken, Nico Karssemeijer, Geert Litjens, Jeroen AWM Van Der Laak, Meyke Hermsen, Quirine F Manson, Maschenka Balkenhol, et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA, 318(22):2199–2210, 2017.

  27. [27] Peter Bandi, Oscar Geessink, Quirine Manson, Marcory Van Dijk, Maschenka Balkenhol, Meyke Hermsen, Babak Ehteshami Bejnordi, Byungjae Lee, Kyunghyun Paeng, Aoxiao Zhong, et al. From detection of individual metastases to classification of lymph node status at the patient level: the CAMELYON17 challenge. IEEE Transactions on Medical Imaging, 38(2):550–560, 2018.

  28. [28] Wouter Bulten, Kimmo Kartasalo, Po-Hsuan Cameron Chen, Peter Ström, Hans Pinckaers, Kunal Nagpal, Yuannan Cai, David F Steiner, Hester Van Boven, Robert Vink, et al. Artificial intelligence for diagnosis and Gleason grading of prostate cancer: the PANDA challenge. Nature Medicine, 28(1):154–163, 2022.

  29. [29] Antonio Colaprico, Tiago C Silva, Catharina Olsen, Luciano Garofano, Claudia Cava, Davide Garolini, Thais S Sabedot, Tathiane M Malta, Stefano M Pagnotta, Isabella Castiglioni, et al. TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Research, 44(8):e71–e71, 2016.

  30. [30] Jared L Katzman, Uri Shaham, Alexander Cloninger, Jonathan Bates, Tingting Jiang, and Yuval Kluger. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Medical Research Methodology, 18(1):24, 2018.