Recognition: 2 theorem links
· Lean Theorem
Polygon-mamba: Retinal vessel segmentation using polygon scanning mamba and space-frequency collaborative attention
Pith reviewed 2026-05-12 03:48 UTC · model grok-4.3
The pith
Polygon scanning mamba maintains connectivity of small retinal vessels during segmentation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We design a hybrid CNN-Mamba fusion network that integrates a polygon scanning mamba and a space-frequency collaborative attention mechanism for the detection of small vessels. The polygon scanning visual state space model uses multi-layer reverse scanning to identify small vessel structural features and preserve pixel connectivity, mitigating information loss. The space-frequency collaborative attention mechanism extracts efficient features from the spatial and frequency domains to dynamically enhance key features and suppress clutter.
What carries the argument
Polygon scanning visual state space model (PS-VSS) using multi-layer reverse scanning to preserve connectivity in small vessel structures
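The paper's exact polygon scan directions are not reproduced in this review, so the following is only an illustrative stand-in: a ring-by-ring (spiral) traversal shows the property a non-raster scan is claimed to provide, namely that pixels along each boundary ring stay adjacent in the 1D sequence fed to the state space model. The function name `spiral_scan_order` and the choice of a clockwise inward spiral are assumptions for the sketch, not the paper's method.

```python
def spiral_scan_order(h, w):
    """Clockwise inward spiral ordering of the flat indices of an h x w grid.

    Unlike raster (horizontal-vertical) order, pixels along the boundary of
    each ring remain adjacent in the 1D sequence, which is the kind of
    locality a polygon-style scan relies on to keep thin structures
    connected when a 2D map is serialized for a Mamba block.
    """
    top, bottom, left, right = 0, h - 1, 0, w - 1
    order = []
    while top <= bottom and left <= right:
        order += [top * w + x for x in range(left, right + 1)]        # top edge, left to right
        order += [y * w + right for y in range(top + 1, bottom + 1)]  # right edge, top to bottom
        if top < bottom:
            order += [bottom * w + x for x in range(right - 1, left - 1, -1)]  # bottom edge, right to left
        if left < right:
            order += [y * w + left for y in range(bottom - 1, top, -1)]        # left edge, bottom to top
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return order

forward = spiral_scan_order(4, 4)  # one scan direction
reverse = forward[::-1]            # the paired reverse scan
```

A multi-layer scheme in the paper's spirit would run the sequence model over both `forward` and `reverse` orderings and fuse the results.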
Load-bearing premise
The polygon scanning and space-frequency attention mechanisms will continue to preserve small vessel connectivity and enhance features effectively on retinal datasets beyond the three used in the study without introducing artifacts.
What would settle it
Running the model on a fourth independent retinal vessel dataset and checking if small vessels show continuous paths matching the annotations or if new breaks and false positives appear.
read the original abstract
Retinal vessel segmentation is crucial for diagnosis and assessment of ocular diseases. Notably, segmentation of small retinal vessels has been consistently recognized as a challenging and complex task. To tackle this challenge, we design a hybrid CNN-Mamba fusion network that integrates polygon scanning mamba and space-frequency collaborative attention mechanism for the detection of small vessels. Considering that the traditional mamba architecture with horizontal-vertical scanning may compromise the topological integrity of target structures and result in local discontinuities in small retinal vessels, we present a polygon scanning visual state space model (PS-VSS) to identify small vessel structural features by multi-layer reverse scanning way. Which effectively preserves pixels connectivity, thereby substantially mitigating the loss of information pertaining to small vessels. Furthermore, as we all known that the spatial domain prioritizes positional and structural information, while the frequency domain emphasizes global perception and local detail components, a space-frequency collaborative attention mechanism (SFCAM) is introduced within the skip connection to extract efficient features from the spatial and frequency domains. This strategy empowers the model to dynamically enhance the key features while effectively suppressing clutters. To assess the efficacy of our model, it was tested on three publicly available datasets: DRIVE, STARE, and CHASE_DB1. Compared to manual annotations, our model demonstrated F1 scores of 0.8283, 0.8282, and 0.8251, Area Under Curve (AUC) values of 0.9806, 0.9840, and 0.9866, and Sensitivity (SE) values of of 0.8268, 0.8314, and 0.8484 across three datasets, respectively. The effectiveness of our model was validated through both visual inspection and quantitative analysis.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents Polygon-Mamba, a hybrid CNN-Mamba fusion network for retinal vessel segmentation. It introduces a Polygon Scanning Visual State Space (PS-VSS) module that employs multi-layer reverse polygon scanning to preserve topological connectivity of small vessels (addressing discontinuities from standard horizontal-vertical Mamba scans), and a Space-Frequency Collaborative Attention Mechanism (SFCAM) placed in skip connections to fuse spatial positional information with frequency-domain global and local details. The model is evaluated on the DRIVE, STARE, and CHASE_DB1 datasets, reporting F1 scores of 0.8283/0.8282/0.8251, AUC values of 0.9806/0.9840/0.9866, and sensitivities of 0.8268/0.8314/0.8484.
Significance. If the performance gains can be rigorously attributed to the polygon scanning and space-frequency fusion rather than training details or the base CNN-Mamba backbone, the work would provide a useful direction for maintaining vessel continuity in thin-structure segmentation tasks. The Mamba-based scanning offers an efficiency-oriented alternative to attention mechanisms, and the hybrid design could inform subsequent medical imaging models that prioritize both local connectivity and global context.
major comments (3)
- [§4 Experiments] §4 Experiments and §4.2 Ablation Study (if present): No ablation experiments are described that isolate the PS-VSS polygon scanning (e.g., by replacing it with standard raster or bidirectional scanning on the identical backbone) or the SFCAM (e.g., by substituting standard spatial attention or removing the frequency branch). Without these, the reported F1/SE improvements cannot be attributed to the claimed connectivity preservation and artifact suppression rather than hyperparameter choices or the overall architecture.
- [§4.1 Quantitative Results] §4.1 Quantitative Results: The evaluation reports only standard pixel-wise metrics (F1, AUC, SE) and visual inspection; no topology-specific metrics (e.g., connected-component counts, Betti numbers, or vessel continuity scores) or direct side-by-side comparisons of scanning orders are provided to verify that multi-layer reverse polygon scanning specifically reduces discontinuities in small vessels compared with conventional Mamba scanning.
- [§4 Experiments] §4 and §3 Method: Training procedures, data augmentation, hyperparameter search, baseline re-implementations (versus literature numbers), and statistical significance tests for the metric differences are not detailed. This information is load-bearing for assessing whether the gains on DRIVE/STARE/CHASE_DB1 are robust and generalizable beyond the three tested datasets.
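The topology metric suggested in the second major comment (connected-component counting) can be sketched directly; a break in a thin vessel shows up as an extra 8-connected component in the prediction relative to the annotation. This is a generic implementation of the suggested check, not code from the paper.

```python
from collections import deque

def count_components(mask):
    """Count 8-connected foreground components in a binary mask.

    mask: list of lists of 0/1. Comparing this count between a predicted
    vessel map and its annotation is a cheap topology check: each break
    introduced by the segmenter adds one component.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                components += 1
                queue = deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
    return components
```

For example, an annotated vessel `[[1,1,1,1,1]]` has one component, while a predicted `[[1,1,0,1,1]]` has two; the difference counts the break.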
minor comments (3)
- [Abstract] Abstract: The phrase 'as we all known that' is grammatically incorrect and should be revised to 'as is well known' or equivalent.
- [Abstract] Abstract: Duplicate word 'of of' appears in 'Sensitivity (SE) values of of 0.8268'.
- [§3 Method] Method section: Module acronyms (PS-VSS, SFCAM) and the exact polygon scanning directions should be accompanied by a clear diagram or pseudocode for reproducibility.
Simulated Author's Rebuttal
We sincerely thank the referee for the constructive and detailed feedback. We have carefully reviewed each major comment and will incorporate revisions to address the concerns regarding experimental validation, metrics, and reproducibility.
read point-by-point responses
-
Referee: [§4 Experiments] §4 Experiments and §4.2 Ablation Study (if present): No ablation experiments are described that isolate the PS-VSS polygon scanning (e.g., by replacing it with standard raster or bidirectional scanning on the identical backbone) or the SFCAM (e.g., by substituting standard spatial attention or removing the frequency branch). Without these, the reported F1/SE improvements cannot be attributed to the claimed connectivity preservation and artifact suppression rather than hyperparameter choices or the overall architecture.
Authors: We agree that dedicated ablation studies are necessary to rigorously attribute performance gains to the polygon scanning in PS-VSS and the space-frequency fusion in SFCAM. The current manuscript validates the full model via SOTA comparisons and visuals but does not isolate these components. In the revised manuscript, we will add a dedicated ablation subsection with: (i) PS-VSS replaced by standard horizontal-vertical or bidirectional scanning on the identical backbone, and (ii) SFCAM replaced by standard spatial attention or with the frequency branch removed. Results will be reported on all three datasets to support the connectivity and artifact-suppression claims. revision: yes
-
Referee: [§4.1 Quantitative Results] §4.1 Quantitative Results: The evaluation reports only standard pixel-wise metrics (F1, AUC, SE) and visual inspection; no topology-specific metrics (e.g., connected-component counts, Betti numbers, or vessel continuity scores) or direct side-by-side comparisons of scanning orders are provided to verify that multi-layer reverse polygon scanning specifically reduces discontinuities in small vessels compared with conventional Mamba scanning.
Authors: We acknowledge that topology-aware metrics would provide stronger quantitative support for the connectivity-preservation benefit of multi-layer reverse polygon scanning. The manuscript currently relies on standard metrics plus qualitative visual evidence of improved small-vessel continuity. In the revision, we will add topology-specific evaluations, including connected-component counts and a vessel continuity score, together with direct side-by-side quantitative and visual comparisons of polygon versus standard raster scanning on representative images from DRIVE, STARE, and CHASE_DB1. revision: yes
-
Referee: [§4 Experiments] §4 and §3 Method: Training procedures, data augmentation, hyperparameter search, baseline re-implementations (versus literature numbers), and statistical significance tests for the metric differences are not detailed. This information is load-bearing for assessing whether the gains on DRIVE/STARE/CHASE_DB1 are robust and generalizable beyond the three tested datasets.
Authors: We agree that expanded methodological and experimental details are required for reproducibility and to substantiate the reported gains. The original manuscript provides only high-level descriptions. In the revised version we will: (i) detail all data-augmentation strategies, optimizer settings, learning-rate schedules, and hyperparameter search procedure in Section 3; (ii) clarify baseline re-implementations versus literature-reported numbers; and (iii) include statistical significance tests (e.g., paired t-tests with p-values) for metric differences across the three datasets, along with a brief discussion of generalizability limitations. revision: yes
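The paired t-test the authors propose can be sketched as follows; the inputs are per-image metric values (e.g. F1) for the proposed model and a baseline on the same test images. Only the t statistic and degrees of freedom are computed here; obtaining a p-value additionally requires the t-distribution CDF (e.g. `scipy.stats.ttest_rel` does both).

```python
import math

def paired_t_test(a, b):
    """Two-sided paired t statistic for per-image metric pairs.

    a, b: sequences of the same length, one metric value per test image
    for each of the two models. Returns (t, degrees_of_freedom); compare
    |t| against the t-distribution critical value to get a p-value.
    """
    n = len(a)
    diffs = [x - y for x, y in zip(a, b)]
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance of differences
    t = mean / math.sqrt(var / n)
    return t, n - 1
```

Pairing per image (rather than comparing dataset-level means) is what makes small but consistent gains detectable on test sets as small as DRIVE's 20 images.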
Circularity Check
No circularity: empirical network design with independent validation
full rationale
The paper is an empirical deep-learning contribution that proposes a hybrid CNN-Mamba architecture (PS-VSS polygon scanning and SFCAM space-frequency attention) and evaluates it via standard training on public retinal datasets (DRIVE, STARE, CHASE_DB1), reporting F1/AUC/SE metrics. No derivation chain, equations, or predictions are presented that reduce by construction to fitted inputs, self-citations, or ansatzes. Performance numbers arise from gradient descent on held-out test splits rather than any self-referential loop. The work is therefore self-contained against external benchmarks with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Mamba-based state space models can capture long-range dependencies in 2D images when an appropriate scanning order is chosen.
- domain assumption Joint spatial and frequency domain processing improves discrimination of small structures over spatial-only attention.
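The second axiom (complementary spatial and frequency processing) can be made concrete with a minimal sketch of a frequency-branch split: a circular low-pass mask in the FFT domain keeps global structure, and the residual high-pass part keeps local detail such as thin vessel edges. The cutoff `radius` and the function name are illustrative assumptions, not values from the paper's SFCAM.

```python
import numpy as np

def split_frequency_bands(feat, radius=4):
    """Split a 2D feature map into low- and high-frequency components.

    feat: 2D float array. A circular mask of the given radius around the
    DC component (after fftshift) defines the low-pass band; the high
    band is the residual, so low + high reconstructs the input exactly.
    """
    h, w = feat.shape
    spectrum = np.fft.fftshift(np.fft.fft2(feat))
    yy, xx = np.ogrid[:h, :w]
    dist2 = (yy - h // 2) ** 2 + (xx - w // 2) ** 2
    low_mask = dist2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = feat - low  # residual carries edges and fine detail
    return low, high
```

An SFCAM-style module would weight the two bands adaptively before fusing them back with the spatial-attention branch.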
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear?
unclear: Relation between the paper passage and the cited Recognition theorem.
we present a polygon scanning visual state space model (PS-VSS) to identify small vessel structural features by multi-layer reverse scanning way. Which effectively preserves pixels connectivity
-
IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear?
unclear: Relation between the paper passage and the cited Recognition theorem.
a space-frequency collaborative attention mechanism (SFCAM) is introduced within the skip connection
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
A review of retinal vessel segmentation for fundus image analysis
QIN, Qing; CHEN, Yuanyuan. A review of retinal vessel segmentation for fundus image analysis. Engineering Applications of Artificial Intelligence, 2024, 128: 107454
work page 2024
-
[2]
ILESANMI, Ademola E.; ILESANMI, Taiwo; GBOTOSO, Gbenga A. A systematic review of retinal fundus image segmentation and classification methods using convolutional neural networks. Healthcare Analytics, 2023, 4: 100261
work page 2023
-
[3]
Deep learning for retinal vessel segmentation: a systematic review of techniques and applications
LIU, Zhihui, et al. Deep learning for retinal vessel segmentation: a systematic review of techniques and applications. Medical & Biological Engineering & Computing, 2025, 63.8: 2191-2208
work page 2025
-
[4]
Systematic review of retinal blood vessels segmentation based on AI-driven technique
VERMA, Prem Kumari; KAUR, Jagdeep. Systematic review of retinal blood vessels segmentation based on AI-driven technique. Journal of imaging informatics in medicine, 2024, 37.4: 1783-1799
work page 2024
-
[5]
U-net: Convolutional networks for biomedical image segmentation
RONNEBERGER, Olaf; FISCHER, Philipp; BROX, Thomas. U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical image computing and computer-assisted intervention. Cham: Springer international publishing, 2015. p. 234-241
work page 2015
-
[6]
XU, Hao; WU, Yun. G2ViT: Graph neural network-guided vision transformer enhanced network for retinal vessel and coronary angiograph segmentation. Neural Networks, 2024, 176: 106356
work page 2024
-
[7]
WANG, Ziyang, et al. Weak-mamba-unet: Visual mamba makes cnn and vit work better for scribble-based medical image segmentation. IEEE Transactions on Biomedical Engineering, 2026
work page 2026
-
[8]
Attention U-Net: Learning Where to Look for the Pancreas
LONG, Jonathan; SHELHAMER, Evan; DARRELL, Trevor. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2015. p. 3431-3440; OKTAY, Ozan, et al. Attention u-net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999, 2018; ZHOU, Zongwei, et al. Un...
work page arXiv 2015
-
[9]
Curvilinear object segmentation in medical images based on ODoS filter and deep learning network
PENG, Yuanyuan, et al. Curvilinear object segmentation in medical images based on ODoS filter and deep learning network. Applied Intelligence, 2023, 53.20: 23470-23481
work page 2023
-
[10]
QI, Yaolei, et al. Dynamic snake convolution based on topological geometric constraints for tubular structure segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision. 2023. p. 6070-6079
work page 2023
-
[11]
VasCA-Net: A Vascular Channel Attention Network for Retinal Vessel Segmentation
MA, Zhendi, et al. VasCA-Net: A Vascular Channel Attention Network for Retinal Vessel Segmentation. Expert Systems with Applications, 2025, 130591
work page 2025
-
[12]
Vision mamba: A comprehensive survey and taxonomy
LIU, Xiao, et al. Vision mamba: A comprehensive survey and taxonomy. IEEE Transactions on Neural Networks and Learning Systems, 2025
work page 2025
-
[13]
Mamba-unet: Unet-like pure visual mamba for medical image segmentation
WANG, Ziyang, et al. Mamba-unet: Unet-like pure visual mamba for medical image segmentation. arXiv preprint arXiv:2402.05079, 2024
-
[14]
Serp-mamba: Advancing high-resolution retinal vessel segmentation with selective state-space model
Wang, Hongqiu, et al. "Serp-mamba: Advancing high-resolution retinal vessel segmentation with selective state-space model." IEEE Transactions on Medical Imaging (2025)
work page 2025
-
[15]
S³-Mamba: Small-size-sensitive mamba for lesion segmentation
WANG, Gui, et al. S³-Mamba: Small-size-sensitive mamba for lesion segmentation. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2025. p. 7655-7664
work page 2025
-
[16]
CHENG, Zihan, et al. Mamba-sea: A mamba-based framework with global-to-local sequence augmentation for generalizable medical image segmentation. IEEE Transactions on Medical Imaging, 2025
work page 2025
-
[17]
U-shape mamba: State space model for faster diffusion
ERGASTI, Alex, et al. U-shape mamba: State space model for faster diffusion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2025. p. 3276-3283
work page 2025
-
[18]
GLMamba: A Global–Local Mamba Network for Efficient Remote Sensing Change Detection
LIU, Shengyan, et al. GLMamba: A Global–Local Mamba Network for Efficient Remote Sensing Change Detection. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2026
work page 2026
-
[19]
FSE-Mamba: A novel Frequency-Spatial Entanglement Mamba model for retinal vessel segmentation
-
[20]
ZigMa: A DiT-style Zigzag Mamba Diffusion Model
-
[21]
DS-Mamba: Dynamic snake visual state space model for vessel segmentation
LIU, Zixuan, et al. DS-Mamba: Dynamic snake visual state space model for vessel segmentation. Biomedical Signal Processing and Control, 2026, 119: 109783; TAN, Tengfei, et al. Lightweight pyramid network with spatial attention mechanism for accurate retinal vessel segmentation. International Journal of Computer Assisted Radiology and Surgery, 2021, 16...
work page 2026
-
[22]
RCAR-UNet: Retinal vessel segmentation network algorithm via novel rough attention mechanism
DING, Weiping, et al. RCAR-UNet: Retinal vessel segmentation network algorithm via novel rough attention mechanism. Information Sciences, 2024, 657: 120007
work page 2024
-
[23]
CCS-UNet: a cross-channel spatial attention model for accurate retinal vessel segmentation
ZHU, Yong-fei, et al. CCS-UNet: a cross-channel spatial attention model for accurate retinal vessel segmentation. Biomedical Optics Express, 2023, 14.9: 4739-4758
work page 2023
-
[24]
MambaFuse: Fusing Multi-scale Mamba and CNN Features for Seizure Prediction
PAN, Ding; LUO, Guibo; ZHU, Yuesheng. MambaFuse: Fusing Multi-scale Mamba and CNN Features for Seizure Prediction. In: International Conference on Neural Information Processing. Singapore: Springer Nature Singapore, 2024. p. 399-413
work page 2024
-
[25]
MSFE-Mamba: Multi-scale Frequency-Enhanced Mamba for Hyperspectral Image Classification
MSHI, Rui, et al. MSFE-Mamba: Multi-scale Frequency-Enhanced Mamba for Hyperspectral Image Classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2026; Thin vessel segmentation in fundus images using attention UNet and modified Frangi filtering; A retinal vessel segmentation network approach based on rough...
work page 2026
-
[26]
Retina blood vessels segmentation and classification with the multi-featured approach
BHIMAVARAPU, Usharani. Retina blood vessels segmentation and classification with the multi-featured approach. Journal of Imaging Informatics in Medicine, 2025, 38.1: 520-533
work page 2025
-
[27]
LI, Jianwei, et al. Topology-joint curvilinear segmentation network using confidence-based Bezier topological representation. Engineering Applications of Artificial Intelligence, 2025, 143: 110045
work page 2025
-
[28]
ResMU-Net: Residual Multi-kernel U-Net for blood vessel segmentation in retinal fundus images
PANCHAL, Sachin; KOKARE, Manesh. ResMU-Net: Residual Multi-kernel U-Net for blood vessel segmentation in retinal fundus images. Biomedical Signal Processing and Control, 2024, 90: 105859
work page 2024
-
[29]
Multi-level attention network for retinal vessel segmentation
YUAN, Yuchen, et al. Multi-level attention network for retinal vessel segmentation. IEEE Journal of Biomedical and Health Informatics, 2021, 26.1: 312-323
work page 2021
-
[30]
LI, Yang, et al. Global transformer and dual local attention network via deep-shallow hierarchical feature fusion for retinal vessel segmentation. IEEE Transactions on Cybernetics, 2022, 53.9: 5826-5839
work page 2022
-
[31]
Pa-net: A hybrid architecture for retinal vessel segmentation
LUO, Xuebing, et al. Pa-net: A hybrid architecture for retinal vessel segmentation. Pattern Recognition, 2025, 161: 111254
work page 2025
-
[32]
MSVENet: multi-scale vascular enhancement network for retinal vessel segmentation
CAO, Kang; MA, Hui. MSVENet: multi-scale vascular enhancement network for retinal vessel segmentation. Biomedical Signal Processing and Control, 2025, 110: 108272
work page 2025
-
[33]
MA, Changlong, et al. MSFB-Net: Multi-scale frequency compound attention and cross-layer boundary optimization for medical image segmentation. Biomedical Signal Processing and Control, 2026, 112: 108405
work page 2026
-
[34]
Frequency-Enhanced Wavelet Transformer Based Decoder for Medical Image Segmentation
YIN, Haitao; XU, Yongchang. Frequency-Enhanced Wavelet Transformer Based Decoder for Medical Image Segmentation. Pattern Recognition, 2026, 113198
work page 2026
-
[35]
LI, Xuan; MA, Ding; WU, Xiangqian. WFDENet: Wavelet-based frequency decomposition and enhancement network for diabetic retinopathy lesion segmentation. Pattern Recognition, 2026, 172: 112492
work page 2026
-
[36]
ZHANG, Fan; GU, Zhiwei; WANG, Hua. Decoding with structured awareness: integrating directional, frequency-spatial, and structural attention for medical image segmentation. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2026. p. 12421-12429
work page 2026
-
[37]
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
GU, Albert; DAO, Tri. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023
work page arXiv 2023
-
[38]
Vmamba: Visual state space model
LIU, Yue, et al. Vmamba: Visual state space model. Advances in neural information processing systems, 2024, 37: 103031-103063
work page 2024
discussion (0)