Recognition: 2 theorem links
· Lean Theorem
Polygon-mamba: Retinal vessel segmentation using polygon scanning mamba and space-frequency collaborative attention
Pith reviewed 2026-05-12 03:48 UTC · model grok-4.3
The pith
Polygon scanning mamba maintains connectivity of small retinal vessels during segmentation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We design a hybrid CNN-Mamba fusion network that integrates a polygon scanning mamba and a space-frequency collaborative attention mechanism for the detection of small vessels. The polygon scanning visual state space model uses multi-layer reverse scanning to identify small vessel structural features and preserve pixel connectivity, mitigating information loss. The space-frequency collaborative attention mechanism extracts efficient features from the spatial and frequency domains to dynamically enhance key features and suppress clutter.
What carries the argument
Polygon scanning visual state space model (PS-VSS) using multi-layer reverse scanning to preserve connectivity in small vessel structures
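The paper's exact polygon scan directions are not reproduced in this review, so the following is only an illustrative stand-in: a ring-by-ring (spiral) traversal shows the property a non-raster scan is claimed to provide, namely that pixels along each boundary ring stay adjacent in the 1D sequence fed to the state space model. The function name `spiral_scan_order` and the choice of a clockwise inward spiral are assumptions for the sketch, not the paper's method.

```python
def spiral_scan_order(h, w):
    """Clockwise inward spiral ordering of the flat indices of an h x w grid.

    Unlike raster (horizontal-vertical) order, pixels along the boundary of
    each ring remain adjacent in the 1D sequence, which is the kind of
    locality a polygon-style scan relies on to keep thin structures
    connected when a 2D map is serialized for a Mamba block.
    """
    top, bottom, left, right = 0, h - 1, 0, w - 1
    order = []
    while top <= bottom and left <= right:
        order += [top * w + x for x in range(left, right + 1)]        # top edge, left to right
        order += [y * w + right for y in range(top + 1, bottom + 1)]  # right edge, top to bottom
        if top < bottom:
            order += [bottom * w + x for x in range(right - 1, left - 1, -1)]  # bottom edge, right to left
        if left < right:
            order += [y * w + left for y in range(bottom - 1, top, -1)]        # left edge, bottom to top
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return order

forward = spiral_scan_order(4, 4)  # one scan direction
reverse = forward[::-1]            # the paired reverse scan
```

A multi-layer scheme in the paper's spirit would run the sequence model over both `forward` and `reverse` orderings and fuse the results.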
Load-bearing premise
The polygon scanning and space-frequency attention mechanisms will continue to preserve small vessel connectivity and enhance features effectively on retinal datasets beyond the three used in the study without introducing artifacts.
What would settle it
Running the model on a fourth independent retinal vessel dataset and checking if small vessels show continuous paths matching the annotations or if new breaks and false positives appear.
read the original abstract
Retinal vessel segmentation is crucial for diagnosis and assessment of ocular diseases. Notably, segmentation of small retinal vessels has been consistently recognized as a challenging and complex task. To tackle this challenge, we design a hybrid CNN-Mamba fusion network that integrates polygon scanning mamba and space-frequency collaborative attention mechanism for the detection of small vessels. Considering that the traditional mamba architecture with horizontal-vertical scanning may compromise the topological integrity of target structures and result in local discontinuities in small retinal vessels, we present a polygon scanning visual state space model (PS-VSS) to identify small vessel structural features by multi-layer reverse scanning way. Which effectively preserves pixels connectivity, thereby substantially mitigating the loss of information pertaining to small vessels. Furthermore, as we all known that the spatial domain prioritizes positional and structural information, while the frequency domain emphasizes global perception and local detail components, a space-frequency collaborative attention mechanism (SFCAM) is introduced within the skip connection to extract efficient features from the spatial and frequency domains. This strategy empowers the model to dynamically enhance the key features while effectively suppressing clutters. To assess the efficacy of our model, it was tested on three publicly available datasets: DRIVE, STARE, and CHASE_DB1. Compared to manual annotations, our model demonstrated F1 scores of 0.8283, 0.8282, and 0.8251, Area Under Curve (AUC) values of 0.9806, 0.9840, and 0.9866, and Sensitivity (SE) values of of 0.8268, 0.8314, and 0.8484 across three datasets, respectively. The effectiveness of our model was validated through both visual inspection and quantitative analysis.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents Polygon-Mamba, a hybrid CNN-Mamba fusion network for retinal vessel segmentation. It introduces a Polygon Scanning Visual State Space (PS-VSS) module that employs multi-layer reverse polygon scanning to preserve topological connectivity of small vessels (addressing discontinuities from standard horizontal-vertical Mamba scans), and a Space-Frequency Collaborative Attention Mechanism (SFCAM) placed in skip connections to fuse spatial positional information with frequency-domain global and local details. The model is evaluated on the DRIVE, STARE, and CHASE_DB1 datasets, reporting F1 scores of 0.8283/0.8282/0.8251, AUC values of 0.9806/0.9840/0.9866, and sensitivities of 0.8268/0.8314/0.8484.
Significance. If the performance gains can be rigorously attributed to the polygon scanning and space-frequency fusion rather than training details or the base CNN-Mamba backbone, the work would provide a useful direction for maintaining vessel continuity in thin-structure segmentation tasks. The Mamba-based scanning offers an efficiency-oriented alternative to attention mechanisms, and the hybrid design could inform subsequent medical imaging models that prioritize both local connectivity and global context.
major comments (3)
- [§4 Experiments] §4 Experiments and §4.2 Ablation Study (if present): No ablation experiments are described that isolate the PS-VSS polygon scanning (e.g., by replacing it with standard raster or bidirectional scanning on the identical backbone) or the SFCAM (e.g., by substituting standard spatial attention or removing the frequency branch). Without these, the reported F1/SE improvements cannot be attributed to the claimed connectivity preservation and artifact suppression rather than hyperparameter choices or the overall architecture.
- [§4.1 Quantitative Results] §4.1 Quantitative Results: The evaluation reports only standard pixel-wise metrics (F1, AUC, SE) and visual inspection; no topology-specific metrics (e.g., connected-component counts, Betti numbers, or vessel continuity scores) or direct side-by-side comparisons of scanning orders are provided to verify that multi-layer reverse polygon scanning specifically reduces discontinuities in small vessels compared with conventional Mamba scanning.
- [§4 Experiments] §4 and §3 Method: Training procedures, data augmentation, hyperparameter search, baseline re-implementations (versus literature numbers), and statistical significance tests for the metric differences are not detailed. This information is load-bearing for assessing whether the gains on DRIVE/STARE/CHASE_DB1 are robust and generalizable beyond the three tested datasets.
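The topology metric suggested in the second major comment (connected-component counting) can be sketched directly; a break in a thin vessel shows up as an extra 8-connected component in the prediction relative to the annotation. This is a generic implementation of the suggested check, not code from the paper.

```python
from collections import deque

def count_components(mask):
    """Count 8-connected foreground components in a binary mask.

    mask: list of lists of 0/1. Comparing this count between a predicted
    vessel map and its annotation is a cheap topology check: each break
    introduced by the segmenter adds one component.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                components += 1
                queue = deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
    return components
```

For example, an annotated vessel `[[1,1,1,1,1]]` has one component, while a predicted `[[1,1,0,1,1]]` has two; the difference counts the break.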
minor comments (3)
- [Abstract] Abstract: The phrase 'as we all known that' is grammatically incorrect and should be revised to 'as is well known' or equivalent.
- [Abstract] Abstract: Duplicate word 'of of' appears in 'Sensitivity (SE) values of of 0.8268'.
- [§3 Method] Method section: Module acronyms (PS-VSS, SFCAM) and the exact polygon scanning directions should be accompanied by a clear diagram or pseudocode for reproducibility.
Simulated Author's Rebuttal
We sincerely thank the referee for the constructive and detailed feedback. We have carefully reviewed each major comment and will incorporate revisions to address the concerns regarding experimental validation, metrics, and reproducibility.
read point-by-point responses
-
Referee: [§4 Experiments] §4 Experiments and §4.2 Ablation Study (if present): No ablation experiments are described that isolate the PS-VSS polygon scanning (e.g., by replacing it with standard raster or bidirectional scanning on the identical backbone) or the SFCAM (e.g., by substituting standard spatial attention or removing the frequency branch). Without these, the reported F1/SE improvements cannot be attributed to the claimed connectivity preservation and artifact suppression rather than hyperparameter choices or the overall architecture.
Authors: We agree that dedicated ablation studies are necessary to rigorously attribute performance gains to the polygon scanning in PS-VSS and the space-frequency fusion in SFCAM. The current manuscript validates the full model via SOTA comparisons and visuals but does not isolate these components. In the revised manuscript, we will add a dedicated ablation subsection with: (i) PS-VSS replaced by standard horizontal-vertical or bidirectional scanning on the identical backbone, and (ii) SFCAM replaced by standard spatial attention or with the frequency branch removed. Results will be reported on all three datasets to support the connectivity and artifact-suppression claims. revision: yes
-
Referee: [§4.1 Quantitative Results] §4.1 Quantitative Results: The evaluation reports only standard pixel-wise metrics (F1, AUC, SE) and visual inspection; no topology-specific metrics (e.g., connected-component counts, Betti numbers, or vessel continuity scores) or direct side-by-side comparisons of scanning orders are provided to verify that multi-layer reverse polygon scanning specifically reduces discontinuities in small vessels compared with conventional Mamba scanning.
Authors: We acknowledge that topology-aware metrics would provide stronger quantitative support for the connectivity-preservation benefit of multi-layer reverse polygon scanning. The manuscript currently relies on standard metrics plus qualitative visual evidence of improved small-vessel continuity. In the revision, we will add topology-specific evaluations, including connected-component counts and a vessel continuity score, together with direct side-by-side quantitative and visual comparisons of polygon versus standard raster scanning on representative images from DRIVE, STARE, and CHASE_DB1. revision: yes
-
Referee: [§4 Experiments] §4 and §3 Method: Training procedures, data augmentation, hyperparameter search, baseline re-implementations (versus literature numbers), and statistical significance tests for the metric differences are not detailed. This information is load-bearing for assessing whether the gains on DRIVE/STARE/CHASE_DB1 are robust and generalizable beyond the three tested datasets.
Authors: We agree that expanded methodological and experimental details are required for reproducibility and to substantiate the reported gains. The original manuscript provides only high-level descriptions. In the revised version we will: (i) detail all data-augmentation strategies, optimizer settings, learning-rate schedules, and hyperparameter search procedure in Section 3; (ii) clarify baseline re-implementations versus literature-reported numbers; and (iii) include statistical significance tests (e.g., paired t-tests with p-values) for metric differences across the three datasets, along with a brief discussion of generalizability limitations. revision: yes
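The paired t-test the authors propose can be sketched as follows; the inputs are per-image metric values (e.g. F1) for the proposed model and a baseline on the same test images. Only the t statistic and degrees of freedom are computed here; obtaining a p-value additionally requires the t-distribution CDF (e.g. `scipy.stats.ttest_rel` does both).

```python
import math

def paired_t_test(a, b):
    """Two-sided paired t statistic for per-image metric pairs.

    a, b: sequences of the same length, one metric value per test image
    for each of the two models. Returns (t, degrees_of_freedom); compare
    |t| against the t-distribution critical value to get a p-value.
    """
    n = len(a)
    diffs = [x - y for x, y in zip(a, b)]
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance of differences
    t = mean / math.sqrt(var / n)
    return t, n - 1
```

Pairing per image (rather than comparing dataset-level means) is what makes small but consistent gains detectable on test sets as small as DRIVE's 20 images.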
Circularity Check
No circularity: empirical network design with independent validation
full rationale
The paper is an empirical deep-learning contribution that proposes a hybrid CNN-Mamba architecture (PS-VSS polygon scanning and SFCAM space-frequency attention) and evaluates it via standard training on public retinal datasets (DRIVE, STARE, CHASE_DB1), reporting F1/AUC/SE metrics. No derivation chain, equations, or predictions are presented that reduce by construction to fitted inputs, self-citations, or ansatzes. Performance numbers arise from gradient descent on held-out test splits rather than any self-referential loop. The work is therefore self-contained against external benchmarks with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Mamba-based state space models can capture long-range dependencies in 2D images when an appropriate scanning order is chosen.
- domain assumption Joint spatial and frequency domain processing improves discrimination of small structures over spatial-only attention.
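The second axiom (complementary spatial and frequency processing) can be made concrete with a minimal sketch of a frequency-branch split: a circular low-pass mask in the FFT domain keeps global structure, and the residual high-pass part keeps local detail such as thin vessel edges. The cutoff `radius` and the function name are illustrative assumptions, not values from the paper's SFCAM.

```python
import numpy as np

def split_frequency_bands(feat, radius=4):
    """Split a 2D feature map into low- and high-frequency components.

    feat: 2D float array. A circular mask of the given radius around the
    DC component (after fftshift) defines the low-pass band; the high
    band is the residual, so low + high reconstructs the input exactly.
    """
    h, w = feat.shape
    spectrum = np.fft.fftshift(np.fft.fft2(feat))
    yy, xx = np.ogrid[:h, :w]
    dist2 = (yy - h // 2) ** 2 + (xx - w // 2) ** 2
    low_mask = dist2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = feat - low  # residual carries edges and fine detail
    return low, high
```

An SFCAM-style module would weight the two bands adaptively before fusing them back with the spatial-attention branch.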
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear?
unclear: Relation between the paper passage and the cited Recognition theorem.
we present a polygon scanning visual state space model (PS-VSS) to identify small vessel structural features by multi-layer reverse scanning way. Which effectively preserves pixels connectivity
-
IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear?
unclear: Relation between the paper passage and the cited Recognition theorem.
a space-frequency collaborative attention mechanism (SFCAM) is introduced within the skip connection
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
A review of retinal vessel segmentation for fundus image analysis
QIN, Qing; CHEN, Yuanyuan. A review of retinal vessel segmentation for fundus image analysis. Engineering Applications of Artificial Intelligence, 2024, 128: 107454
work page 2024
-
[2]
ILESANMI, Ademola E.; ILESANMI, Taiwo; GBOTOSO, Gbenga A. A systematic review of retinal fundus image segmentation and classification methods using convolutional neural networks. Healthcare Analytics, 2023, 4: 100261
work page 2023
-
[3]
Deep learning for retinal vessel segmentation: a systematic review of techniques and applications
LIU, Zhihui, et al. Deep learning for retinal vessel segmentation: a systematic review of techniques and applications. Medical & Biological Engineering & Computing, 2025, 63.8: 2191-2208
work page 2025
-
[4]
Systematic review of retinal blood vessels segmentation based on AI-driven technique
VERMA, Prem Kumari; KAUR, Jagdeep. Systematic review of retinal blood vessels segmentation based on AI-driven technique. Journal of imaging informatics in medicine, 2024, 37.4: 1783-1799
work page 2024
-
[5]
U-net: Convolutional networks for biomedical image segmentation
RONNEBERGER, Olaf; FISCHER, Philipp; BROX, Thomas. U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical image computing and computer-assisted intervention. Cham: Springer international publishing, 2015. p. 234-241
work page 2015
-
[6]
XU, Hao; WU, Yun. G2ViT: Graph neural network-guided vision transformer enhanced network for retinal vessel and coronary angiograph segmentation. Neural Networks, 2024, 176: 106356
work page 2024
-
[7]
WANG, Ziyang, et al. Weak-mamba-unet: Visual mamba makes cnn and vit work better for scribble-based medical image segmentation. IEEE Transactions on Biomedical Engineering, 2026
work page 2026
-
[8]
Attention U-Net: Learning Where to Look for the Pancreas
LONG, Jonathan; SHELHAMER, Evan; DARRELL, Trevor. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2015. p. 3431-3440; OKTAY, Ozan, et al. Attention u-net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999, 2018; ZHOU, Zongwei, et al. Un...
work page arXiv 2015
-
[9]
Curvilinear object segmentation in medical images based on ODoS filter and deep learning network
PENG, Yuanyuan, et al. Curvilinear object segmentation in medical images based on ODoS filter and deep learning network. Applied Intelligence, 2023, 53.20: 23470-23481
work page 2023
-
[10]
QI, Yaolei, et al. Dynamic snake convolution based on topological geometric constraints for tubular structure segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision. 2023. p. 6070-6079
work page 2023
-
[11]
VasCA-Net: A Vascular Channel Attention Network for Retinal Vessel Segmentation
MA, Zhendi, et al. VasCA-Net: A Vascular Channel Attention Network for Retinal Vessel Segmentation. Expert Systems with Applications, 2025, 130591
work page 2025
-
[12]
Vision mamba: A comprehensive survey and taxonomy
LIU, Xiao, et al. Vision mamba: A comprehensive survey and taxonomy. IEEE Transactions on Neural Networks and Learning Systems, 2025
work page 2025
-
[13]
Mamba-unet: Unet-like pure visual mamba for medical image segmentation
WANG, Ziyang, et al. Mamba-unet: Unet-like pure visual mamba for medical image segmentation. arXiv preprint arXiv:2402.05079, 2024
-
[14]
Serp-mamba: Advancing high-resolution retinal vessel segmentation with selective state-space model
Wang, Hongqiu, et al. "Serp-mamba: Advancing high-resolution retinal vessel segmentation with selective state-space model." IEEE Transactions on Medical Imaging (2025)
work page 2025
-
[15]
S³-Mamba: Small-size-sensitive mamba for lesion segmentation
WANG, Gui, et al. S³-Mamba: Small-size-sensitive mamba for lesion segmentation. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2025. p. 7655-7664
work page 2025
-
[16]
CHENG, Zihan, et al. Mamba-sea: A mamba-based framework with global-to-local sequence augmentation for generalizable medical image segmentation. IEEE Transactions on Medical Imaging, 2025
work page 2025
-
[17]
U-shape mamba: State space model for faster diffusion
ERGASTI, Alex, et al. U-shape mamba: State space model for faster diffusion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2025. p. 3276-3283
work page 2025
-
[18]
GLMamba: A Global–Local Mamba Network for Efficient Remote Sensing Change Detection
LIU, Shengyan, et al. GLMamba: A Global–Local Mamba Network for Efficient Remote Sensing Change Detection. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2026
work page 2026
-
[19]
FSE-Mamba: A novel Frequency-Spatial Entanglement Mamba model for retinal vessel segmentation
-
[20]
ZigMa: A DiT-style Zigzag Mamba Diffusion Model
-
[21]
DS-Mamba: Dynamic snake visual state space model for vessel segmentation
LIU, Zixuan, et al. DS-Mamba: Dynamic snake visual state space model for vessel segmentation. Biomedical Signal Processing and Control, 2026, 119: 109783; TAN, Tengfei, et al. Lightweight pyramid network with spatial attention mechanism for accurate retinal vessel segmentation. International Journal of Computer Assisted Radiology and Surgery, 2021, 16...
work page 2026
-
[22]
RCAR-UNet: Retinal vessel segmentation network algorithm via novel rough attention mechanism
DING, Weiping, et al. RCAR-UNet: Retinal vessel segmentation network algorithm via novel rough attention mechanism. Information Sciences, 2024, 657: 120007
work page 2024
-
[23]
CCS-UNet: a cross-channel spatial attention model for accurate retinal vessel segmentation
ZHU, Yong-fei, et al. CCS-UNet: a cross-channel spatial attention model for accurate retinal vessel segmentation. Biomedical Optics Express, 2023, 14.9: 4739-4758
work page 2023
-
[24]
MambaFuse: Fusing Multi-scale Mamba and CNN Features for Seizure Prediction
PAN, Ding; LUO, Guibo; ZHU, Yuesheng. MambaFuse: Fusing Multi-scale Mamba and CNN Features for Seizure Prediction. In: International Conference on Neural Information Processing. Singapore: Springer Nature Singapore, 2024. p. 399-413
work page 2024
-
[25]
MSFE-Mamba: Multi-scale Frequency-Enhanced Mamba for Hyperspectral Image Classification
MSHI, Rui, et al. MSFE-Mamba: Multi-scale Frequency-Enhanced Mamba for Hyperspectral Image Classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2026; Thin vessel segmentation in fundus images using attention UNet and modified Frangi filtering; A retinal vessel segmentation network approach based on rough...
work page 2026
-
[26]
Retina blood vessels segmentation and classification with the multi-featured approach
BHIMAVARAPU, Usharani. Retina blood vessels segmentation and classification with the multi-featured approach. Journal of Imaging Informatics in Medicine, 2025, 38.1: 520-533
work page 2025
-
[27]
LI, Jianwei, et al. Topology-joint curvilinear segmentation network using confidence-based Bezier topological representation. Engineering Applications of Artificial Intelligence, 2025, 143: 110045
work page 2025
-
[28]
ResMU-Net: Residual Multi-kernel U-Net for blood vessel segmentation in retinal fundus images
PANCHAL, Sachin; KOKARE, Manesh. ResMU-Net: Residual Multi-kernel U-Net for blood vessel segmentation in retinal fundus images. Biomedical Signal Processing and Control, 2024, 90: 105859
work page 2024
-
[29]
Multi-level attention network for retinal vessel segmentation
YUAN, Yuchen, et al. Multi-level attention network for retinal vessel segmentation. IEEE Journal of Biomedical and Health Informatics, 2021, 26.1: 312-323
work page 2021
-
[30]
LI, Yang, et al. Global transformer and dual local attention network via deep-shallow hierarchical feature fusion for retinal vessel segmentation. IEEE Transactions on Cybernetics, 2022, 53.9: 5826-5839
work page 2022
-
[31]
Pa-net: A hybrid architecture for retinal vessel segmentation
LUO, Xuebing, et al. Pa-net: A hybrid architecture for retinal vessel segmentation. Pattern Recognition, 2025, 161: 111254
work page 2025
-
[32]
MSVENet: multi-scale vascular enhancement network for retinal vessel segmentation
CAO, Kang; MA, Hui. MSVENet: multi-scale vascular enhancement network for retinal vessel segmentation. Biomedical Signal Processing and Control, 2025, 110: 108272
work page 2025
-
[33]
MA, Changlong, et al. MSFB-Net: Multi-scale frequency compound attention and cross-layer boundary optimization for medical image segmentation. Biomedical Signal Processing and Control, 2026, 112: 108405
work page 2026
-
[34]
Frequency-Enhanced Wavelet Transformer Based Decoder for Medical Image Segmentation
YIN, Haitao; XU, Yongchang. Frequency-Enhanced Wavelet Transformer Based Decoder for Medical Image Segmentation. Pattern Recognition, 2026, 113198
work page 2026
-
[35]
LI, Xuan; MA, Ding; WU, Xiangqian. WFDENet: Wavelet-based frequency decomposition and enhancement network for diabetic retinopathy lesion segmentation. Pattern Recognition, 2026, 172: 112492
work page 2026
-
[36]
ZHANG, Fan; GU, Zhiwei; WANG, Hua. Decoding with structured awareness: integrating directional, frequency-spatial, and structural attention for medical image segmentation. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2026. p. 12421-12429
work page 2026
-
[37]
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
GU, Albert; DAO, Tri. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023
work page arXiv 2023
-
[38]
Vmamba: Visual state space model
LIU, Yue, et al. Vmamba: Visual state space model. Advances in neural information processing systems, 2024, 37: 103031-103063
work page 2024
discussion (0)