pith. machine review for the scientific record. sign in

arxiv: 2605.10581 · v1 · submitted 2026-05-11 · 💻 cs.CV

Recognition: 2 theorem links

· Lean Theorem

Polygon-mamba: Retinal vessel segmentation using polygon scanning mamba and space-frequency collaborative attention

Authors on Pith no claims yet

Pith reviewed 2026-05-12 03:48 UTC · model grok-4.3

classification 💻 cs.CV
keywords retinal vessel segmentationMambapolygon scanningspace-frequency attentionsmall vessel detectionhybrid CNN-Mamba networkmedical image analysisocular disease diagnosis
0
0 comments X

The pith

Polygon scanning mamba maintains connectivity of small retinal vessels during segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that a hybrid CNN-Mamba network using polygon scanning can segment small retinal vessels more accurately by avoiding breaks in their structure. A reader would care because small vessels are key for diagnosing eye diseases like retinopathy, and current methods often miss or fragment them. The approach uses multi-directional reverse scanning in the Mamba component to keep pixels connected and adds attention that mixes spatial position with frequency details to focus on important features and reduce noise. This combination is tested on three standard retinal image datasets and shows strong results in detecting fine vessels.

Core claim

We design a hybrid CNN-Mamba fusion network that integrates polygon scanning mamba and space-frequency collaborative attention mechanism for the detection of small vessels. The polygon scanning visual state space model uses multi-layer reverse scanning to identify small vessel structural features and preserve pixel connectivity, mitigating information loss. The space-frequency collaborative attention mechanism extracts efficient features from spatial and frequency domains to dynamically enhance key features and suppress clutter.

What carries the argument

Polygon scanning visual state space model (PS-VSS) using multi-layer reverse scanning to preserve connectivity in small vessel structures

Load-bearing premise

The polygon scanning and space-frequency attention mechanisms will continue to preserve small vessel connectivity and enhance features effectively on retinal datasets beyond the three used in the study without introducing artifacts.

What would settle it

Running the model on a fourth independent retinal vessel dataset and checking if small vessels show continuous paths matching the annotations or if new breaks and false positives appear.

read the original abstract

Retinal vessel segmentation is crucial for diagnosis and assessment of ocular diseases. Notably, segmentation of small retinal vessels has been consistently recognized as a challenging and complex task. To tackle this challenge, we design a hybrid CNN-Mamba fusion network that integrates polygon scanning mamba and space-frequency collaborative attention mechanism for the detection of small vessels. Considering that the traditional mamba architecture with horizontal-vertical scanning may compromise the topological integrity of target structures and result in local discontinuities in small retinal vessels, we present a polygon scanning visual state space model (PS-VSS) to identify small vessel structural features by multi-layer reverse scanning way. Which effectively preserves pixels connectivity, thereby substantially mitigating the loss of information pertaining to small vessels. Furthermore, as we all known that the spatial domain prioritizes positional and structural information, while the frequency domain emphasizes global perception and local detail components, a space-frequency collaborative attention mechanism (SFCAM) is introduced within the skip connection to extract efficient features from the spatial and frequency domains. This strategy empowers the model to dynamically enhance the key features while effectively suppressing clutters. To assess the efficacy of our model, it was tested on three publicly available datasets: DRIVE, STARE, and CHASE_DB1. Compared to manual annotations, our model demonstrated F1 scores of 0.8283, 0.8282, and 0.8251, Area Under Curve (AUC) values of 0.9806, 0.9840, and 0.9866, and Sensitivity (SE) values of of 0.8268, 0.8314, and 0.8484 across three datasets, respectively. The effectiveness of our model was validated through both visual inspection and quantitative analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The manuscript presents Polygon-Mamba, a hybrid CNN-Mamba fusion network for retinal vessel segmentation. It introduces a Polygon Scanning Visual State Space (PS-VSS) module that employs multi-layer reverse polygon scanning to preserve topological connectivity of small vessels (addressing discontinuities from standard horizontal-vertical Mamba scans), and a Space-Frequency Collaborative Attention Mechanism (SFCAM) placed in skip connections to fuse spatial positional information with frequency-domain global and local details. The model is evaluated on the DRIVE, STARE, and CHASE_DB1 datasets, reporting F1 scores of 0.8283/0.8282/0.8251, AUC values of 0.9806/0.9840/0.9866, and sensitivities of 0.8268/0.8314/0.8484.

Significance. If the performance gains can be rigorously attributed to the polygon scanning and space-frequency fusion rather than training details or the base CNN-Mamba backbone, the work would provide a useful direction for maintaining vessel continuity in thin-structure segmentation tasks. The Mamba-based scanning offers an efficiency-oriented alternative to attention mechanisms, and the hybrid design could inform subsequent medical imaging models that prioritize both local connectivity and global context.

major comments (3)
  1. [§4 Experiments] §4 Experiments and §4.2 Ablation Study (if present): No ablation experiments are described that isolate the PS-VSS polygon scanning (e.g., by replacing it with standard raster or bidirectional scanning on the identical backbone) or the SFCAM (e.g., by substituting standard spatial attention or removing the frequency branch). Without these, the reported F1/SE improvements cannot be attributed to the claimed connectivity preservation and artifact suppression rather than hyperparameter choices or the overall architecture.
  2. [§4.1 Quantitative Results] §4.1 Quantitative Results: The evaluation reports only standard pixel-wise metrics (F1, AUC, SE) and visual inspection; no topology-specific metrics (e.g., connected-component counts, Betti numbers, or vessel continuity scores) or direct side-by-side comparisons of scanning orders are provided to verify that multi-layer reverse polygon scanning specifically reduces discontinuities in small vessels compared with conventional Mamba scanning.
  3. [§4 Experiments] §4 and §3 Method: Training procedures, data augmentation, hyperparameter search, baseline re-implementations (versus literature numbers), and statistical significance tests for the metric differences are not detailed. This information is load-bearing for assessing whether the gains on DRIVE/STARE/CHASE_DB1 are robust and generalizable beyond the three tested datasets.
minor comments (3)
  1. [Abstract] Abstract: The phrase 'as we all known that' is grammatically incorrect and should be revised to 'as is well known' or equivalent.
  2. [Abstract] Abstract: Duplicate word 'of of' appears in 'Sensitivity (SE) values of of 0.8268'.
  3. [§3 Method] Method section: Module acronyms (PS-VSS, SFCAM) and the exact polygon scanning directions should be accompanied by a clear diagram or pseudocode for reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We sincerely thank the referee for the constructive and detailed feedback. We have carefully reviewed each major comment and will incorporate revisions to address the concerns regarding experimental validation, metrics, and reproducibility.

read point-by-point responses
  1. Referee: [§4 Experiments] §4 Experiments and §4.2 Ablation Study (if present): No ablation experiments are described that isolate the PS-VSS polygon scanning (e.g., by replacing it with standard raster or bidirectional scanning on the identical backbone) or the SFCAM (e.g., by substituting standard spatial attention or removing the frequency branch). Without these, the reported F1/SE improvements cannot be attributed to the claimed connectivity preservation and artifact suppression rather than hyperparameter choices or the overall architecture.

    Authors: We agree that dedicated ablation studies are necessary to rigorously attribute performance gains to the polygon scanning in PS-VSS and the space-frequency fusion in SFCAM. The current manuscript validates the full model via SOTA comparisons and visuals but does not isolate these components. In the revised manuscript, we will add a dedicated ablation subsection with: (i) PS-VSS replaced by standard horizontal-vertical or bidirectional scanning on the identical backbone, and (ii) SFCAM replaced by standard spatial attention or with the frequency branch removed. Results will be reported on all three datasets to support the connectivity and artifact-suppression claims. revision: yes

  2. Referee: [§4.1 Quantitative Results] §4.1 Quantitative Results: The evaluation reports only standard pixel-wise metrics (F1, AUC, SE) and visual inspection; no topology-specific metrics (e.g., connected-component counts, Betti numbers, or vessel continuity scores) or direct side-by-side comparisons of scanning orders are provided to verify that multi-layer reverse polygon scanning specifically reduces discontinuities in small vessels compared with conventional Mamba scanning.

    Authors: We acknowledge that topology-aware metrics would provide stronger quantitative support for the connectivity-preservation benefit of multi-layer reverse polygon scanning. The manuscript currently relies on standard metrics plus qualitative visual evidence of improved small-vessel continuity. In the revision, we will add topology-specific evaluations, including connected-component counts and a vessel continuity score, together with direct side-by-side quantitative and visual comparisons of polygon versus standard raster scanning on representative images from DRIVE, STARE, and CHASE_DB1. revision: yes

  3. Referee: [§4 Experiments] §4 and §3 Method: Training procedures, data augmentation, hyperparameter search, baseline re-implementations (versus literature numbers), and statistical significance tests for the metric differences are not detailed. This information is load-bearing for assessing whether the gains on DRIVE/STARE/CHASE_DB1 are robust and generalizable beyond the three tested datasets.

    Authors: We agree that expanded methodological and experimental details are required for reproducibility and to substantiate the reported gains. The original manuscript provides only high-level descriptions. In the revised version we will: (i) detail all data-augmentation strategies, optimizer settings, learning-rate schedules, and hyperparameter search procedure in Section 3; (ii) clarify baseline re-implementations versus literature-reported numbers; and (iii) include statistical significance tests (e.g., paired t-tests with p-values) for metric differences across the three datasets, along with a brief discussion of generalizability limitations. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical network design with independent validation

full rationale

The paper is an empirical deep-learning contribution that proposes a hybrid CNN-Mamba architecture (PS-VSS polygon scanning and SFCAM space-frequency attention) and evaluates it via standard training on public retinal datasets (DRIVE, STARE, CHASE_DB1), reporting F1/AUC/SE metrics. No derivation chain, equations, or predictions are presented that reduce by construction to fitted inputs, self-citations, or ansatzes. Performance numbers arise from gradient descent on held-out test splits rather than any self-referential loop. The work is therefore self-contained against external benchmarks with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only the abstract is available, so specific free parameters and training details are unknown. The work rests on standard deep-learning assumptions about feature extraction and long-range modeling rather than new physical postulates.

axioms (2)
  • domain assumption Mamba-based state space models can capture long-range dependencies in 2D images when an appropriate scanning order is chosen.
    Invoked to justify replacing horizontal-vertical scanning with polygon scanning.
  • domain assumption Joint spatial and frequency domain processing improves discrimination of small structures over spatial-only attention.
    Basis for introducing SFCAM in skip connections.

pith-pipeline@v0.9.0 · 5626 in / 1470 out tokens · 67551 ms · 2026-05-12T03:48:07.501306+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 2 internal anchors

  1. [1]

    A review of retinal vessel segmentation for fundus image analysis

    QIN, Qing; CHEN, Yuanyuan. A review of retinal vessel segmentation for fundus image analysis. Engineering Applications of Artificial Intelligence, 2024, 128: 107454

  2. [2]

    A systematic review of retinal fundus image segmentation and classification methods using convolutional neural networks

    ILESANMI, Ademola E.; ILESANMI, Taiwo; GBOTOSO, Gbenga A. A systematic review of retinal fundus image segmentation and classification methods using convolutional neural networks. Healthcare Analytics, 2023, 4: 100261

  3. [3]

    Deep learning for retinal vessel segmentation: a systematic review of techniques and applications

    LIU, Zhihui, et al. Deep learning for retinal vessel segmentation: a systematic review of techniques and applications. Medical & Biological Engineering & Computing, 2025, 63.8: 2191-2208

  4. [4]

    Systematic review of retinal blood vessels segmentation based on AI-driven technique

    VERMA, Prem Kumari; KAUR, Jagdeep. Systematic review of retinal blood vessels segmentation based on AI-driven technique. Journal of imaging informatics in medicine, 2024, 37.4: 1783-1799

  5. [5]

    U-net: Convolutional networks for biomedical image segmentation

    RONNEBERGER, Olaf; FISCHER, Philipp; BROX, Thomas. U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical image computing and computer-assisted intervention. Cham: Springer international publishing, 2015. p. 234-241

  6. [6]

    G2ViT: Graph neural network-guided vision transformer enhanced network for retinal vessel and coronary angiograph segmentation

    XU, Hao; WU, Yun. G2ViT: Graph neural network-guided vision transformer enhanced network for retinal vessel and coronary angiograph segmentation. Neural Networks, 2024, 176: 106356

  7. [7]

    Weak-mamba-unet: Visual mamba makes cnn and vit work better for scribble-based medical image segmentation

    WANG, Ziyang, et al. Weak-mamba-unet: Visual mamba makes cnn and vit work better for scribble-based medical image segmentation. IEEE Transactions on Biomedical Engineering, 2026

  8. [8]

    Attention U-Net: Learning Where to Look for the Pancreas

    LONG, Jonathan; SHELHAMER, Evan; DARRELL, Trevor. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2015. p. 3431-3440. [9]OKTAY, Ozan, et al. Attention u-net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999, 2018. [10]ZHOU, Zongwei, et al. Un...

  9. [9]

    Curvilinear object segmentation in medical images based on ODoS filter and deep learning network: Y

    PENG, Yuanyuan, et al. Curvilinear object segmentation in medical images based on ODoS filter and deep learning network: Y. Peng et al. Applied Intelligence, 2023, 53.20: 23470-23481

  10. [10]

    Dynamic snake convolution based on topological geometric constraints for tubular structure segmentation

    QI, Yaolei, et al. Dynamic snake convolution based on topological geometric constraints for tubular structure segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision. 2023. p. 6070-6079

  11. [11]

    VasCA-Net: A Vascular Channel Attention Network for Retinal Vessel Segmentation

    MA, Zhendi, et al. VasCA-Net: A Vascular Channel Attention Network for Retinal Vessel Segmentation. Expert Systems with Applications, 2025, 130591

  12. [12]

    Vision mamba: A comprehensive survey and taxonomy

    LIU, Xiao, et al. Vision mamba: A comprehensive survey and taxonomy. IEEE Transactions on Neural Networks and Learning Systems, 2025

  13. [13]

    arXiv preprint arXiv:2402.05079 (2024)

    WANG, Ziyang, et al. Mamba-unet: Unet-like pure visual mamba for medical image segmentation. arXiv preprint arXiv:2402.05079, 2024

  14. [14]

    Serp-mamba: Advancing high-resolution retinal vessel segmentation with selective state-space model

    Wang, Hongqiu, et al. "Serp-mamba: Advancing high-resolution retinal vessel segmentation with selective state-space model." IEEE Transactions on Medical Imaging (2025)

  15. [15]

    S ³ -mamba: Small-size-sensitive mamba for lesion segmentation

    WANG, Gui, et al. S ³ -mamba: Small-size-sensitive mamba for lesion segmentation. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2025. p. 7655-7664

  16. [16]

    Mamba-sea: A mamba-based framework with global-to-local sequence augmentation for generalizable medical image segmentation

    CHENG, Zihan, et al. Mamba-sea: A mamba-based framework with global-to-local sequence augmentation for generalizable medical image segmentation. IEEE Transactions on Medical Imaging, 2025

  17. [17]

    U-shape mamba: State space model for faster diffusion

    ERGASTI, Alex, et al. U-shape mamba: State space model for faster diffusion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2025. p. 3276-3283

  18. [18]

    GLMamba: A Global– Local Mamba Network for Efficient Remote Sensing Change Detection

    LIU, Shengyan, et al. GLMamba: A Global– Local Mamba Network for Efficient Remote Sensing Change Detection. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2026

  19. [19]

    FSE-Mamba: A novel Frequency-Spatial Entanglement Mamba model for retinal vessel segmentation

  20. [20]

    ZigMa: A DiT-style Zigzag Mamba Diffusion Model

  21. [21]

    DS-Mamba: Dynamic snake visual state space model for vessel segmentation

    LIU, Zixuan, et al. DS-Mamba: Dynamic snake visual state space model for vessel segmentation. Biomedical Signal Processing and Control, 2026, 119: 109783. [27]TAN, Tengfei, et al. Lightweight pyramid network with spatial attention mechanism for accurate retinal vessel segmentation. International Journal of Computer Assisted Radiology and Surgery, 2021, 16...

  22. [22]

    RCAR-UNet: Retinal vessel segmentation network algorithm via novel rough attention mechanism

    DING, Weiping, et al. RCAR-UNet: Retinal vessel segmentation network algorithm via novel rough attention mechanism. Information Sciences, 2024, 657: 120007

  23. [23]

    CCS-UNet: a cross-channel spatial attention model for accurate retinal vessel segmentation

    ZHU, Yong-fei, et al. CCS-UNet: a cross-channel spatial attention model for accurate retinal vessel segmentation. Biomedical Optics Express, 2023, 14.9: 4739-4758

  24. [24]

    MambaFuse: Fusing Multi-scale Mamba and CNN Features for Seizure Prediction

    PAN, Ding; LUO, Guibo; ZHU, Yuesheng. MambaFuse: Fusing Multi-scale Mamba and CNN Features for Seizure Prediction. In: International Conference on Neural Information Processing. Singapore: Springer Nature Singapore, 2024. p. 399-413

  25. [25]

    MSFE-Mamba: Multi-scale Frequency-Enhanced Mamba for Hyperspectral Image Classification

    MSHI, Rui, et al. MSFE-Mamba: Multi-scale Frequency-Enhanced Mamba for Hyperspectral Image Classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2026. [33]Thin vessel segmentation in fundus images using attention UNet and modified Frangi filtering [34]A retinal vessel segmentation network approach based on rough...

  26. [26]

    Retina blood vessels segmentation and classification with the multi-featured approach

    BHIMAVARAPU, Usharani. Retina blood vessels segmentation and classification with the multi-featured approach. Journal of Imaging Informatics in Medicine, 2025, 38.1: 520-533

  27. [27]

    Topology-joint curvilinear segmentation network using confidence-based Bezier topological representation

    LI, Jianwei, et al. Topology-joint curvilinear segmentation network using confidence-based Bezier topological representation. Engineering Applications of Artificial Intelligence, 2025, 143: 110045

  28. [28]

    ResMU-Net: Residual Multi-kernel U-Net for blood vessel segmentation in retinal fundus images

    PANCHAL, Sachin; KOKARE, Manesh. ResMU-Net: Residual Multi-kernel U-Net for blood vessel segmentation in retinal fundus images. Biomedical Signal Processing and Control, 2024, 90: 105859

  29. [29]

    Multi-level attention network for retinal vessel segmentation

    YUAN, Yuchen, et al. Multi-level attention network for retinal vessel segmentation. IEEE Journal of Biomedical and Health Informatics, 2021, 26.1: 312-323

  30. [30]

    Global transformer and dual local attention network via deep-shallow hierarchical feature fusion for retinal vessel segmentation

    LI, Yang, et al. Global transformer and dual local attention network via deep-shallow hierarchical feature fusion for retinal vessel segmentation. IEEE Transactions on Cybernetics, 2022, 53.9: 5826-5839

  31. [31]

    Pa-net: A hybrid architecture for retinal vessel segmentation

    LUO, Xuebing, et al. Pa-net: A hybrid architecture for retinal vessel segmentation. Pattern Recognition, 2025, 161: 111254

  32. [32]

    MSVENet: multi-scale vascular enhancement network for retinal vessel segmentation

    CAO, Kang; MA, Hui. MSVENet: multi-scale vascular enhancement network for retinal vessel segmentation. Biomedical Signal Processing and Control, 2025, 110: 108272

  33. [33]

    MSFB-Net: Multi-scale frequency compound attention and cross-layer boundary optimization for medical image segmentation

    MA, Changlong, et al. MSFB-Net: Multi-scale frequency compound attention and cross-layer boundary optimization for medical image segmentation. Biomedical Signal Processing and Control, 2026, 112: 108405

  34. [34]

    Frequency-Enhanced Wavelet Transformer Based Decoder for Medical Image Segmentation

    YIN, Haitao; XU, Yongchang. Frequency-Enhanced Wavelet Transformer Based Decoder for Medical Image Segmentation. Pattern Recognition, 2026, 113198

  35. [35]

    WFDENet: Wavelet-based frequency decomposition and enhancement network for diabetic retinopathy lesion segmentation

    LI, Xuan; MA, Ding; WU, Xiangqian. WFDENet: Wavelet-based frequency decomposition and enhancement network for diabetic retinopathy lesion segmentation. Pattern Recognition, 2026, 172: 112492

  36. [36]

    Decoding with structured awareness: integrating directional, frequency-spatial, and structural attention for medical image segmentation

    ZHANG, Fan; GU, Zhiwei; WANG, Hua. Decoding with structured awareness: integrating directional, frequency-spatial, and structural attention for medical image segmentation. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2026. p. 12421-12429

  37. [37]

    Mamba: Linear-Time Sequence Modeling with Selective State Spaces

    GU, Albert; DAO, Tri. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023

  38. [38]

    Vmamba: Visual state space model

    LIU, Yue, et al. Vmamba: Visual state space model. Advances in neural information processing systems, 2024, 37: 103031-103063