pith. machine review for the scientific record.

arxiv: 2604.06658 · v1 · submitted 2026-04-08 · 💻 cs.CV

Recognition: 2 theorem links

GPAFormer: Graph-guided Patch Aggregation Transformer for Efficient 3D Medical Image Segmentation

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:09 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D medical image segmentation · transformer · lightweight network · multi-scale attention · graph aggregation · CT/MRI · multi-organ segmentation

The pith

GPAFormer achieves highest Dice scores on four 3D medical segmentation benchmarks using only 1.81 million parameters and sub-second inference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GPAFormer as a lightweight transformer architecture for 3D medical image segmentation across CT and MRI modalities. It builds two core modules into the network: MASA, which processes features through three parallel paths with differing receptive fields and merges them via planar aggregation, and MPGA, which forms dynamic graphs over patches using feature similarity plus spatial adjacency to group related regions. Experiments on the BTCV, Synapse, ACDC, and BraTS datasets show this design reaching top DSC values while using far fewer parameters than prior networks and completing inference in under one second on consumer GPUs. The central goal is to deliver accurate multi-organ segmentation in settings where compute and time are limited. If the modules work as described, the network becomes practical for whole-body scans in resource-constrained clinical environments.
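To make the multi-scale mechanism concrete, here is a minimal sketch of what a three-path block in the spirit of MASA might look like. The kernel sizes, channel handling, and residual connection are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: three parallel 3D conv paths with different
# receptive fields, stacked along channels and fused back down, loosely
# following the MASA description. Kernel sizes (1, 3, 5) are assumed.
import torch
import torch.nn as nn

class MultiScalePathBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Three parallel paths with different receptive fields.
        self.paths = nn.ModuleList([
            nn.Conv3d(channels, channels, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5)
        ])
        # Fuse the stacked multi-scale features back to the input width.
        self.fuse = nn.Conv3d(3 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, D, H, W); padding keeps spatial dims aligned.
        feats = [path(x) for path in self.paths]
        stacked = torch.cat(feats, dim=1)  # channel-wise stacking
        return self.fuse(stacked) + x      # residual connection (assumed)
```

The abstract additionally places a self-attention stage after the fusion; that stage is omitted here for brevity.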

Core claim

GPAFormer with its MASA module for multi-scale stacked aggregation and MPGA module for mutual-aware graph-based patch aggregation attains the highest reported DSC scores of 75.70 percent on BTCV, 81.20 percent on Synapse, 89.32 percent on ACDC, and 82.74 percent on BraTS while using only 1.81 million parameters and requiring less than one second of inference time per validation case on consumer hardware.

What carries the argument

The MASA module, which runs three parallel paths with different receptive fields and combines them via planar aggregation, together with the MPGA module, which builds graphs over patches using inter-patch feature similarity and spatial adjacency to aggregate similar regions dynamically.
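As a rough illustration of the MPGA mechanism as the abstract describes it, the sketch below gates a cosine-similarity matrix by a spatial-adjacency mask and performs one aggregation step over the resulting patch graph. The row normalization, the non-negativity clamp, and the aggregation rule are assumptions beyond what the abstract specifies.

```python
# Illustrative sketch only: graph-guided patch aggregation in the spirit
# of MPGA. Cosine similarity between patch embeddings is gated by spatial
# adjacency; one message-passing step then mixes neighboring patches.
import torch
import torch.nn.functional as F

def patch_graph_aggregate(patches: torch.Tensor,
                          spatial_adj: torch.Tensor) -> torch.Tensor:
    """patches: (N, d) embeddings; spatial_adj: (N, N) 0/1 neighbor mask."""
    # Cosine-similarity matrix; diagonal fixed to 1 so each node keeps itself.
    normed = F.normalize(patches, dim=1)
    a_sim = normed @ normed.t()
    a_sim.fill_diagonal_(1.0)
    # Element-wise gating: keep similarity only between spatial neighbors,
    # plus self-loops.
    adj = spatial_adj.clone()
    adj.fill_diagonal_(1.0)
    a = (a_sim * adj).clamp(min=0)  # non-negative weights (an assumption)
    # One aggregation step: row-normalize, then mix neighboring patches.
    a = a / a.sum(dim=1, keepdim=True).clamp(min=1e-8)
    return a @ patches
```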

Load-bearing premise

The performance gains arise mainly from the MASA and MPGA modules and will persist on new clinical data without dataset-specific tuning or overfitting to the four public benchmarks.

What would settle it

Evaluating the trained GPAFormer on an independent 3D medical dataset collected with different scanners or patient demographics and observing whether its Dice scores fall below those of established methods such as nnU-Net.
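Such a test ultimately reduces to per-case Dice comparisons on the external cohort. For reference, a minimal implementation of the metric under its standard definition (the empty-volume convention here is one common choice, not necessarily the paper's):

```python
# Dice similarity coefficient for one organ label in a 3D label volume.
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, label: int) -> float:
    p = (pred == label)
    t = (target == label)
    denom = p.sum() + t.sum()
    if denom == 0:
        return 1.0  # convention: empty prediction matching empty target
    return float(2.0 * np.logical_and(p, t).sum() / denom)
```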

Original abstract

Deep learning has been widely applied to 3D medical image segmentation tasks. However, due to the diversity of imaging modalities, the high-dimensional nature of the data, and the heterogeneity of anatomical structures, achieving both segmentation accuracy and computational efficiency in multi-organ segmentation remains a challenge. This study proposed GPAFormer, a lightweight network architecture specifically designed for 3D medical image segmentation, emphasizing efficiency while keeping high accuracy. GPAFormer incorporated two core modules: the multi-scale attention-guided stacked aggregation (MASA) and the mutual-aware patch graph aggregator (MPGA). MASA utilized three parallel paths with different receptive fields, combined through planar aggregation, to enhance the network's capability in handling structures of varying sizes. MPGA employed a graph-guided approach to dynamically aggregate regions with similar feature distributions based on inter-patch feature similarity and spatial adjacency, thereby improving the discrimination of both internal and boundary structures of organs. Experiments were performed on public whole-body CT and MRI datasets including BTCV, Synapse, ACDC, and BraTS. Compared to the existed 3D segmentation networkd, GPAFormer using only 1.81 M parameters achieved overall highest DSC on BTCV (75.70%), Synapse (81.20%), ACDC (89.32%), and BraTS (82.74%). Using consumer level GPU, the inference time for one validation case of BTCV spent less than one second. The results demonstrated that GPAFormer balanced accuracy and efficiency in multi-organ, multi-modality 3D segmentation tasks across various clinical scenarios especially for resource-constrained and time-sensitive clinical environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes GPAFormer, a lightweight transformer architecture for 3D medical image segmentation that integrates two modules—MASA (multi-scale attention-guided stacked aggregation) for handling structures of varying sizes via parallel paths and planar aggregation, and MPGA (mutual-aware patch graph aggregator) for dynamic graph-guided patch aggregation based on feature similarity and spatial adjacency. It reports state-of-the-art Dice Similarity Coefficient (DSC) scores on BTCV (75.70%), Synapse (81.20%), ACDC (89.32%), and BraTS (82.74%) using only 1.81M parameters, with inference under one second per case on consumer GPUs, positioning the model as efficient for resource-constrained clinical settings.

Significance. If the performance gains can be rigorously attributed to the proposed modules through controlled experiments, this would represent a meaningful contribution to efficient 3D segmentation, addressing the practical need for high-accuracy models that run on modest hardware across CT and MRI modalities. The low parameter count and fast inference are particularly relevant for deployment in time-sensitive or edge-computing clinical environments.

major comments (3)
  1. [Abstract and Experimental Results] The headline DSC scores (e.g., 75.70% on BTCV) are presented without any ablation studies, standard deviations, statistical tests, or details on the number of training runs. This makes it impossible to determine whether the gains are statistically meaningful or attributable to MASA/MPGA rather than differences in training recipes.
  2. [Experimental Results] On small medical datasets (BTCV/Synapse with ~30 training cases), the reported superiority over baselines requires explicit confirmation that all models were trained with identical data augmentation, optimizer schedules, patch sampling, and splits. Published baseline numbers often reflect different hyper-parameter regimes that can produce 2–4% DSC swings, undermining attribution to the graph-guided aggregation.
  3. [Ablation Studies] The central claim that MASA and MPGA enable the high DSC at 1.81M parameters is load-bearing but unsupported without quantitative ablations showing performance drops when either module is removed or replaced. Without these, the efficiency-accuracy balance cannot be confidently credited to the architectural innovations.
minor comments (2)
  1. [Abstract] Abstract: Typographical errors ('existed 3D segmentation networkd' should read 'existing 3D segmentation networks').
  2. [Abstract] Abstract: The inference-time claim ('less than one second') should specify exact hardware, input resolution, and batch size to allow reproducibility.
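A reproducible version of that claim might look like the sketch below, which reports device, input shape, and mean latency together. The input shape, warm-up count, and run count here are placeholder choices, not the paper's protocol.

```python
# Illustrative sketch: timing 3D segmentation inference reproducibly.
# CUDA work is asynchronous, so synchronize before reading the clock.
import time
import torch

def time_inference(model: torch.nn.Module,
                   shape=(1, 1, 96, 96, 96),  # assumed patch size
                   runs: int = 10) -> float:
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.randn(*shape, device=device)
    with torch.no_grad():
        for _ in range(3):  # warm-up passes (kernel compilation, caches)
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
    mean_s = (time.perf_counter() - start) / runs
    print(f"device={device} input={shape} mean latency={mean_s * 1000:.1f} ms")
    return mean_s
```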

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for the thorough and constructive review. We have carefully addressed each major comment below and will revise the manuscript accordingly to improve the transparency and rigor of the experimental validation.

Point-by-point responses
  1. Referee: [Abstract and Experimental Results] The headline DSC scores (e.g., 75.70% on BTCV) are presented without any ablation studies, standard deviations, statistical tests, or details on the number of training runs. This makes it impossible to determine whether the gains are statistically meaningful or attributable to MASA/MPGA rather than differences in training recipes.

    Authors: We agree that additional statistical details would strengthen the presentation of results. The abstract is length-constrained, but in the revised manuscript we will expand the Experimental Results section to report standard deviations from three independent training runs with different random seeds and explicitly state the number of runs performed. Formal statistical tests were not included in the original submission owing to the high computational cost on 3D medical datasets, but we will add error bars to all tables to convey run-to-run variability and support attribution of gains to the proposed modules rather than training differences; see the reporting sketch after these responses. revision: yes

  2. Referee: [Experimental Results] On small medical datasets (BTCV/Synapse with ~30 training cases), the reported superiority over baselines requires explicit confirmation that all models were trained with identical data augmentation, optimizer schedules, patch sampling, and splits. Published baseline numbers often reflect different hyper-parameter regimes that can produce 2–4% DSC swings, undermining attribution to the graph-guided aggregation.

    Authors: We confirm that every baseline was re-implemented and trained from scratch using identical data augmentation, optimizer (AdamW with the same schedule), patch sampling strategy, and train/validation splits as described in Section 4.1. To eliminate any remaining ambiguity, the revised manuscript will include an explicit statement in the Experimental Results section together with a supplementary table that lists the precise hyper-parameters applied to each compared method. revision: yes

  3. Referee: [Ablation Studies] The central claim that MASA and MPGA enable the high DSC at 1.81M parameters is load-bearing but unsupported without quantitative ablations showing performance drops when either module is removed or replaced. Without these, the efficiency-accuracy balance cannot be confidently credited to the architectural innovations.

    Authors: We acknowledge that quantitative ablation studies are essential to substantiate the contribution of the proposed modules. In the revised manuscript we will add a dedicated ablation subsection that reports DSC, parameter count, and inference time for the full GPAFormer, the model without MASA, the model without MPGA, and variants in which each module is replaced by simpler alternatives. These controlled experiments will directly quantify the performance impact and thereby attribute the observed efficiency-accuracy trade-off to MASA and MPGA. revision: yes
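A minimal sketch of the seed-level reporting the first response promises; the scores below are hypothetical placeholders, not results from the paper or the revision.

```python
# Report mean ± sample standard deviation across independent seed runs.
import numpy as np

def summarize_runs(dsc_per_seed: list) -> str:
    scores = np.asarray(dsc_per_seed, dtype=float)
    return f"{scores.mean():.2f} ± {scores.std(ddof=1):.2f} (n={len(scores)})"

# Hypothetical example with three seeds on one benchmark:
print(summarize_runs([75.4, 75.9, 75.7]))  # -> 75.67 ± 0.25 (n=3)
```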

Circularity Check

0 steps flagged

No derivation chain or mathematical predictions; results are empirical architecture evaluations.

Full rationale

The paper proposes GPAFormer with MASA and MPGA modules for 3D medical segmentation and reports DSC metrics on BTCV, Synapse, ACDC, and BraTS. No equations, first-principles derivations, or 'predictions' of quantities are presented that could reduce to fitted inputs or self-definitions. Performance numbers are framed as direct experimental outcomes from training the network, with no load-bearing self-citations or ansatz smuggling in any derivation. The central claims rest on empirical results rather than any closed logical loop.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract contains no mathematical derivations, axioms, or new postulated entities; the work is an empirical neural-network design whose internal assumptions are not specified.

pith-pipeline@v0.9.0 · 5595 in / 1094 out tokens · 48570 ms · 2026-05-10T18:09:01.292980+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.


Reference graph

Works this paper leans on

50 extracted references · 4 canonical work pages · 4 internal anchors

  1. [1]

    MICCAI 2015 Multi-Atlas Abdomen Labeling Challenge (BTCV) dataset: The BTCV dataset consists of 50 portal venous phase abdominal CT scans collected under institutional review board supervision from colorectal cancer and hernia studies. Image volumes range from 512×512×85 to 512×512×198, with fields of view between 280×280×280 mm³ and 500×500×650 mm³...

  2. [2]

    Synapse multi-organ segmentation (Synapse) dataset: The Synapse dataset, a subset of BTCV, includes 30 portal venous phase CT scans with imaging characteristics consistent with BTCV. Synapse provides labels for 8 major organs: aorta, gallbladder, left kidney, right kidney, liver, pancreas, spleen, and stomach.

  3. [3]

    Automated Cardiac Diagnosis Challenge (ACDC) dataset: The ACDC dataset comprises cardiac MRIs of 150 patients collected at Dijon University Hospital over six years using 1.5T/3.0T scanners with steady-state free precession sequences under breath-hold. Images have an in-plane resolution of 1.34–1.68 mm²/pixel and 5 mm slice thickness. The end-diastolic (...

  4. [4]

    In total, it comprises 750 four-dimensional (4D) volumes, of which 484 provide voxel-wise manual annotations for training and 266 are unlabeled

    Multimodal Brain Tumor Segmentation Challenge (BraTS) dataset: The Medical Segmentation Decathlon (MSD) Task01_BrainTumour dataset is derived from the multi-institutional pre-operative adult glioma cohorts of BraTS 2016–2017 and was released following standardized preprocessing: co-registration to a common anatomical template, resampling to 1 mm³ isotrop...

  5. [5]

    Multi-scale attention-guided stacked aggregation module (MASA): MASA (Fig. 2) adopted three independent and parallel feature extraction paths. After obtaining spatial structural information from diverse convolution kernels of different scales, the features were fused through planar stacking and then enhanced with a self-attention mechanism, allowing mo...

  6. [6]

    Mutual-aware patch graph aggregator module (MPGA): MPGA, as shown in Fig. 3, used a graph structure to represent the topology of spatial adjacency relationships in the image, where each image patch is represented as a node and edges connect neighboring patches, forming a mutual-aware patch graph. This graph structure can explicitly represent the spatial...

  7. [7]

    Node feature matrix $G_{\mathrm{node}} \in \mathbb{R}^{N \times d}$: all patch embedding vectors are used as node features to form the matrix $G_{\mathrm{node}}$. The $i$-th row corresponds to the embedding vector $p_i \in \mathbb{R}^d$ of the $i$-th patch, i.e., $G_{\mathrm{node}} = [p_1; p_2; \ldots; p_N]$.

  8. [8]

    Spatial adjacency matrix $A_{\mathrm{spatial}} \in \mathbb{R}^{N \times N}$: built based on the edge set described above, reflecting whether two patches are spatial neighbors.

  9. [9]

    broaden then refine

    Similarity matrix $A_{\mathrm{sim}} \in \mathbb{R}^{N \times N}$: during message passing, each node must retain its own feature, thus $A^{\mathrm{sim}}_{ii} = 1$. For each edge $(i, j)$, the edge weight is defined by the cosine similarity between patch embedding vectors, as shown in Equation (6): $A^{\mathrm{sim}}_{ij} = \cos(\theta_{ij}) = \frac{p_i \cdot p_j}{\|p_i\| \, \|p_j\|}$. In Fig. 4, the element-wise multiplication of the spatial adjacenc...

  10. [10]

    Advances in medical imaging techniques,

    J. Rong and Y. Liu, "Advances in medical imaging techniques," BMC Methods, vol. 1, no. 1, p. 10, 2024

  11. [11]

    Trends in use of medical imaging in US health care systems and in Ontario, Canada, 2000-2016,

    R. Smith-Bindman et al., "Trends in use of medical imaging in US health care systems and in Ontario, Canada, 2000-2016," Jama, vol. 322, no. 9, pp. 843-856, 2019

  12. [12]

    Projected US imaging utilization, 2025 to 2055,

    E. W. Christensen, A. R. Drake, J. R. Parikh, E. M. Rubin, and E. Y. Rula, "Projected US imaging utilization, 2025 to 2055," Journal of the American College of Radiology, vol. 22, no. 2, pp. 151-158, 2025

  13. [13]

    Principles of CT and CT technology,

    L. W. Goldman, "Principles of CT and CT technology," Journal of nuclear medicine technology, vol. 35, no. 3, pp. 115-128, 2007

  14. [14]

    Dual-energy CT of the abdomen: radiology in training,

    S. Lennartz, N. G. Hokamp, and A. Kambadakone, "Dual-energy CT of the abdomen: radiology in training," Radiology, vol. 305, no. 1, pp. 19-27, 2022

  15. [15]

    Cardiac imaging techniques for physicians: late enhancement,

    P. Kellman and A. E. Arai, "Cardiac imaging techniques for physicians: late enhancement," Journal of magnetic resonance imaging, vol. 36, no. 3, pp. 529-542, 2012

  16. [16]

    Inter-observer and inter-examination variability of manual vertebral bone attenuation measurements on computed tomography,

    E. Pompe et al., "Inter-observer and inter-examination variability of manual vertebral bone attenuation measurements on computed tomography," European radiology, vol. 26, no. 9, pp. 3046-3053, 2016

  17. [17]

    Statistical shape models for 3D medical image segmentation: a review,

    T. Heimann and H.-P. Meinzer, "Statistical shape models for 3D medical image segmentation: a review," Medical image analysis, vol. 13, no. 4, pp. 543-563, 2009

  18. [18]

    Deep image guiding: guide knee ultrasound scanning using hierarchical classification and retrieval,

    C.-M. Lo and K.-L. Lai, "Deep image guiding: guide knee ultrasound scanning using hierarchical classification and retrieval," IEEE Transactions on Instrumentation and Measurement, 2024

  19. [19]

    Large-scale hierarchical medical image retrieval based on a multilevel convolutional neural network,

    C.-M. Lo and C.-Y. Hsieh, "Large-scale hierarchical medical image retrieval based on a multilevel convolutional neural network," IEEE Transactions on Emerging Topics in Computational Intelligence, 2024

  20. [20]

    Segnet: A deep convolutional encoder-decoder architecture for image segmentation,

    V. Badrinarayanan, A. Kendall, and R. Cipolla, "Segnet: A deep convolutional encoder-decoder architecture for image segmentation," IEEE transactions on pattern analysis and machine intelligence, vol. 39, no. 12, pp. 2481-2495, 2017

  21. [21]

    U-net: Convolutional networks for biomedical image segmentation,

    O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in Medical image computing and computer-assisted intervention – MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18, 2015: Springer, pp. 234-241

  22. [22]

    3D U-Net: learning dense volumetric segmentation from sparse annotation,

    Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, "3D U-Net: learning dense volumetric segmentation from sparse annotation," in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016: 19th International Conference, Athens, Greece, October 17-21, 2016, Proceedings, Part II 19, 2016: Springer, pp. 424-432

  23. [23]

    nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation,

    F. Isensee, P. F. Jaeger, S. A. Kohl, J. Petersen, and K. H. Maier-Hein, "nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation," Nature methods, vol. 18, no. 2, pp. 203-211, 2021

  24. [24]

    Attention is all you need,

    A. Vaswani et al., "Attention is all you need," Advances in neural information processing systems, vol. 30, 2017

  25. [25]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    A. Dosovitskiy et al., "An image is worth 16x16 words: Transformers for image recognition at scale," arXiv preprint arXiv:2010.11929, 2020

  26. [26]

    Unetr: Transformers for 3d medical image segmentation,

    A. Hatamizadeh et al., "Unetr: Transformers for 3d medical image segmentation," in Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2022, pp. 574-584

  27. [27]

    Miccai multi-atlas labeling beyond the cranial vault – workshop and challenge,

    B. Landman, Z. Xu, J. Igelsias, M. Styner, T. Langerak, and A. Klein, "Miccai multi-atlas labeling beyond the cranial vault – workshop and challenge," in Proc. MICCAI multi-atlas labeling beyond cranial vault – workshop challenge, 2015, vol. 5: Munich, Germany, p. 12

  28. [28]

    nnFormer: volumetric medical image segmentation via a 3D transformer,

    H.-Y. Zhou et al., "nnFormer: volumetric medical image segmentation via a 3D transformer," IEEE transactions on image processing, vol. 32, pp. 4036-4045, 2023

  29. [29]

    Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images,

    A. Hatamizadeh, V. Nath, Y. Tang, D. Yang, H. R. Roth, and D. Xu, "Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images," in International MICCAI brainlesion workshop, 2022: Springer, pp. 272-284

  30. [30]

    Swin transformer: Hierarchical vision transformer using shifted windows,

    Z. Liu et al., "Swin transformer: Hierarchical vision transformer using shifted windows," in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 10012-10022

  31. [31]

    Self-supervised pre-training of swin transformers for 3d medical image analysis,

    Y. Tang et al., "Self-supervised pre-training of swin transformers for 3d medical image analysis," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 20730-20740

  32. [32]

    Segformer3d: an efficient transformer for 3d medical image segmentation,

    S. Perera, P. Navard, and A. Yilmaz, "Segformer3d: an efficient transformer for 3d medical image segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 4981-4988

  33. [33]

    SegFormer: Simple and efficient design for semantic segmentation with transformers,

    E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, "SegFormer: Simple and efficient design for semantic segmentation with transformers," Advances in neural information processing systems, vol. 34, pp. 12077-12090, 2021

  34. [34]

    UNETR++: delving into efficient and accurate 3D medical image segmentation,

    A. Shaker, M. Maaz, H. Rasheed, S. Khan, M.-H. Yang, and F. S. Khan, "UNETR++: delving into efficient and accurate 3D medical image segmentation," IEEE Transactions on Medical Imaging, vol. 43, no. 9, pp. 3377-3390, 2024

  35. [35]

    Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: is the problem solved?,

    O. Bernard et al., "Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: is the problem solved?," IEEE transactions on medical imaging, vol. 37, no. 11, pp. 2514-2525, 2018

  36. [36]

    Brain tumor segmentation and radiomics survival prediction: Contribution to the brats 2017 challenge,

    F. Isensee, P. Kickingereder, W. Wick, M. Bendszus, and K. H. Maier-Hein, "Brain tumor segmentation and radiomics survival prediction: Contribution to the brats 2017 challenge," in International MICCAI Brainlesion Workshop, 2017: Springer, pp. 287-297

  37. [37]

    The graph neural network model,

    F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, "The graph neural network model," IEEE transactions on neural networks, vol. 20, no. 1, pp. 61-80, 2008

  38. [38]

    Semi-supervised learning with graph learning-convolutional networks,

    B. Jiang, Z. Zhang, D. Lin, J. Tang, and B. Luo, "Semi-supervised learning with graph learning-convolutional networks," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 11313-11320

  39. [39]

    Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool,

    A. A. Taha and A. Hanbury, "Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool," BMC medical imaging, vol. 15, pp. 1-28, 2015

  40. [40]

    PyTorch: An Imperative Style, High-Performance Deep Learning Library

    A. Paszke et al., "PyTorch: An imperative style, high-performance deep learning library," arXiv preprint arXiv:1912.01703, 2019

  41. [41]

    MONAI: An open-source framework for deep learning in healthcare

    M. J. Cardoso et al., "Monai: An open-source framework for deep learning in healthcare," arXiv preprint arXiv:2211.02701, 2022

  42. [42]

    Decoupled Weight Decay Regularization

    I. Loshchilov and F. Hutter, "Decoupled weight decay regularization," arXiv preprint arXiv:1711.05101, 2017

  43. [43]

    Imagenet classification with deep convolutional neural networks,

    A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," Advances in neural information processing systems, vol. 25, 2012

  44. [44]

    Gradient-based learning applied to document recognition,

    Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998

  45. [45]

    AI solutions for overcoming delays in telesurgery and telementoring to enhance surgical practice and education,

    Y. Li, N. Raison, S. Ourselin, T. Mahmoodi, P. Dasgupta, and A. Granados, "AI solutions for overcoming delays in telesurgery and telementoring to enhance surgical practice and education," Journal of robotic surgery, vol. 18, no. 1, p. 403, 2024

  46. [46]

    A survey on deep learning in medical image analysis,

    G. Litjens et al., "A survey on deep learning in medical image analysis," Medical image analysis, vol. 42, pp. 60-88, 2017

  47. [47]

    The role of artificial intelligence in medical imaging research,

    X. Tang, "The role of artificial intelligence in medical imaging research," BJR|Open, vol. 2, no. 1, p. 20190031, 2019

  48. [48]

    A simple framework for contrastive learning of visual representations,

    T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, "A simple framework for contrastive learning of visual representations," in International conference on machine learning, 2020: PMLR, pp. 1597-1607

  49. [49]

    Self-supervised visual feature learning with deep neural networks: A survey,

    L. Jing and Y. Tian, "Self-supervised visual feature learning with deep neural networks: A survey," IEEE transactions on pattern analysis and machine intelligence, vol. 43, no. 11, pp. 4037-4058, 2020

  50. [50]

    Clip-driven universal model for organ segmentation and tumor detection,

    J. Liu et al., "Clip-driven universal model for organ segmentation and tumor detection," in Proceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 21152-21164