SAM2CLIP2SAM: Vision Language Model for Segmentation of 3D CT Scans for Covid-19 Detection

Anastasios Arsenos; Dimitrios Kollias; James Wingate; Stefanos Kollias

arXiv: 2407.15728 · v2 · pith:KEMLUIYYnew · submitted 2024-07-22 · eess.IV · cs.CV

Dimitrios Kollias, Anastasios Arsenos, James Wingate, Stefanos Kollias

keywords: segmentation, covid-19, scans, detection, lungs, model, segment, approach

This paper presents a new approach for effective segmentation of images that can be integrated into any model and methodology; the paradigm that we choose is classification of medical images (3-D chest CT scans) for Covid-19 detection. Our approach includes a combination of vision-language models that segment the CT scans, which are then fed to a deep neural architecture, named RACNet, for Covid-19 detection. In particular, a novel framework, named SAM2CLIP2SAM, is introduced for segmentation that leverages the strengths of both Segment Anything Model (SAM) and Contrastive Language-Image Pre-Training (CLIP) to accurately segment the right and left lungs in CT scans, subsequently feeding these segmented outputs into RACNet for classification of COVID-19 and non-COVID-19 cases. At first, SAM produces multiple part-based segmentation masks for each slice in the CT scan; then CLIP selects only the masks that are associated with the regions of interest (ROIs), i.e., the right and left lungs; finally SAM is given these ROIs as prompts and generates the final segmentation mask for the lungs. Experiments are presented across two Covid-19 annotated databases which illustrate the improved performance obtained when our method has been used for segmentation of the CT scans.
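
The three-stage flow described in the abstract (propose part masks, select the lung regions, re-prompt for a final mask) can be sketched per CT slice as below. This is a minimal illustration of the control flow only, not the authors' implementation. Everything here is an assumption made for the example: the function names `segment_slice`, `mask_to_box`, `toy_propose`, `toy_score`, and `toy_refine` are hypothetical, the three stages are passed in as plain callables rather than real SAM or CLIP models, bounding boxes are assumed as the prompt format, and `keep=2` assumes one region per lung.

```python
from typing import Callable, List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max), inclusive


def mask_to_box(mask: np.ndarray) -> Box:
    """Tight bounding box of a non-empty boolean mask."""
    rows, cols = np.nonzero(mask)
    return int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max())


def segment_slice(
    slice_2d: np.ndarray,
    propose: Callable[[np.ndarray], List[np.ndarray]],
    score: Callable[[np.ndarray, np.ndarray], float],
    refine: Callable[[np.ndarray, Box], np.ndarray],
    keep: int = 2,  # assumption: one region per lung
) -> np.ndarray:
    """Propose -> select -> re-prompt for one slice (hypothetical helper)."""
    # Stage 1 (SAM in the paper): part-based candidate masks.
    candidates = [m for m in propose(slice_2d) if m.any()]
    final = np.zeros(slice_2d.shape, dtype=bool)
    if not candidates:
        return final
    # Stage 2 (CLIP in the paper): rank candidates, keep the best `keep`.
    scores = np.array([score(slice_2d, m) for m in candidates])
    best = np.argsort(scores)[::-1][:keep]
    # Stage 3 (SAM again): use each selected region as a prompt and
    # merge the refined masks into one lung mask.
    for i in best:
        final |= refine(slice_2d, mask_to_box(candidates[i]))
    return final


# --- Toy stand-ins so the sketch runs without any model weights ---
RECTS = [(1, 1, 4, 3), (1, 5, 4, 7), (6, 0, 7, 7)]  # two "lungs" + one distractor


def rect_mask(shape: Tuple[int, int], box: Box) -> np.ndarray:
    r0, c0, r1, c1 = box
    mask = np.zeros(shape, dtype=bool)
    mask[r0:r1 + 1, c0:c1 + 1] = True
    return mask


def toy_propose(img: np.ndarray) -> List[np.ndarray]:
    # Stand-in for automatic mask generation: three fixed rectangles.
    return [rect_mask(img.shape, b) for b in RECTS]


def toy_score(img: np.ndarray, mask: np.ndarray) -> float:
    # Stand-in for text-image similarity: darker regions score higher,
    # since air-filled lungs appear dark on CT.
    return -float(img[mask].mean())


def toy_refine(img: np.ndarray, box: Box) -> np.ndarray:
    # Stand-in for box-prompted segmentation: just fill the box.
    return rect_mask(img.shape, box)


if __name__ == "__main__":
    img = np.ones((8, 8))
    img[1:5, 1:4] = 0.0  # dark region standing in for one lung
    img[1:5, 5:8] = 0.0  # dark region standing in for the other lung
    lungs = segment_slice(img, toy_propose, toy_score, toy_refine)
    print(lungs.astype(int))
```

Passing the stages in as callables keeps the sketch independent of any particular model library. For a full 3-D scan, the same function would be applied to each slice in turn before the segmented volume is handed to the classifier.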

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

CLIP-Guided SAM: Parameter-Efficient Semantic Conditioning for Promptable Segmentation
cs.CV 2026-05 unverdicted novelty 5.0

CLIP-Guided SAM injects CLIP-derived features into SAM via lightweight adapters for semantic conditioning, supporting text and spatial prompts while remaining parameter-efficient and achieving competitive performance.

Towards Fair and Robust Volumetric CT Classification via KL-Regularised Group Distributionally Robust Optimisation
cs.CV 2026-03 unverdicted novelty 4.0

KL-regularised Group DRO improves F1 scores for multi-site COVID-19 CT classification and gender-fair four-class lung pathology recognition over prior challenge baselines.