pith. machine review for the scientific record.

arxiv: 2604.04133 · v1 · submitted 2026-04-05 · 💻 cs.CV · cs.AI

Recognition: no theorem link

Learning Robust Visual Features in Computed Tomography Enables Efficient Transfer Learning for Clinical Tasks

Rubén Moreno-Aguado, Alba Magallón, Victor Moreno, Yingying Fang, Guang Yang


Pith reviewed 2026-05-13 16:56 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI
keywords computed tomography · foundation model · self-distillation · DINO · transfer learning · frozen features · clinical tasks · report generation

The pith

A self-distilled CT foundation model learns visual features that transfer efficiently to clinical tasks and outperform language-supervised models without any text supervision.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces VoxelFM, a 3D CT model trained solely with self-distillation on image data using the DINO framework. It demonstrates that these learned features, kept frozen, enable lightweight probes to match or surpass four existing CT foundation models across seven clinical task categories including classification, segmentation, and report generation. This approach is significant because it shows that language supervision is not required for effective visual representations in CT and allows efficient transfer with minimal labeled data and no backbone fine-tuning. Such models lower the computational barriers for developing AI tools in radiology.
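A minimal sketch of the frozen-feature recipe described above, assuming a generic PyTorch encoder; the `backbone` interface, probe shape, and training loop are illustrative stand-ins, not the released VoxelFM code.

```python
import torch
import torch.nn as nn

class ClassTokenProbe(nn.Module):
    """Two-layer MLP over a cached class-token embedding (illustrative probe)."""
    def __init__(self, dim: int, num_classes: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(),
                                 nn.Linear(hidden, num_classes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

@torch.no_grad()
def cache_embeddings(backbone: nn.Module, volumes: torch.Tensor) -> torch.Tensor:
    """Run the frozen backbone once and cache embeddings for all downstream probes."""
    backbone.eval()
    return backbone(volumes)  # hypothetical interface returning (N, dim) class tokens

def train_probe(probe: nn.Module, embeds: torch.Tensor, labels: torch.Tensor,
                epochs: int = 20, lr: float = 1e-3) -> nn.Module:
    """Only the probe receives gradients; the backbone is never fine-tuned."""
    opt = torch.optim.AdamW(probe.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()  # multi-label classification, e.g. abnormality findings
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(probe(embeds), labels)
        loss.backward()
        opt.step()
    return probe
```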

Core claim

VoxelFM is a 3D CT foundation model trained with self-distillation using the DINO framework to learn semantically rich features without language supervision. Evaluated using frozen backbone representations with lightweight probes on seven categories of clinically relevant downstream tasks—classification, regression, survival analysis, instance retrieval, localisation, segmentation, and report generation—VoxelFM matched or outperformed four existing CT foundation models. Notably, it surpassed models trained with language-alignment objectives, including on report generation.

What carries the argument

VoxelFM, the 3D vision model trained via DINO self-distillation on CT volumes to extract robust visual features for lightweight downstream probes.

Load-bearing premise

The seven task categories, chosen datasets, and lightweight probe evaluations are representative of clinical performance and allow fair model comparisons without backbone fine-tuning.

What would settle it

Observing a dataset or task where a language-supervised CT model with lightweight probes significantly outperforms VoxelFM would falsify the superiority claim.

Figures

Figures reproduced from arXiv:2604.04133 by Alba Magallón, Guang Yang, Rubén Moreno-Aguado, Victor Moreno, Yingying Fang.

Figure 1. Overview of the evaluation protocol. All evaluations use pre-computed embeddings from a pre-trained encoder backbone. The first common step is (A): embeddings are extracted by the backbone and cached for downstream use. (B) Classification and regression: two methods are available depending on the token type used. For the class token, a two-layer MLP is applied, and for patch tokens, a Q-Former with cross-a…
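The caption distinguishes an MLP on the class token from a Q-Former with cross-attention on patch tokens. A minimal sketch of the second mechanism, assuming learned queries attending over cached patch-token embeddings; this illustrates the idea only and simplifies whatever Q-Former variant the paper actually uses.

```python
import torch
import torch.nn as nn

class CrossAttentionProbe(nn.Module):
    """Learned queries cross-attend to frozen patch tokens; a linear head maps the
    pooled queries to task logits (a simplified, Q-Former-like probe)."""
    def __init__(self, dim: int, num_classes: int, num_queries: int = 8, heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (B, num_patches, dim), cached from the frozen backbone
        q = self.queries.unsqueeze(0).expand(patch_tokens.size(0), -1, -1)
        pooled, _ = self.attn(q, patch_tokens, patch_tokens)  # (B, num_queries, dim)
        return self.head(pooled.mean(dim=1))                  # (B, num_classes)

# Example: 4 cached volumes, 512 patch tokens each, 768-dim features, 18 findings
logits = CrossAttentionProbe(dim=768, num_classes=18)(torch.randn(4, 512, 768))
```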
Figure 2. Bar chart summary of model performance across six tasks. Results are shown for classification (AUROC), regression (MAE ↓), survival analysis (AUROC), localisation (MAE ↓), segmentation (DICE), and retrieval (Recall@10), with 95% confidence intervals displayed for each model. TotalSegmentator segmentation is reported as micro-averaged DICE. CT-RATE and Merlin classification results are macro-averaged over a…
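The caption mixes micro-averaged DICE (TotalSegmentator) with macro-averaged classification results (CT-RATE, Merlin). A small, self-contained illustration of the difference, using made-up voxel counts, since the choice decides how much small or rare structures weigh in the headline number.

```python
import numpy as np

def dice_micro(inter, pred, gt):
    """Micro average: pool overlap counts over all classes, then compute one Dice."""
    return 2 * np.sum(inter) / (np.sum(pred) + np.sum(gt))

def dice_macro(inter, pred, gt):
    """Macro average: per-class Dice first, then an unweighted mean over classes."""
    per_class = 2 * np.asarray(inter) / (np.asarray(pred) + np.asarray(gt))
    return per_class.mean()

# Hypothetical voxel counts for three structures: (intersection, prediction, ground truth)
inter, pred, gt = [900, 40, 5], [1000, 100, 20], [1000, 80, 10]
print(dice_micro(inter, pred, gt))  # ~0.86, dominated by the large structure
print(dice_macro(inter, pred, gt))  # ~0.56, small structures count equally
```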
Figure 3. Per-abnormality breakdown of classification and report generation performance across 18 findings. (Left) Binary classification F1 scores (threshold = 0.5) for each of the 18 abnormalities in the CT-RATE dataset, where an individual Q-Former probe is trained per label as described in …
Figure 4. Effect of downstream dataset size. Q-Former probes are trained on various fractions of labelled training data (20%–100%) and evaluated on a fixed held-out test set. (Left) iCTCF-Covid. (Right) RSNA-STR. Error bars represent 95% confidence intervals.
Figure 5. Comparison of inference strategies and feature aggregation methods. (Left) MLP applied to the class token versus a single-layer Q-Former applied to patch tokens for classification tasks. CT-RATE and Merlin results are macro-averaged over their respective abnormality labels. Error bars represent 95% confidence intervals. (Right) Chunked 2.5D versus full 3D inference across classification tasks.
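For the caption's chunked 2.5D versus full 3D comparison, a minimal sketch of what slab-wise inference with pooled embeddings could look like; the slab depth and mean pooling are assumptions, not the paper's protocol.

```python
import torch

@torch.no_grad()
def chunked_embedding(backbone, volume: torch.Tensor, depth: int = 32) -> torch.Tensor:
    """Embed a (D, H, W) CT volume slab-by-slab and mean-pool the slab embeddings.
    Full 3D inference would instead pass the whole volume through in one forward call."""
    slabs = volume.split(depth, dim=0)                 # tuple of (<=depth, H, W) slabs
    feats = [backbone(s.unsqueeze(0)) for s in slabs]  # each (1, dim); hypothetical interface
    return torch.cat(feats, dim=0).mean(dim=0)         # (dim,) pooled volume embedding
```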
Figure 6. Overview of the VoxelFM pre-training framework. Student and teacher networks share identical ViT-based architectures with class and patch token heads. The teacher network is updated via an exponential moving average of the student parameters. For each CT volume, two global and eight local crops are generated. Only the student processes local crops, and random masking is applied to its global crops. Trainin…
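A minimal sketch of the student–teacher mechanics the caption describes (EMA teacher, global crops for both networks, local crops for the student only), assuming two generic encoder modules; crop generation, masking, and centring are omitted, and the temperatures are standard DINO-style placeholders rather than the paper's settings.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, momentum: float = 0.996):
    """Teacher parameters track an exponential moving average of the student's."""
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(momentum).add_(ps.detach(), alpha=1.0 - momentum)

def dino_loss(student_logits, teacher_logits, t_s: float = 0.1, t_t: float = 0.04):
    """Cross-entropy between sharpened teacher targets and student predictions."""
    targets = F.softmax(teacher_logits / t_t, dim=-1).detach()
    return -(targets * F.log_softmax(student_logits / t_s, dim=-1)).sum(-1).mean()

def training_step(student, teacher, global_crops, local_crops):
    # Teacher sees only the global crops; the student sees every crop.
    with torch.no_grad():
        teacher_out = [teacher(c) for c in global_crops]
    student_out = [student(c) for c in list(global_crops) + list(local_crops)]
    # Average over all teacher/student view pairs (the original DINO skips
    # identical-view pairs; omitted here for brevity).
    losses = [dino_loss(s, t) for t in teacher_out for s in student_out]
    return torch.stack(losses).mean()
```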
original abstract

There is substantial interest in developing artificial intelligence systems to support radiologists across tasks ranging from segmentation to report generation. Existing computed tomography (CT) foundation models have largely focused on building generalist vision-language systems capable of tasks such as question answering and report generation. However, training reliable vision-language systems requires paired image-text data at a scale that remains unavailable in CT. Moreover, adapting the underlying visual representations to downstream tasks typically requires partial or full backbone fine-tuning, a computationally demanding process inaccessible to many research groups. Instead, foundation models should prioritise learning robust visual representations that enable efficient transfer to new tasks with minimal labelled data and without backbone fine-tuning. We present VoxelFM, a 3D CT foundation model trained with self-distillation using the DINO framework, which learns semantically rich features without language supervision. We evaluated VoxelFM across seven categories of clinically relevant downstream tasks using frozen backbone representations with lightweight probes: classification, regression, survival analysis, instance retrieval, localisation, segmentation, and report generation. VoxelFM matched or outperformed four existing CT foundation models across all task categories. Despite receiving no language supervision during pre-training, VoxelFM surpassed models explicitly trained with language-alignment objectives, including on report generation. Our results indicate that current CT foundation models perform significantly better as feature extractors for lightweight probes rather than as vision encoders for vision-language models. Model weights and training code are publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces VoxelFM, a 3D CT foundation model pretrained via DINO self-distillation without language supervision. It claims that frozen VoxelFM features plus lightweight probes match or outperform four existing CT foundation models across seven clinical task categories (classification, regression, survival analysis, instance retrieval, localisation, segmentation, and report generation), including surpassing language-aligned models on report generation. The authors conclude that current CT foundation models function better as feature extractors than as vision encoders for vision-language systems and release model weights and training code publicly.

Significance. If the performance claims are substantiated with statistical controls, the work would be significant for medical imaging AI. It provides evidence that purely visual self-supervised pretraining can yield robust, transferable features competitive with vision-language models across diverse tasks while enabling efficient adaptation without backbone fine-tuning. The public code and weight release supports reproducibility and is a clear strength.

major comments (1)
  1. [Experimental results / evaluation across task categories] The central claim that VoxelFM 'matched or outperformed' the four baselines across all seven task categories rests on single-run point estimates (AUC, Dice, BLEU, etc.) without standard deviations across random seeds, confidence intervals, or hypothesis testing. This is load-bearing for the assertion that VoxelFM surpasses language-supervised models on report generation, as probe training variance and dataset shifts can produce differences of the observed magnitude.
minor comments (2)
  1. [Abstract and Methods] The abstract and methods sections provide insufficient detail on the exact datasets, patient cohorts, preprocessing, and metric definitions used for each of the seven task categories, making it difficult to assess representativeness and potential confounds.
  2. [Evaluation protocol] Clarify the hyperparameter settings and training protocol for the lightweight probes (e.g., whether grid search or fixed defaults were used) and report any ablation on probe architecture choices.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback. We address the major comment on the statistical evaluation of our results below.

point-by-point responses
  1. Referee: [Experimental results / evaluation across task categories] The central claim that VoxelFM 'matched or outperformed' the four baselines across all seven task categories rests on single-run point estimates (AUC, Dice, BLEU, etc.) without standard deviations across random seeds, confidence intervals, or hypothesis testing. This is load-bearing for the assertion that VoxelFM surpasses language-supervised models on report generation, as probe training variance and dataset shifts can produce differences of the observed magnitude.

    Authors: We acknowledge the validity of this concern. Our initial experiments reported single-run results due to the substantial computational resources required for 3D CT pretraining and downstream evaluations. To strengthen the manuscript, we will perform additional runs using different random seeds for the probe training across the task categories, particularly emphasizing the report generation task. We will report mean values along with standard deviations, include confidence intervals, and apply statistical hypothesis testing to confirm the significance of performance differences. These updates will be reflected in the revised version of the paper.
    revision: yes
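A minimal sketch of the kind of analysis the rebuttal promises, assuming per-case metric arrays for two models; the percentile bootstrap and paired permutation test here are generic choices, not necessarily the procedure the authors will adopt.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci(scores, n_boot: int = 10_000, alpha: float = 0.05):
    """Percentile bootstrap confidence interval for the mean per-case score."""
    scores = np.asarray(scores)
    idx = rng.integers(0, len(scores), size=(n_boot, len(scores)))
    means = scores[idx].mean(axis=1)
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])

def paired_permutation_pvalue(model_a, model_b, n_perm: int = 10_000) -> float:
    """Two-sided p-value for the mean paired difference between two models,
    obtained by randomly flipping the sign of each per-case difference."""
    d = np.asarray(model_a) - np.asarray(model_b)
    observed = abs(d.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_perm, len(d)))
    null = np.abs((signs * d).mean(axis=1))
    return float((null >= observed).mean())
```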

Circularity Check

0 steps flagged

No significant circularity; claims rest on empirical comparisons

full rationale

The manuscript describes an empirical pipeline: VoxelFM is pretrained via the standard DINO self-distillation objective on unlabeled CT volumes, after which frozen backbone features are fed to lightweight task-specific probes and compared against four external CT foundation models on seven clinical task categories. No derivation chain, equations, or first-principles results are presented that reduce by construction to fitted inputs, self-definitions, or self-citation load-bearing premises. Performance claims are grounded in direct metric comparisons (AUC, Dice, BLEU, etc.) to independently trained baselines rather than any renaming of known results or smuggling of ansatzes via prior self-work. The evaluation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that DINO-style self-distillation generalizes effectively from 2D natural images to 3D CT volumes, with standard training hyperparameters adapted to the medical domain.

free parameters (1)
  • DINO training hyperparameters
    Parameters such as temperature, momentum coefficient, and augmentation strengths chosen or tuned for 3D CT pretraining.
axioms (1)
  • domain assumption: Self-distillation with DINO produces semantically rich features suitable for downstream clinical tasks in 3D CT
    Extended from prior success in natural image domains without independent verification for CT in the abstract.
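To make the ledger's free-parameter entry concrete, a purely illustrative configuration of the knobs it names; the numeric values below are common DINO-style defaults used as placeholders, not the settings reported in the paper (only the crop counts come from the Figure 6 caption).

```python
# Illustrative placeholder values only; the paper's actual hyperparameters may differ.
dino_pretraining_hparams = {
    "student_temperature": 0.1,    # softening of student predictions
    "teacher_temperature": 0.04,   # sharpening of teacher targets
    "ema_momentum": 0.996,         # teacher EMA coefficient
    "global_crops_per_volume": 2,  # from the Figure 6 caption
    "local_crops_per_volume": 8,   # from the Figure 6 caption
    "augmentation_strength": "adapted to CT intensities (assumed)",
}
```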

pith-pipeline@v0.9.0 · 5566 in / 1413 out tokens · 55882 ms · 2026-05-13T16:56:12.857831+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

69 extracted references · 69 canonical work pages · 11 internal anchors

  1. [1]

    Computed Tomography: Revolutionizing the Practice of Medicine for 40 Years

    Geoffrey D. Rubin. “Computed Tomography: Revolutionizing the Practice of Medicine for 40 Years”. In: Radiology273.2S (Nov. 2014), S45–S74.issn: 0033-8419.doi:10.1148/radiol.14141356

  2. [2]

    Workload for radiologists during on-call hours: dramatic increase in the past 15 years

    R. J. M. Bruls and R. M. Kwee. “Workload for radiologists during on-call hours: dramatic increase in the past 15 years”. en. In:Insights into Imaging11.1 (Nov. 2020), p. 121.issn: 1869-4101.doi: 10.1186/s13244-020- 00925-z

  3. [3]

    Effect of Shift, Schedule, and Volume on Interpretive Accuracy: A Retrospective Analysis of 2.9 Million Radiologic Examinations

    Tarek N. Hanna, Christine Lamoureux, Elizabeth A. Krupinski, Scott Weber, and Jamlik-Omari Johnson. “Effect of Shift, Schedule, and Volume on Interpretive Accuracy: A Retrospective Analysis of 2.9 Million Radiologic Examinations”. In:Radiology287.1 (Apr. 2018), pp. 205–212.issn: 0033-8419.doi: 10 . 1148 / radiol . 2017170555

  4. [4]

    Mandating Limits on Workload, Duty, and Speed in Radiology

    Robert Alexander, Stephen Waite, Michael A. Bruno, Elizabeth A. Krupinski, Leonard Berlin, Stephen Macknik, and Susana Martinez-Conde. “Mandating Limits on Workload, Duty, and Speed in Radiology”. In:Radiology 304.2 (Aug. 2022), pp. 274–282.issn: 0033-8419.doi:10.1148/radiol.212631

  5. [5]

    TotalSegmentator: Robust Segmentation of 104 Anatomic Structures in CT Images

    Jakob Wasserthal et al. “TotalSegmentator: Robust Segmentation of 104 Anatomic Structures in CT Images”. In: Radiology: Artificial Intelligence5.5 (Sept. 2023), e230024.doi:10.1148/ryai.230024

  6. [6]

    Vision-language foundation models for medical imaging: a review of current practices and innovations

    Ji Seung Ryu, Hyunyoung Kang, Yuseong Chu, and Sejung Yang. “Vision-language foundation models for medical imaging: a review of current practices and innovations”. en. In:Biomedical Engineering Letters15.5 (Sept. 2025), pp. 809–830.issn: 2093-985X.doi:10.1007/s13534-025-00484-6

  7. [7]

    The role of artificial intelligence-based foundation models and “copilots” in cancer pathology: potential and challenges

    Cillian H. Cheng and Chi Chun Wong. “The role of artificial intelligence-based foundation models and “copilots” in cancer pathology: potential and challenges”. In:Journal of Experimental & Clinical Cancer Research : CR45 (Nov. 2025), p. 2.issn: 0392-9078.doi:10.1186/s13046-025-03592-4

  8. [8]

    Vision Foundation Models for Computed Tomography

    Suraj Pai, Ibrahim Hadzic, Dennis Bontempi, Keno Bressem, Benjamin H. Kann, Andriy Fedorov, Raymond H. Mak, and Hugo J. W. L. Aerts.Vision Foundation Models for Computed Tomography. Feb. 2025.doi: 10.48550/arXiv.2501.09001

  9. [9]

    Merlin: a computed tomography vision–language foundation model and dataset

    Louis Blankemeier et al. “Merlin: a computed tomography vision–language foundation model and dataset”. en. In:Nature(Mar. 2026), pp. 1–11.issn: 1476-4687.doi:10.1038/s41586-026-10181-8

  10. [10]

    M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models

    Fan Bai, Yuxin Du, Tiejun Huang, Max Q.-H. Meng, and Bo Zhao.M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models. Mar. 2024.doi:10.48550/arXiv.2404.00578

  11. [11]

    Towards generalist foundation model for radiology by leveraging web-scale 2D&3D medical data

    Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Hui Hui, Yanfeng Wang, and Weidi Xie. “Towards generalist foundation model for radiology by leveraging web-scale 2D&3D medical data”. en. In:Nature Communications16.1 (Aug. 2025), p. 7866.issn: 2041-1723.doi:10.1038/s41467-025-62385-7

  12. [12]

    Ibrahim Ethem Hamamci et al.Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography. Oct. 2024.doi:10.48550/arXiv.2403.17834

  13. [13]

    MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis

    Jiaxin Zhuang, Linshan Wu, Qiong Wang, Peng Fei, Varut Vardhanabhuti, Lin Luo, and Hao Chen. “MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis”. In:IEEE Transactions on Medical Imaging44.9 (Sept. 2025), pp. 3727–3740.issn: 1558-254X.doi:10.1109/TMI.2025.3564382

  14. [14]

    Weiyun Wang et al.InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency. Aug. 2025.doi:10.48550/arXiv.2508.18265

  15. [15]

    Shuai Bai et al. Qwen3-VL Technical Report. Nov. 2025. doi:10.48550/arXiv.2511.21631. [16] [2103.00020] Learning Transferable Visual Models From Natural Language Supervision

  16. [16]

    Michael Tschannen et al. SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features. Feb. 2025. doi:10.48550/arXiv.2502.14786

  17. [17]

    Daniel Bolya et al.Perception Encoder: The best visual embeddings are not at the output of the network. Apr. 2025.doi:10.48550/arXiv.2504.13181

  18. [18]

    David Fan et al.Scaling Language-Free Visual Representation Learning. Apr. 2025.doi: 10.48550/arXiv. 2504.01017

  19. [19]

    Emerging Properties in Self-Supervised Vision Transformers

    Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging Properties in Self-Supervised Vision Transformers. May 2021. doi:10.48550/arXiv.2104.14294

  20. [20]

    Maxime Oquab et al.DINOv2: Learning Robust Visual Features without Supervision. Feb. 2024.doi: 10.48550/ arXiv.2304.07193

  21. [21]

    Oriane Siméoni et al. DINOv3. Aug. 2025. doi:10.48550/arXiv.2508.10104

  22. [22]

    Visual Instruction Tuning

    Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. “Visual Instruction Tuning”. en. In: ()

  23. [23]

    Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models

    Siddharth Karamcheti, Suraj Nair, Ashwin Balakrishna, Percy Liang, Thomas Kollar, and Dorsa Sadigh. “Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models”. en. In: ()

  24. [24]

    Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs

    Shengbang Tong, Zhuang Liu, Yuexiang Zhai, Yi Ma, Yann LeCun, and Saining Xie. “Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs”. In:2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, WA, USA: IEEE, June 2024, pp. 9568–9578.isbn: 979-8-3503-5300-6. doi:10.1109/CVPR52733.2024.00914

  25. [25]

    Data or Language Supervision: What Makes CLIP Better than DINO?

    Yiming Liu, Yuhui Zhang, Dhruba Ghosh, Ludwig Schmidt, and Serena Yeung-Levy. “Data or Language Supervision: What Makes CLIP Better than DINO?” en. In: ()

  26. [26]

    DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment

    Cijo Jose et al. “DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment”. en. In: ()

  27. [27]

    NSCLC-Radiomics

    Aerts, H.J.W.L. et al.NSCLC-Radiomics. 2014.doi:10.7937/K9/TCIA.2015.PF0M9REI

  28. [28]

    Timothée Darcet, Maxime Oquab, Julien Mairal, and Piotr Bojanowski. Vision Transformers Need Registers. Apr. 2024. doi:10.48550/arXiv.2309.16588

  29. [29]

    INSPECT: A Multimodal Dataset for Pulmonary Embolism Diagnosis and Prognosis

    Shih-Cheng Huang, Zepeng Huo, Ethan Steinberg, Chia-Chun Chiang, Matthew P. Lungren, Curtis P. Langlotz, Serena Yeung, Nigam H. Shah, and Jason A. Fries.INSPECT: A Multimodal Dataset for Pulmonary Embolism Diagnosis and Prognosis. Nov. 2023.doi:10.48550/arXiv.2311.10798

  30. [30]

    Data from the National Lung Screening Trial (NLST)

    National Lung Screening Trial Research Team.Data from the National Lung Screening Trial (NLST). 2013.doi: 10.7937/TCIA.HMQ8-J677

  31. [32]

    A Large-Scale CT and PET/CT Dataset for Lung Cancer Diagnosis

    Ping Li, Shuo Wang, Tang Li, Jingfeng Lu, Yunxin HuangFu, and Dongxue Wang.A Large-Scale CT and PET/CT Dataset for Lung Cancer Diagnosis. 2020.doi:10.7937/TCIA.2020.NNC2-0461

  32. [34]

    Stony Brook University COVID-19 Positive Cases

    Joel Saltz, Mary Saltz, Prateek Prasanna, Richard Moffitt, Janos Hajagos, Erich Bremer, Joseph Balsamo, and Tahsin Kurc.Stony Brook University COVID-19 Positive Cases. 2021.doi:10.7937/TCIA.BBAG-2923

  33. [35]

    The RSNA Abdominal Traumatic Injury CT (RATIC) Dataset

    Jeffrey D. Rudie et al. “The RSNA Abdominal Traumatic Injury CT (RATIC) Dataset”. In:Radiology: Artificial Intelligence6.6 (Nov. 2024), e240101.doi:10.1148/ryai.240101

  34. [36]

    Data From CT COLONOGRAPHY

    K. Smith, K. Clark, W. Bennett, T. Nolan, J. Kirby, M. Wolfsberger, J. Moulton, B. Vendt, and J. Freymann.Data From CT COLONOGRAPHY. 2015.doi:10.7937/K9/TCIA.2015.NWTESAY1

  35. [37]

    AbdomenCT-1K: Is Abdominal Organ Segmentation a Solved Problem?

    Jun Ma et al. “AbdomenCT-1K: Is Abdominal Organ Segmentation a Solved Problem?” In:IEEE Transactions on Pattern Analysis and Machine Intelligence44.10 (Oct. 2022), pp. 6695–6714.issn: 1939-3539.doi: 10.1109/TPAMI.2021.3100536

  36. [38]

    Phase recognition in contrast-enhanced CT scans based on deep learning and random sampling

    Binh T. Dao, Thang V. Nguyen, Hieu H. Pham, and Ha Q. Nguyen. “Phase recognition in contrast-enhanced CT scans based on deep learning and random sampling”. en. In:Medical Physics49.7 (2022), pp. 4518–4528.issn: 2473-4209.doi:10.1002/mp.15551

  37. [39]

    Multimodality annotated HCC cases with and without advanced imaging segmentation

    Ahmed W. Moawad et al.Multimodality annotated HCC cases with and without advanced imaging segmentation. 2021.doi:10.7937/TCIA.5FNA-0924

  38. [40]

    Oguz Akin et al.The Cancer Genome Atlas Kidney Renal Clear Cell Carcinoma Collection (TCGA-KIRC). 2016. doi:10.7937/K9/TCIA.2016.V6PBVTDR

  39. [41]

    Abdominal or pelvic enhanced CT images within 10 days before surgery of 230 patients with stage II colorectal cancer (StageII-Colorectal-CT)

    T. Tong and M. Li.Abdominal or pelvic enhanced CT images within 10 days before surgery of 230 patients with stage II colorectal cancer (StageII-Colorectal-CT). 2022.doi:10.7937/p5k5-tg43

  40. [42]

    Shanah Kirk et al. The Cancer Genome Atlas Urothelial Bladder Carcinoma Collection (TCGA-BLCA). 2016. doi:10.7937/K9/TCIA.2016.8LNG8XDR

  41. [43]

    Bradley J. Erickson, Shanah Kirk, Yueh Lee, Oliver Bathe, Melissa Kearns, Cindy Gerdes, Kimberly Rieger-Christ, and John Lemmerman.The Cancer Genome Atlas Liver Hepatocellular Carcinoma Collection (TCGA-LIHC). 2016.doi:10.7937/K9/TCIA.2016.IMMQW8UQ

  42. [44]

    C4KC KiTS Challenge Kidney Tumor Segmentation Dataset

    Nicholas Heller et al.C4KC KiTS Challenge Kidney Tumor Segmentation Dataset. 2019.doi: 10.7937/TCIA. 2019.IX49E8NX

  43. [45]

    The Cancer Genome Atlas Stomach Adenocarcinoma Collection (TCGA-STAD)

    Fabiano R. Lucchesi and Natália D. Aredes. The Cancer Genome Atlas Stomach Adenocarcinoma Collection (TCGA-STAD). 2016. doi:10.7937/K9/TCIA.2016.GDHL9KIM

  44. [46]

    The Cancer Genome Atlas Uterine Corpus Endometrial Carcinoma Collection (TCGA-UCEC)

    Bradley J. Erickson, David Mutch, Lynne Lippmann, and Rose Jarosz.The Cancer Genome Atlas Uterine Corpus Endometrial Carcinoma Collection (TCGA-UCEC). 2016.doi:10.7937/K9/TCIA.2016.GKJ0ZWAC

  45. [47]

    The Clinical Proteomic Tumor Analysis Consortium Clear Cell Renal Cell Carcinoma Collection (CPTAC-CCRCC)

    National Cancer Institute Clinical Proteomic Tumor Analysis Consortium (CPTAC).The Clinical Proteomic Tumor Analysis Consortium Clear Cell Renal Cell Carcinoma Collection (CPTAC-CCRCC). 2018.doi: 10. 7937/K9/TCIA.2018.OBLAMN27

  46. [48]

    RADCURE: An open-source head and neck cancer CT dataset for clinical radiation therapy insights

    Mattea L. Welch et al. “RADCURE: An open-source head and neck cancer CT dataset for clinical radiation therapy insights”. eng. In:Medical Physics51.4 (Apr. 2024), pp. 3101–3109.issn: 2473-4209.doi: 10.1002/mp.16972

  47. [49]

    HNSCC Version 4

    A Grossberg et al.HNSCC Version 4. 2020.doi:10.7937/k9/tcia.2020.a8sh-7363

  48. [50]

    The Cancer Genome Atlas Head-Neck Squamous Cell Carcinoma Collection (TCGA-HNSC)

    Margarita L. Zuley et al.The Cancer Genome Atlas Head-Neck Squamous Cell Carcinoma Collection (TCGA- HNSC). 2016.doi:10.7937/K9/TCIA.2016.LXKQ47MS

  49. [51]

    The Clinical Proteomic Tumor Analysis Consortium Head and Neck Squamous Cell Carcinoma Collection (CPTAC-HNSCC)

    The Clinical Proteomic Tumor Analysis Consortium Head and Neck Squamous Cell Carcinoma Collection (CPTAC-HNSCC) (Version 19). 2018.doi:10.7937/k9/tcia.2018.uw45nh81

  50. [52]

    Data from the ACRIN 6685 Trial HNSCC-FDG-PET/CT

    P. Kinahan, M. Muzi, B. Bialecki, and L. Coombs.Data from the ACRIN 6685 Trial HNSCC-FDG-PET/CT.doi: 10.7937/K9/TCIA.2016.JQEJZZNG

  51. [53]

    Data From QIN-HEADNECK

    Reinhard R Beichel et al.Data From QIN-HEADNECK. 2015.doi:10.7937/K9/TCIA.2015.K0F5CGLI

  52. [54]

    Data from Head-Neck-PET-CT

    Martin Vallières, Emily Kay-Rivest, Léo Perrin, Xavier Liem, Christophe Furstoss, Nader Khaouam, Phuc Nguyen-Tan, Chang-Shu Wang, and Khalil Sultanem. Data from Head-Neck-PET-CT. 2017. doi:10.7937/K9/TCIA.2017.8OJE5Q00

  53. [55]

    Glioma Image Segmentation for Radiotherapy: RT targets, barriers to cancer spread, and organs at risk (GLIS-RT)

    Nadya Shusharina and Thomas Bortfeld.Glioma Image Segmentation for Radiotherapy: RT targets, barriers to cancer spread, and organs at risk (GLIS-RT). 2021.doi:10.7937/TCIA.T905-ZQ20

  54. [56]

    CT-RTSTRUCT-RTDOSE-RTPLAN Sets of Head and Neck Cancers Treated with Identical Prescriptions using IMRT: An Open Dataset for Deep Learning in Treatment Planning

    Jacob Buatti, Christopher Kabat, Ruiqi Li, Sruthi Sivabhaskar, Michelle de Oliveira, Nikos Papanikolaou, Sotirios Stathakis, Nikos Paragios, and Neil Kirby.CT-RTSTRUCT-RTDOSE-RTPLAN Sets of Head and Neck Cancers Treated with Identical Prescriptions using IMRT: An Open Dataset for Deep Learning in Treatment Planning. 2024.doi:10.7937/AHQH-XC79

  55. [57]

    Head-Neck Cetuximab

    Walter R. Bosch, William L. Straube, John W. Matthews, and James A. Purdy.Head-Neck Cetuximab. 2015.doi: 10.7937/K9/TCIA.2015.7AKGJUPZ

  56. [58]

    Burdenko’s Glioblastoma Progression Dataset (Burdenko-GBM-Progression)

    Svetlana V. Zolotova et al.Burdenko’s Glioblastoma Progression Dataset (Burdenko-GBM-Progression). 2023. doi:10.7937/E1QP-D183

  57. [59]

    Data from HEAD-NECK-RADIOMICS-HN1

    L. Wee and A. Dekker.Data from HEAD-NECK-RADIOMICS-HN1. 2019.doi: 10 . 7937 / tcia . 2019 . 8kap372n

  58. [60]

    The RSNA Pulmonary Embolism CT Dataset

    Errol Colak et al. “The RSNA Pulmonary Embolism CT Dataset”. In:Radiology: Artificial Intelligence3.2 (Mar. 2021), e200254.doi:10.1148/ryai.2021200254

  59. [61]

    Open resource of clinical data from patients with pneumonia for the prediction of COVID-19 outcomes via deep learning

    Wanshan Ning et al. “Open resource of clinical data from patients with pneumonia for the prediction of COVID-19 outcomes via deep learning”. en. In:Nature Biomedical Engineering4.12 (Nov. 2020), pp. 1197–1207.issn: 2157-846X.doi:10.1038/s41551-020-00633-5

  60. [62]

    Mycobacterial CT Imaging Dataset

    Zhilin Han, Yuyang Zhang, Wenlong Ding, and Zhiheng Xing.Mycobacterial CT Imaging Dataset. 2025.doi: 10.34740/KAGGLE/DS/6248246

  61. [63]

    Data From LIDC-IDRI

    Samuel G. Armato III et al.Data From LIDC-IDRI. 2015.doi:10.7937/K9/TCIA.2015.LO9QL9SX

  62. [64]

    Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: The LUNA16 challenge

    Arnaud Arindra Adiyoso Setio et al. “Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: The LUNA16 challenge”. In:Medical Image Analysis42 (Dec. 2017), pp. 1–13.issn: 1361-8415.doi:10.1016/j.media.2017.06.015

  63. [65]

    Ahmed Shahin, Carmela Wegworth, David, Elizabeth Estes, Julia Elliott, Justin Zita, Simon Walsh, Slepetys, and Will Cukierski.OSIC Pulmonary Fibrosis Progression. 2020

  64. [66]

    Mediastinal Lymph Node Quantification (LNQ)

    Idris, T. et al.Mediastinal Lymph Node Quantification (LNQ). 2024.doi:10.7937/QVAZ-JA09

  65. [67]

    A Custom Annotated Dataset for Segmentation of Pulmonary Veins, Arteries, and Airways

    Jian Liu, Zheng Zhang, Bing Niu, Shuai Kang, Juan Ren, Lei Wang, and Kai Xu. “A Custom Annotated Dataset for Segmentation of Pulmonary Veins, Arteries, and Airways”. en. In: Scientific Data 12.1 (Nov. 2025), p. 1806. issn: 2052-4463. doi:10.1038/s41597-025-06074-6

  66. [68]

    LoRA: Low-Rank Adaptation of Large Language Models

    Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen.LoRA: Low-Rank Adaptation of Large Language Models. Oct. 2021.doi: 10.48550/arXiv.2106.09685

  67. [69]

    Qwen3 Technical Report

    An Yang et al.Qwen3 Technical Report. May 2025.doi:10.48550/arXiv.2505.09388

  68. [70]

    OpenAI et al.gpt-oss-120b & gpt-oss-20b Model Card. Aug. 2025.doi:10.48550/arXiv.2508.10925

  69. [71]

    The meaning and use of the area under a receiver operating characteristic (ROC) curve

    J A Hanley and B J McNeil. “The meaning and use of the area under a receiver operating characteristic (ROC) curve.” en. In: Radiology 143.1 (Apr. 1982), pp. 29–36. issn: 0033-8419, 1527-1315. doi:10.1148/radiology.143.1.7063747