CXR-LT 2026 Challenge: Multi-Center Long-Tailed and Zero-Shot Chest X-ray Classification
Pith reviewed 2026-05-10 10:44 UTC · model grok-4.3
The pith
Vision-language foundation models improve chest X-ray classification on both known classes and unseen rare classes in a multi-center setting, though detecting rare findings under center shift remains challenging.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By providing a multi-center dataset with radiologist annotations and splitting it into 30 known classes for robust multi-label classification and 6 unseen rare classes for open-world generalization, the challenge reveals that vision-language foundation models improve both in-distribution and zero-shot performance, but detecting rare findings under multi-center shift remains challenging.
What carries the argument
The two-task benchmark of robust multi-label classification on 30 known pathology classes and open-world generalization to 6 unseen rare disease classes, backed by a multi-center dataset of over 145,000 radiologist-annotated images from PadChest and NIH.
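As a concrete illustration of the open-world task, the sketch below scores unseen findings by prompting a CLIP-style vision-language model. It assumes the open_clip library with a generic pretrained checkpoint; the class names, prompt template, file path, and cosine-similarity scoring are illustrative assumptions, not the challenge's prescribed protocol.

```python
# Hedged sketch: prompt-based zero-shot scoring of unseen CXR findings with a
# CLIP-style model via open_clip. Checkpoint, class names, prompt template,
# and file path are illustrative assumptions.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

unseen_classes = ["pneumoperitoneum", "pneumomediastinum"]  # hypothetical
prompts = [f"a chest x-ray showing {c}" for c in unseen_classes]

image = preprocess(Image.open("example_cxr.png")).unsqueeze(0)
with torch.no_grad():
    img = model.encode_image(image)
    txt = model.encode_text(tokenizer(prompts))
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    # Multi-label setting: each unseen class is scored independently by
    # cosine similarity; no softmax across classes.
    scores = (img @ txt.T).squeeze(0)

for name, s in zip(unseen_classes, scores.tolist()):
    print(f"{name}: {s:.3f}")
```

In practice a CXR-pretrained vision-language checkpoint would replace the generic one; the scoring interface stays the same.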
If this is right
- Vision-language models provide measurable gains for both in-distribution and zero-shot tasks on known and rare chest X-ray pathologies.
- Multi-center data shifts create persistent accuracy gaps specifically for rare disease classes.
- Direct radiologist annotations yield a more trustworthy benchmark than report-derived labels for clinical evaluation.
- AI development for chest X-ray must prioritize robustness to long-tailed distributions and novel findings across institutions.
Where Pith is reading between the lines
- Models that succeed here are more likely to handle the variability seen in actual hospital networks with different scanners and populations.
- The benchmark could be extended with temporal sequences or other modalities to probe generalization further.
- Techniques focused on domain adaptation or targeted augmentation for rare classes may be needed to close the remaining multi-center gaps (see the sketch after this list).
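One minimal form the augmentation idea above could take is inverse-frequency oversampling of rare findings during training. The weighting rule and array names below are assumptions for illustration, not a method reported by the paper.

```python
# Hedged sketch: rare-class oversampling for multi-label CXR training.
# `labels` is an assumed (N, C) binary matrix of per-image findings.
import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler

def rare_class_sample_weights(labels: np.ndarray) -> torch.Tensor:
    """Weight each image by the inverse prevalence of its rarest positive label."""
    class_freq = labels.mean(axis=0) + 1e-6   # per-class prevalence
    inv_freq = 1.0 / class_freq               # rare classes weigh more
    weights = np.where(labels.any(axis=1),
                       (labels * inv_freq).max(axis=1),  # rarest positive label
                       np.median(inv_freq))              # no-finding images
    return torch.as_tensor(weights, dtype=torch.double)

# Usage: rare findings then appear more often per epoch.
# sampler = WeightedRandomSampler(rare_class_sample_weights(labels),
#                                 num_samples=len(labels), replacement=True)
```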
Load-bearing premise
Radiologist annotations on the combined multi-center dataset create a substantially more reliable and clinically relevant evaluation than labels extracted from radiology reports, and the 30 known plus 6 unseen class split with center divisions adequately represents real-world long-tailed open-world conditions.
What would settle it
If a model achieves high accuracy on the 6 unseen rare classes across all centers without notable performance drop compared to single-center tests, or if vision-language models show no advantage over prior methods on this data, that would test whether the multi-center shift challenge for rare findings is fundamental.
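The comparison this test describes can be quantified directly: compute AUC per source center and report the spread. A minimal sketch, with array names as assumptions rather than the challenge's actual evaluation code:

```python
# Hedged sketch: per-center AUC and the cross-center gap for one finding.
# `y_true`, `y_score`, `center` are assumed per-image arrays.
import numpy as np
from sklearn.metrics import roc_auc_score

def cross_center_gap(y_true, y_score, center):
    """Return per-center AUCs and the max-minus-min gap across centers."""
    aucs = {c: roc_auc_score(y_true[center == c], y_score[center == c])
            for c in np.unique(center)}
    return aucs, max(aucs.values()) - min(aucs.values())
```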
Original abstract
Chest X-ray (CXR) interpretation is hindered by the long-tailed distribution of pathologies and the open-world nature of clinical environments. Existing benchmarks often rely on closed-set classes from a single institution, failing to capture the prevalence of rare diseases or the appearance of novel findings. To address this, we present the CXR-LT challenge. The first event, CXR-LT 2023, established a large-scale benchmark for long-tailed multi-label CXR classification and identified key challenges in rare disease recognition. CXR-LT 2024 further expanded the label space and introduced a zero-shot task to study generalization to unseen findings. Building on the success of CXR-LT 2023 and 2024, this third iteration of the benchmark introduces a multi-center dataset comprising over 145,000 images from PadChest and NIH Chest X-ray datasets. Additionally, all development and test sets in CXR-LT 2026 are annotated by radiologists, providing a more reliable and clinically grounded evaluation than report-derived labels. The challenge defines two core tasks this year: (1) Robust Multi-Label Classification on 30 known classes and (2) Open-World Generalization to 6 unseen (out-of-distribution) rare disease classes. This paper summarizes the overview of the CXR-LT 2026 challenge. We describe the data collection and annotation procedures, analyze solution strategies adopted by participating teams, and evaluate head-versus-tail performance, calibration, and cross-center generalization gaps. Our results show that vision-language foundation models improve both in-distribution and zero-shot performance, but detecting rare findings under multi-center shift remains challenging. Our study provides a foundation for developing and evaluating AI systems in realistic long-tailed and open-world clinical conditions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the CXR-LT 2026 challenge, which provides a multi-center dataset of over 145,000 chest X-ray images from PadChest and NIH Chest X-ray, with all labels provided by radiologist annotations rather than report-derived NLP labels. It defines two tasks: (1) robust multi-label classification over 30 known classes and (2) open-world generalization to 6 unseen rare disease classes. The manuscript describes the data collection and annotation process, summarizes strategies from participating teams, and analyzes performance on head-versus-tail classes, calibration, and cross-center generalization gaps, concluding that vision-language foundation models improve both in-distribution and zero-shot performance while rare findings under multi-center shift remain challenging.
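The calibration analysis the summary mentions is conventionally reported as expected calibration error (ECE). A minimal per-label sketch follows; the equal-width binning is a standard but assumed choice.

```python
# Hedged sketch: expected calibration error (ECE) for one binary label,
# a standard way to quantify the calibration analysis described above.
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Occupancy-weighted |observed positive rate - mean confidence| per bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i in range(n_bins):
        lo, hi = edges[i], edges[i + 1]
        # the final bin is closed on the right so p == 1.0 is counted
        mask = (y_prob >= lo) & ((y_prob < hi) if i < n_bins - 1 else (y_prob <= hi))
        if not mask.any():
            continue
        conf = y_prob[mask].mean()            # mean predicted probability
        acc = y_true[mask].mean()             # observed positive rate
        ece += mask.mean() * abs(acc - conf)  # weight by bin occupancy
    return ece
```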
Significance. If the radiologist-annotated multi-center benchmark holds up under scrutiny, this work offers a valuable advance over prior single-center or report-derived CXR benchmarks by explicitly targeting long-tailed distributions and open-world generalization. The emphasis on cross-center shift and zero-shot rare classes, combined with analysis of VL model strengths and persistent tail-class failures, can usefully guide development of clinically deployable systems. The challenge format itself, with public participant outcomes, adds reproducibility value.
Major comments (2)
- [Data Collection and Annotation Procedures] The claim that 'all development and test sets in CXR-LT 2026 are annotated by radiologists, providing a more reliable and clinically grounded evaluation than report-derived labels' is load-bearing for the central claim that observed performance gaps reflect model capability rather than benchmark artifacts. No supporting quantitative evidence is referenced, such as inter-rater agreement (Cohen's or Fleiss' kappa), number of annotators per image, adjudication protocol, or direct comparison against report labels on overlapping cases. Without this, the superiority of the new labels over prior work cannot be established.
- [Results and participant analysis] The abstract and evaluation sections state that 'vision-language foundation models improve both in-distribution and zero-shot performance' and that 'detecting rare findings under multi-center shift remains challenging,' yet provide no specific metrics, confidence intervals, statistical tests, or aggregation details for participant submissions. This leaves the magnitude and robustness of the reported improvements difficult to assess and weakens the evidential basis for the conclusions.
Minor comments (2)
- [Abstract] The exact total image count and the split between PadChest and NIH sources are given only approximately ('over 145,000'); providing the precise numbers and per-center breakdowns would improve clarity.
- [Challenge definition] The 30+6 class split and center-based train/test division are presented as representative of real-world long-tailed open-world conditions, but no supporting prevalence statistics or comparison to clinical distributions are supplied; a brief justification or reference would help.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback, which helps us improve the clarity and rigor of the manuscript. We address each major comment point by point below and have incorporated revisions to strengthen the evidential basis where possible.
Point-by-point responses
-
Referee: Data Collection and Annotation Procedures section: the claim that 'all development and test sets in CXR-LT 2026 are annotated by radiologists, providing a more reliable and clinically grounded evaluation than report-derived labels' is load-bearing for the central claim that observed performance gaps reflect model capability rather than benchmark artifacts. No supporting quantitative evidence is referenced, such as inter-rater agreement (Cohen's or Fleiss' kappa), number of annotators per image, adjudication protocol, or direct comparison against report labels on overlapping cases. Without this, the superiority of the new labels over prior work cannot be established.
Authors: We agree that quantitative annotation quality metrics would provide stronger support for the reliability claim. The manuscript describes the radiologist annotation process for the multi-center dataset but does not include inter-rater statistics or direct comparisons. In the revision, we will expand the Data Collection and Annotation Procedures section to detail the number of board-certified radiologists involved, the standardized annotation protocol (including adjudication for disagreements), and any available agreement metrics from the process. We will also add a quantitative comparison of radiologist labels versus report-derived NLP labels on overlapping cases from the source datasets to demonstrate reduced noise. If full kappa values across all 145k images are not feasible due to scale, we will explicitly note this and reference supporting literature on the known error rates of report-derived labels. revision: yes
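As an illustration of the agreement analysis the authors commit to, a per-class Cohen's kappa could be computed as below; the two-annotator binary matrices are assumed data structures, not the challenge's released format.

```python
# Hedged sketch: per-class inter-rater agreement between two radiologists.
# `rater_a` and `rater_b` are assumed (N, C) binary label matrices.
import numpy as np
from sklearn.metrics import cohen_kappa_score

def per_class_kappa(rater_a: np.ndarray, rater_b: np.ndarray) -> np.ndarray:
    """Cohen's kappa for each of the C labels across N shared images."""
    return np.array([
        cohen_kappa_score(rater_a[:, c], rater_b[:, c])
        for c in range(rater_a.shape[1])
    ])
```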
-
Referee: Results and participant analysis (abstract and evaluation sections): the summary states that 'vision-language foundation models improve both in-distribution and zero-shot performance' and that 'detecting rare findings under multi-center shift remains challenging,' yet provides no specific metrics, confidence intervals, statistical tests, or aggregation details for participant submissions. This leaves the magnitude and robustness of the reported improvements difficult to assess and weakens the evidential basis for the conclusions.
Authors: We concur that the absence of specific quantitative details weakens the conclusions. The current manuscript offers a high-level summary of participant strategies and qualitative trends from the challenge. In the revised version, we will expand the Results and participant analysis section (and update the abstract accordingly) to report concrete metrics, including mean AUC and F1 scores for vision-language models versus other approaches on the 30-class in-distribution task, zero-shot performance on the 6 unseen rare classes, head-versus-tail breakdowns, calibration errors, and cross-center gaps. We will include 95% confidence intervals, details on aggregation across submissions, and any statistical tests performed to support claims of improvement and persistent challenges. revision: yes
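The 95% confidence intervals the authors promise are typically obtained by a case-level bootstrap. A minimal sketch for one class's AUC, with array names and the 1,000-resample setting as assumptions:

```python
# Hedged sketch: 95% bootstrap confidence interval for one class's AUC.
# `y_true` (binary) and `y_score` are assumed per-image arrays.
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample cases with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue                        # skip resamples with only one class
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return roc_auc_score(y_true, y_score), (lo, hi)
```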
Circularity Check
No circularity: descriptive challenge overview with no derivations or self-referential reductions
Full rationale
The paper is a benchmark challenge description that defines tasks, reports data collection/annotation procedures, and summarizes participant-submitted results on in-distribution and zero-shot performance. No equations, fitted parameters, predictions, or first-principles derivations appear in the provided text or abstract. Claims about radiologist annotations providing more reliable labels than report-derived ones are unsupported assertions (a potential correctness gap), but they do not reduce any result to the inputs by construction, nor do they rely on self-citation chains, uniqueness theorems, or ansatzes. Prior CXR-LT iterations are referenced only for context, not as load-bearing justification for the current claims. The central statements about VL models improving performance are empirical summaries of external submissions, not internally forced quantities.
Reference graph
Works this paper leans on
- [1] S Kevin Zhou, Hayit Greenspan, Christos Davatzikos, James S Duncan, Bram Van Ginneken, Anant Madabhushi, Jerry L Prince, Daniel Rueckert, and Ronald M Summers. A review of deep learning in medical imaging: Imaging traits, technology trends, case studies with progress highlights, and future promises. Proceedings of the IEEE, 109(5):820–838, 2021.
- [2] Gregory Holste, Song Wang, Ziyu Jiang, Thomas C Shen, George Shih, Ronald M Summers, Yifan Peng, and Zhangyang Wang. Long-tailed classification of thorax diseases on chest x-ray: A new benchmark study. In MICCAI Workshop on Data Augmentation, Labelling, and Imperfections, pages 22–32. Springer, 2022.
- [3] Ruru Zhang, E Haihong, Lifei Yuan, Jiawen He, Hongxing Zhang, Shengjuan Zhang, Yanhui Wang, Meina Song, and Lifei Wang. Mbnm: Multi-branch network based on memory features for long-tailed medical image recognition. Computer Methods and Programs in Biomedicine, 212:106448, 2021.
- [4] Yifan Zhang, Bingyi Kang, Bryan Hooi, Shuicheng Yan, and Jiashi Feng. Deep long-tailed learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(9):10795–10816, 2023.
- [5] Zhixiong Yang, Junwen Pan, Yanzhan Yang, Xiaozhou Shi, Hong-Yu Zhou, Zhicheng Zhang, and Cheng Bian. Proco: Prototype-aware contrastive learning for long-tailed medical image classification. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 173–182. Springer, 2022.
- [6] Lie Ju, Xin Wang, Lin Wang, Tongliang Liu, Xin Zhao, Tom Drummond, Dwarikanath Mahapatra, and Zongyuan Ge. Relational subsets knowledge distillation for long-tailed retinal diseases recognition. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 3–12. Springer, 2021.
- [7] Luke Oakden-Rayner, Jared Dunnmon, Gustavo Carneiro, and Christopher Ré. Hidden stratification causes clinically meaningful failures in machine learning for medical imaging. In Proceedings of the ACM Conference on Health, Inference, and Learning, pages 151–159, 2020.
- [8] Gregory Holste, Song Wang, Ajay Jaiswal, Yuzhe Yang, Mingquan Lin, Yifan Peng, and Atlas Wang. Cxr-lt: Multi-label long-tailed classification on chest x-rays. PhysioNet, 5(19):1, 2023.
- [9] Gregory Holste, Yiliang Zhou, Song Wang, Ajay Jaiswal, Mingquan Lin, Sherry Zhuge, Yuzhe Yang, Dongkyun Kim, Trong-Hieu Nguyen-Mau, Minh-Triet Tran, et al. Towards long-tailed, multi-label disease classification from chest x-ray: Overview of the cxr-lt challenge. Medical Image Analysis, 97:103224, 2024.
- [10] Mingquan Lin, Gregory Holste, Song Wang, Yiliang Zhou, Yishu Wei, Imon Banerjee, Pengyi Chen, Tianjie Dai, Yuexi Du, Nicha C Dvornek, et al. Cxr-lt 2024: A miccai challenge on long-tailed, multi-label, and zero-shot disease classification from chest x-ray. arXiv preprint arXiv:2506.07984, 2025.
- [11] Aurelia Bustos, Antonio Pertusa, Jose-Maria Salinas, and Maria De La Iglesia-Vaya. Padchest: A large chest x-ray image dataset with multi-label annotated reports. Medical Image Analysis, 66:101797, 2020.
- [12] R Summers. Nih chest x-ray dataset of 14 common thorax disease categories. NIH Clinical Center: Bethesda, MD, USA, 2019.
- [13] Alistair Johnson, Tom Pollard, Roger Mark, Seth Berkowitz, and Steven Horng. MIMIC-CXR Database (version 2.0.0). PhysioNet, September 2019. doi: 10.13026/C2JT1Q. URL https://doi.org/10.13026/C2JT1Q.
- [14] Daniel Coelho de Castro, Aurelia Bustos, Shruthi Bannur, Stephanie L Hyland, Kenza Bouzid, Maria Teodora Wetscherek, Maria Dolores Sánchez-Valverde, Lara Jaques-Pérez, Lourdes Pérez-Rodríguez, Kenji Takeda, et al. Padchest-gr: A bilingual chest x-ray dataset for grounded radiology report generation. NEJM AI, 2(7):AIdbp2401120, 2025.
- [15] Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (voc) challenge. International Journal of Computer Vision, 88(2):303–338, 2010.
- [16] Ha-Hieu Pham, Hai-Dang Nguyen, Thanh-Huy Nguyen, Min Xu, Ulas Bagci, Trung-Nghia Le, and Huy-Hieu Pham. Handling supervision scarcity in chest x-ray classification: Long-tailed and zero-shot learning. arXiv preprint arXiv:2602.13430, 2026.
- [17] Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, and Saining Xie. Convnext v2: Co-designing and scaling convnets with masked autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16133–16142, 2023.
- [18] Yue Yang, Mona Gandhi, Yufei Wang, Yifan Wu, Michael Yao, Chris Callison-Burch, James Gee, and Mark Yatskar. A textbook remedy for domain shifts: Knowledge priors for medical image analysis. Advances in Neural Information Processing Systems, 37:90683–90713, 2024.
- [19] Mehdi Cherti, Romain Beaumont, Ross Wightman, Mitchell Wortsman, Gabriel Ilharco, Cade Gordon, Christoph Schuhmann, Ludwig Schmidt, and Jenia Jitsev. Reproducible scaling laws for contrastive language-image learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2818–2829, 2023.
- [20] Juno Cho, Dohui Kim, Mingeon Kim, Hyunseo Jang, Chang Sun Lee, and Jong Chul Ye. Cxr-lt 2026 challenge: Projection-aware multi-label and zero-shot chest x-ray classification. arXiv preprint arXiv:2604.02185, 2026.
- [21] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021.
- [22] Sangjoon Park, Gwanghyun Kim, Yujin Oh, Joon Beom Seo, Sang Min Lee, Jin Hwan Kim, Sungjun Moon, Jae-Kwang Lim, and Jong Chul Ye. Multi-task vision transformer using low-level chest x-ray feature corpus for covid-19 diagnosis and severity quantification. Medical Image Analysis, 75:102299, 2022.
- [23] Ekin Tiu, Ellie Talius, Pujan Patel, Curtis P Langlotz, Andrew Y Ng, and Pranav Rajpurkar. Expert-level detection of pathologies from unannotated chest x-ray images via self-supervised learning. Nature Biomedical Engineering, 6(12):1399–1406, 2022.
- [24] Tal Ridnik, Emanuel Ben-Baruch, Nadav Zamir, Asaf Noy, Itamar Friedman, Matan Protter, and Lihi Zelnik-Manor. Asymmetric loss for multi-label classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 82–91, 2021.
- [25] Nguyen Trung Ky, Huy Le Pham, Khoa Anh Ha, Thao Nguyen Thanh Vo, Chau Thi Huyen Ly, Hien Ta, and Thang Van Thang. An efficient framework for long-tailed and multi-label classification on chest x-rays. In 2026 IEEE 23rd International Symposium on Biomedical Imaging (ISBI), page 4, London, United Kingdom, April 2026.
- [26] Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. A convnet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11976–11986, 2022.
- [27] Shilong Liu, Lei Zhang, Xiao Yang, Hang Su, and Jun Zhu. Query2label: A simple transformer way to multi-label classification. arXiv preprint arXiv:2107.10834, 2021.
- [28] DongAo Ma, Jiaxuan Pang, Michael B Gotway, and Jianming Liang. A fully open ai foundation model applied to chest radiography. Nature, 643(8071):488–498, 2025.
- [29] Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, and Tengyu Ma. Learning imbalanced datasets with label-distribution-aware margin loss. Advances in Neural Information Processing Systems, 32, 2019.
- [30] Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, and Yannis Kalantidis. Decoupling representation and classifier for long-tailed recognition. arXiv preprint arXiv:1910.09217, 2019.
- [31] Joseph Paul Cohen, Joseph D Viviano, Paul Bertin, Paul Morrison, Parsa Torabian, Matteo Guarrera, Matthew P Lungren, Akshay Chaudhari, Rupert Brooks, Mohammad Hashir, et al. Torchxrayvision: A library of chest x-ray datasets and models. In International Conference on Medical Imaging with Deep Learning, pages 231–249. PMLR, 2022.
- [32] Alexandre Audibert, Aurélien Gauffre, and Massih-Reza Amini. Multi-label contrastive learning: A comprehensive study. arXiv preprint arXiv:2412.00101, 2024.
- [33] Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, and Weidi Xie. Medklip: Medical knowledge enhanced language-image pre-training. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.
- [34] Farhad Pourpanah, Moloud Abdar, Yuxuan Luo, Xinlei Zhou, Ran Wang, Chee Peng Lim, Xi-Zhao Wang, and QM Jonathan Wu. A review of generalized zero-shot learning methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4):4051–4070, 2022.