arxiv: 2511.11030 · v6 · submitted 2025-11-14 · 💻 cs.CV · cs.AI

Algorithms Trained on Normal Chest X-rays Can Predict Health Insurance Types

Pith reviewed 2026-05-17 22:30 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords chest x-ray · health insurance · socioeconomic status · medical AI bias · deep learning · normal studies · fairness in AI

The pith

Deep vision models predict a patient's health insurance type from normal chest X-rays at AUC around 0.70.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

State-of-the-art models including DenseNet121, SwinV2-B, and MedMamba learn to classify health insurance status, a proxy for socioeconomic position, using only chest X-rays labeled as normal. The signal appears on two large public datasets, is not explained by a separate model that predicts insurance type from age, race, and sex labels alone, and persists when training is restricted to a single racial group. Occlusion experiments show the predictive information is spread across the upper and mid-chest rather than concentrated in any single anatomical feature. The result suggests that routine medical images already contain traces of how care is delivered differently across socioeconomic groups.

Core claim

Deep vision models trained on chest X-rays from normal studies can predict a patient's health insurance type with AUC around 0.70 on MIMIC-CXR-JPG and 0.68 on CheXpert. The signal survives controls for demographic variables and remains detectable within a single racial group. Patch-based occlusion localizes the information diffusely in the upper and mid-thoracic regions, consistent with subtle differences in clinical environments, equipment, or care pathways that correlate with insurance status.

What carries the argument

Patch-based occlusion analysis that identifies a diffuse signal in the upper and mid-thoracic regions after demographic controls.
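
The two occlusion variants named in the Figure 1 caption (Remove-One-Patch and Keep-One-Patch on a 3x3 grid) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the zero fill value, and the `score_fn` / `auc_fn` callables (a trained model's scoring function and any AUC implementation) are assumptions introduced here.

```python
import numpy as np

def patch_bounds(height, width, row, col, grid=3):
    """Return (top, bottom, left, right) pixel bounds of one grid cell."""
    rows = np.linspace(0, height, grid + 1).astype(int)
    cols = np.linspace(0, width, grid + 1).astype(int)
    return rows[row], rows[row + 1], cols[col], cols[col + 1]

def remove_one_patch(image, row, col, grid=3, fill=0.0):
    """Copy of `image` with a single grid cell replaced by `fill` (assumed value)."""
    top, bottom, left, right = patch_bounds(*image.shape[:2], row, col, grid)
    out = image.copy()
    out[top:bottom, left:right] = fill
    return out

def keep_one_patch(image, row, col, grid=3, fill=0.0):
    """Copy of `image` where everything except one grid cell is `fill`."""
    top, bottom, left, right = patch_bounds(*image.shape[:2], row, col, grid)
    out = np.full_like(image, fill)
    out[top:bottom, left:right] = image[top:bottom, left:right]
    return out

def occlusion_auc_grid(images, labels, score_fn, auc_fn, occlude, grid=3):
    """AUC per grid cell after applying `occlude` to every image.

    `score_fn(image)` stands in for a trained model and `auc_fn(labels, scores)`
    for any AUC routine; both are hypothetical placeholders.
    """
    result = np.zeros((grid, grid))
    for r in range(grid):
        for c in range(grid):
            scores = [score_fn(occlude(img, r, c, grid)) for img in images]
            result[r, c] = auc_fn(labels, scores)
    return result
```

Passing `remove_one_patch` as `occlude` gives a grid in which a large AUC drop marks a cell the model depends on; passing `keep_one_patch` gives a grid in which a high AUC marks a cell that is informative on its own. A diffuse signal would show up as small, spread-out changes in both grids.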

If this is right

  • Medical images encode information about socioeconomic segregation through the pathways and hardware used to produce them.
  • Fairness work in medical AI must examine data collection practices in addition to balancing patient demographics.
  • Models may learn to associate insurance type with subtle imaging artifacts that arise from different care settings.
  • Diagnostic algorithms could inadvertently use these hidden signals when deployed in real clinical workflows.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Audits of imaging hardware and site-specific protocols could reduce unintended socioeconomic leakage in future training sets.
  • The same approach might reveal parallel signals in other imaging modalities such as CT or MRI.
  • Developers could test whether explicitly removing hospital or scanner metadata during training eliminates the insurance-prediction signal.

Load-bearing premise

The models are picking up differences in clinical environments or equipment that happen to track insurance type rather than direct demographic features.

What would settle it

Retrain the same architectures on images acquired on identical equipment inside a single hospital for patients across all insurance types and check whether AUC falls to chance level (0.5).
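
One way to make "falls to chance level" operational is a label-permutation test on the held-out scores. The sketch below is an illustration under stated assumptions, not a procedure taken from the paper: it assumes a binary insurance label, and the names `auc` and `permutation_p_value` are introduced here.

```python
import numpy as np

def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)

def permutation_p_value(labels, scores, n_permutations=2000, seed=0):
    """One-sided p-value for the observed AUC against shuffled labels."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = auc(labels, scores)
    # Shuffling labels breaks any link between image scores and insurance type.
    null = np.array([auc(rng.permutation(labels), scores)
                     for _ in range(n_permutations)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_permutations)
    return observed, float(p_value)
```

If the single-scanner retraining produced an AUC whose permutation p-value was large, that would be consistent with the signal coming from acquisition differences; a small p-value would point back to something in the patients or their care.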

Figures

Figures reproduced from arXiv: 2511.11030 by Arash Asgari, Chi-Yu Chen, Deirdre Goode, Hassan Hamidi, Laleh Seyyed-Kalantari, Leo Anthony Celi, Ned McCague, Po-Chih Kuo, Rawan Abulibdeh, Sebastián Andrés Cajas Ordóñez, Thomas Sounack.

Figure 1. (a) Left: an example of the Remove-One-Patch method. Right: insurance type prediction AUC when only the corresponding patch area in the 3x3 grid is removed. (b) Left: an example of the Keep-One-Patch method. Right: insurance type prediction AUC when only the corresponding patch area in the 3x3 grid is retained.
Original abstract

Artificial intelligence is revealing what medicine never intended to encode. Deep vision models, trained on chest X-rays, can now detect not only disease but also invisible traces of social inequality. In this study, we show that state-of-the-art architectures (DenseNet121, SwinV2-B, MedMamba) can predict a patient's health insurance type, a strong proxy for socioeconomic status, from normal chest X-rays with significant accuracy (AUC around 0.70 on MIMIC-CXR-JPG, 0.68 on CheXpert). The signal was unlikely contributed by demographic features by our machine learning study combining age, race, and sex labels to predict health insurance types; it also remains detectable when the model is trained exclusively on a single racial group. Patch-based occlusion reveals that the signal is diffuse rather than localized, embedded in the upper and mid-thoracic regions. This suggests that deep networks may be internalizing subtle traces of clinical environments, equipment differences, or care pathways; learning socioeconomic segregation itself. These findings challenge the assumption that medical images are neutral biological data. By uncovering how models perceive and exploit these hidden social signatures, this work reframes fairness in medical AI: the goal is no longer only to balance datasets or adjust thresholds, but to interrogate and disentangle the social fingerprints embedded in clinical data itself.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents empirical evidence that deep neural networks (DenseNet121, SwinV2-B, MedMamba) trained on 'normal' chest X-rays from the MIMIC-CXR-JPG and CheXpert datasets can predict a patient's health insurance type—a proxy for socioeconomic status—with AUCs of approximately 0.70 and 0.68, respectively. The authors argue that this predictive capability is not primarily attributable to demographic variables (age, race, sex) based on auxiliary prediction experiments and single-race subgroup training, and interpret patch-based occlusion maps as indicating a diffuse signal in the upper and mid-thoracic regions, potentially reflecting clinical environment or care pathway differences.

Significance. If the central empirical result holds after addressing institutional confounders, the work would be significant for medical AI fairness research by showing that routine chest X-rays can encode socioeconomic information via subtle institutional cues. Credit is due for evaluating multiple modern architectures on public datasets and including occlusion-based interpretability analysis, which provides a concrete starting point for reproducibility.

major comments (3)
  1. [Abstract] The assertion that the signal 'was unlikely contributed by demographic features' rests on an auxiliary 'machine learning study combining age, race, and sex labels' whose quantitative results (e.g., AUC of the demographic-only predictor versus the imaging model) are not reported, preventing evaluation of whether demographics are adequately controlled.
  2. [Abstract] Both MIMIC-CXR-JPG and CheXpert are single-institution datasets; the manuscript does not control for or discuss confounding by acquisition device, department, or workflow factors that correlate with insurance type. The reported demographic controls and single-race subgroup results do not address these institutional variables, which remain a load-bearing alternative explanation for the observed AUCs.
  3. [Abstract] No information is provided on train/test splits, class balance for insurance categories, preprocessing steps, or statistical testing (confidence intervals, p-values) for the reported AUCs of 0.70 and 0.68, limiting assessment of result robustness.
minor comments (2)
  1. [Abstract] The abstract uses 'significant accuracy' without accompanying statistical tests or effect-size context; replace with precise quantitative language.
  2. Consider adding explicit discussion of dataset limitations and generalizability in a dedicated limitations paragraph.
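
The demographic-only comparison requested in major comment 1 can be as simple as a lookup table over (age bin, race, sex) cells. The sketch below is one hypothetical form of such a baseline, not the auxiliary study described in the abstract: it assumes a binary insurance label, 10-year age bins, and shrinkage toward the overall rate, and every function name is introduced here.

```python
from collections import defaultdict

def age_bin(age, width=10):
    """Map an age in years to a bin index (assumed 10-year bins)."""
    return int(age) // width

def fit_cell_rates(train_rows, train_labels, smoothing=1.0):
    """Estimate P(label = 1) per (age_bin, race, sex) cell.

    Each cell rate is shrunk toward the overall positive rate using
    `smoothing` pseudo-observations, so sparse cells stay near the prior.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_total, n_positive]
    for row, label in zip(train_rows, train_labels):
        counts[row][0] += 1
        counts[row][1] += int(label)
    prior = sum(int(label) for label in train_labels) / len(train_labels)
    rates = {cell: (pos + smoothing * prior) / (n + smoothing)
             for cell, (n, pos) in counts.items()}
    return rates, prior

def predict_cell_rates(rows, rates, prior):
    """Score each row with its cell rate; unseen cells fall back to the prior."""
    return [rates.get(row, prior) for row in rows]
```

Scoring the held-out set with `predict_cell_rates` and computing AUC on those scores gives the number the referee asks for: if it sits well below the imaging model's AUC on the same split, the demographic labels alone do not account for the imaging result.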

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We have revised the abstract and expanded the discussion to address the concerns about transparency, controls, and methodological details.

Point-by-point responses
  1. Referee: [Abstract] The assertion that the signal 'was unlikely contributed by demographic features' rests on an auxiliary 'machine learning study combining age, race, and sex labels' whose quantitative results (e.g., AUC of the demographic-only predictor versus the imaging model) are not reported, preventing evaluation of whether demographics are adequately controlled.

    Authors: We agree that the abstract should report the quantitative results of the auxiliary demographic study for proper evaluation. The full manuscript describes this control experiment; we have revised the abstract to explicitly summarize that the combined demographic model (age, race, sex) shows lower performance than the imaging model, with the specific AUC values and methodology now highlighted in the abstract and cross-referenced to the main text.

    Revision: yes

  2. Referee: [Abstract] Both MIMIC-CXR-JPG and CheXpert are single-institution datasets; the manuscript does not control for or discuss confounding by acquisition device, department, or workflow factors that correlate with insurance type. The reported demographic controls and single-race subgroup results do not address these institutional variables, which remain a load-bearing alternative explanation for the observed AUCs.

    Authors: We acknowledge this as a substantive concern. The manuscript already interprets the diffuse signal as potentially arising from clinical environment or care pathway differences, which are institutional. However, the single-race subgroup analysis controls for race but does not isolate device or workflow factors. In the revision we have added an explicit limitations paragraph in the Discussion acknowledging these institutional confounders as a plausible alternative and recommending multi-institutional validation in future work.

    Revision: yes

  3. Referee: [Abstract] No information is provided on train/test splits, class balance for insurance categories, preprocessing steps, or statistical testing (confidence intervals, p-values) for the reported AUCs of 0.70 and 0.68, limiting assessment of result robustness.

    Authors: We thank the referee for noting this omission in the abstract. These details appear in the Methods section of the full manuscript. We have revised the abstract to concisely include the patient-level train/test split approach, insurance category distributions, standard preprocessing steps, and the bootstrap procedure used to obtain confidence intervals around the reported AUCs.

    Revision: yes
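
The patient-level split and bootstrap interval mentioned in the third response can be sketched as below. This is an illustration only: the manuscript's actual split fraction, resample count, and resampling unit are not given here, so `test_fraction`, `n_resamples`, and both function names are assumptions, and `metric` stands in for any AUC routine.

```python
import numpy as np

def patient_level_split(patient_ids, test_fraction=0.2, seed=0):
    """Split image indices so that no patient appears in both train and test."""
    rng = np.random.default_rng(seed)
    patient_ids = np.asarray(patient_ids)
    unique = np.unique(patient_ids)
    rng.shuffle(unique)
    n_test = max(1, int(round(test_fraction * len(unique))))
    test_mask = np.isin(patient_ids, unique[:n_test])
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

def bootstrap_ci(labels, scores, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `metric(labels, scores)`.

    Resamples individual test rows with replacement. Requires both classes to
    be present in `labels`; resamples containing one class are skipped because
    AUC is undefined for them.
    """
    rng = np.random.default_rng(seed)
    labels, scores = np.asarray(labels), np.asarray(scores)
    n = len(labels)
    stats = []
    while len(stats) < n_resamples:
        idx = rng.integers(0, n, size=n)
        if labels[idx].min() == labels[idx].max():
            continue
        stats.append(metric(labels[idx], scores[idx]))
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)
```

Splitting by patient matters because one patient can contribute several studies; splitting by image would let the model recognize a patient seen in training. When patients have multiple test images, resampling patients rather than rows would give a more conservative interval than this row-level version.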

Circularity Check

0 steps flagged

Empirical ML evaluation on held-out data with no self-referential derivations

Full rationale

The paper reports AUC performance from training standard vision architectures on public datasets (MIMIC-CXR-JPG, CheXpert) and evaluating on held-out test splits. No equations, ansatzes, or derivations are presented that would reduce the reported metrics to a parameter fitted directly to the insurance-type labels. Demographic controls and single-race subgroup experiments are additional empirical checks rather than definitional reductions. The central claim rests on observable model outputs against external test data and is therefore self-contained.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The claim rests on standard assumptions of supervised deep learning on public medical imaging datasets plus the domain assumption that insurance type correlates with visual features in normal X-rays independent of listed demographics.

free parameters (1)
  • model training hyperparameters
    Learning rates, batch sizes, and augmentation choices typical in training DenseNet, Swin, and Mamba architectures.
axioms (1)
  • domain assumption: Normal chest X-rays contain diffuse visual features correlated with socioeconomic status via clinical environment or equipment differences.
    Invoked in the interpretation of patch occlusion results and the claim that the signal is not demographic.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Quantum Kernel Advantage over Classical Collapse in Medical Foundation Model Embeddings

    quant-ph 2026-04 conditional novelty 5.0

    Quantum kernels in QSVM deliver higher minority-class F1 scores than classical linear or RBF kernels on medical foundation model embeddings for binary insurance classification, avoiding classical collapse in noiseless...

Reference graph

Works this paper leans on

9 extracted references · 9 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1]

    Chexpert plus: Hundreds of thousands of aligned radiology texts, images and patients. arXiv preprint arXiv:2405.19538, 2024

    Pierre Chambon, Jean-Benoit Delbrouck, Thomas Sounack, Shih-Cheng Huang, Zhihong Chen, Maya Varma, Steven QH Truong, Chu The Chuong, and Curtis P Langlotz. Chexpert plus: Augmenting a large chest x-ray dataset with text radiology reports, patient demographics and additional image formats. arXiv preprint arXiv:2405.19538,

  2. [2]

    MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs

    Alistair Johnson, Matt Lungren, Yifan Peng, Zhiyong Lu, Roger Mark, Seth Berkowitz, and Steven Horng. Mimic-cxr-jpg-chest radiographs with structured labels. PhysioNet, 101:215–220, 2019a. Alistair EW Johnson, Tom J Pollard, Nathaniel R Greenbaum, Matthew P Lungren, Chih-ying Deng, Yifan Peng, Zhiyong Lu, Roger G Mark, Seth J Berkowitz, and Steven Horng....

  3. [3]

    No llm is free from bias: A comprehensive study of bias evaluation in large language models. arXiv preprint arXiv:2503.11985,

    Charaka Vinayak Kumar, Ashok Urlana, Gopichand Kanumolu, Bala Mallikarjunarao Garlapati, and Pruthwik Mishra. No llm is free from bias: A comprehensive study of bias evaluation in large language models. arXiv preprint arXiv:2503.11985,

  4. [4]

    Cautious optimizers: Improving training with one line of code. arXiv preprint arXiv:2411.16085,

    Kaizhao Liang, Lizhang Chen, Bo Liu, and Qiang Liu. Cautious optimizers: Improving training with one line of code. arXiv preprint arXiv:2411.16085,

  5. [5]

    Chexclusion: Fairness gaps in deep chest x-ray classifiers

    Laleh Seyyed-Kalantari, Guanxiong Liu, Matthew McDermott, Irene Y Chen, and Marzyeh Ghassemi. Chexclusion: Fairness gaps in deep chest x-ray classifiers. In BIOCOMPUTING 2021: proceedings of the Pacific symposium, pages 232–243. World Scientific,

  6. [6]

    Medmamba: Vision mamba for medical image classification,

    Yubiao Yue and Zhenzhang Li. Medmamba: Vision mamba for medical image classification. arXiv preprint arXiv:2403.03849,

  7. [7]

    Dataset description MIMIC-IV v3.0 Johnson et al

    Appendix A. Dataset description MIMIC-IV v3.0 Johnson et al. (2024,

  8. [8]

    (2019a,b) is an extended image dataset for MIMIC-IV v3.0, including 377,110 chest X-ray images in total

    is a large medical dataset containing over 265,000 patients’ data collected at Beth Israel Deaconess Medical Center in Boston, MA, in the intensive care unit or emergency department between 2008-2022, while MIMIC-CXR-JPG Johnson et al. (2019a,b) is an extended image dataset for MIMIC-IV v3.0, including 377,110 chest X-ray images in total. On the oth...

  9. [9]

    The recent CheXpert Plus paper Chambon et al

    The images were downsized to 390 x 320 in the downsized version. The recent CheXpert Plus paper Chambon et al. (2024) provides additional demographic information of each patient, including their health insurance type, race, sex, and age.