pith. machine review for the scientific record.

arxiv: 2604.20331 · v2 · submitted 2026-04-22 · 💻 cs.CL · cs.AI · cs.LG

Recognition: unknown

Surrogate modeling for interpreting black-box LLMs in medical predictions

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 00:25 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.LG
keywords surrogate modeling · LLM interpretability · black-box models · medical predictions · bias detection · knowledge extraction · prompting

The pith

A surrogate modeling framework approximates LLM knowledge from input-output pairs to reveal variable influences and hidden biases in medical predictions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes a surrogate modeling framework to interpret the encoded knowledge inside black-box large language models by generating extensive input-output data through prompting across many simulated medical scenarios. The method then measures how strongly each input variable relates to the model's output predictions, allowing quantitative assessment of what the LLM has learned. In proof-of-concept experiments, the framework detected both associations that contradict standard medical knowledge and the continued presence of disproven racial assumptions in the model's responses. A reader would care because LLMs are increasingly proposed for medical use where their opaque reasoning could lead to incorrect or biased decisions. If the approach holds, it offers a practical test for reliability without needing direct access to model internals.

Core claim

The central claim is that, for a hypothesis drawn from domain knowledge, the surrogate framework approximates the latent LLM knowledge space using only observable input-output pairs obtained through extensive prompting across a comprehensive range of simulated scenarios. In doing so, it reveals the extent to which LLMs perceive each input variable in relation to the output, quantitatively exposing both associations that contradict established medical knowledge and the persistence of scientifically refuted racial assumptions within LLM-encoded knowledge.

What carries the argument

Surrogate modeling framework that approximates the LLM's latent knowledge space via extensive prompting to produce observable input-output pairs.
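The pipeline that sentence describes can be sketched end to end with a toy stand-in for the LLM. Everything below is illustrative: the mock oracle, the variable names, and the coefficients are assumptions for the sketch, not details from the paper — in practice the oracle would be replaced by prompting a real LLM.

```python
import numpy as np

rng = np.random.default_rng(0)

def mock_llm_risk(age, sbp, race_flag):
    """Stand-in for the black-box LLM, queried only through prompts.
    This toy oracle deliberately encodes a spurious race effect so the
    surrogate has something to surface (coefficients are hypothetical)."""
    return 0.04 * age + 0.02 * sbp + 5.0 * race_flag + rng.normal(0, 0.5)

# 1. Simulate a comprehensive range of scenarios (the inputs).
n = 2000
X = np.column_stack([
    rng.uniform(30, 80, n),    # age
    rng.uniform(100, 180, n),  # systolic blood pressure
    rng.integers(0, 2, n),     # hypothetical race indicator
])

# 2. Collect the oracle's outputs: the observable input-output pairs.
y = np.array([mock_llm_risk(*row) for row in X])

# 3. Fit a simple surrogate (ordinary least squares with an intercept).
A = np.column_stack([X, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# 4. The fitted coefficients quantify how strongly the oracle
#    "perceives" each input variable in relation to the output.
for name, c in zip(["age", "sbp", "race_flag"], coef[:3]):
    print(f"{name}: {c:+.3f}")
```

A nonzero coefficient on the race indicator is exactly the kind of red flag the framework is meant to surface; tree-based surrogates with feature importances would slot into step 3 the same way.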

If this is right

  • The framework quantifies the influence of each input variable on LLM outputs in medical prediction tasks.
  • It identifies associations in LLM predictions that contradict established medical knowledge.
  • It detects persistence of refuted racial assumptions within the LLM's encoded knowledge.
  • This provides a practical red-flag indicator for safe and reliable use of LLMs in medical applications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same surrogate approach could be applied outside medicine to check for encoded biases in LLMs used in other high-stakes domains.
  • If the approximations prove reliable, the framework could support targeted adjustments to LLM behavior based on detected flawed associations.
  • The findings imply that simply cleaning training data may not eliminate all problematic encoded assumptions.

Load-bearing premise

That extensive prompting across many simulated scenarios can accurately approximate an LLM's full latent knowledge using only the observable input-output pairs.

What would settle it

If altering an input variable in direct LLM queries produces output changes that the surrogate model does not predict, the approximation of latent knowledge would be shown as incomplete.
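That falsification test can be run mechanically: alter one variable in a fresh scenario, query both the LLM and the surrogate, and compare the resulting output changes. A minimal sketch, with a mock oracle standing in for the real LLM and hypothetical surrogate coefficients (here chosen to match the oracle, so the test passes by construction):

```python
import numpy as np

rng = np.random.default_rng(1)

def mock_llm(x):
    """Placeholder oracle; in practice this would prompt the real LLM."""
    return 1.5 * x[0] - 2.0 * x[1] + 0.3 * x[2]

# Surrogate assumed already fitted on prompted pairs (coefficients hypothetical).
surrogate_coef = np.array([1.5, -2.0, 0.3])
def surrogate(x):
    return surrogate_coef @ x

# Intervention: edit a single input variable in a new scenario.
x = rng.normal(size=3)
x_altered = x.copy()
x_altered[1] += 1.0  # counterfactual change to one variable

llm_delta = mock_llm(x_altered) - mock_llm(x)
surrogate_delta = surrogate(x_altered) - surrogate(x)

# A large gap between the two deltas would expose the latent-knowledge
# approximation as incomplete.
print(f"LLM delta: {llm_delta:+.3f}, surrogate delta: {surrogate_delta:+.3f}")
```

With a real LLM the deltas would rarely match exactly; the interesting quantity is how often, and by how much, they diverge across many such interventions.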

read the original abstract

Large language models (LLMs), trained on vast datasets, encode extensive real-world knowledge within their parameters, yet their black-box nature obscures the mechanisms and extent of this encoding. Surrogate modeling, which uses simplified models to approximate complex systems, can offer a path toward better interpretability of black-box models. We propose a surrogate modeling framework that quantitatively explains LLM-encoded knowledge. For a specific hypothesis derived from domain knowledge, this framework approximates the latent LLM knowledge space using observable elements (input-output pairs) through extensive prompting across a comprehensive range of simulated scenarios. Through proof-of-concept experiments in medical predictions, we demonstrate our framework's effectiveness in revealing the extent to which LLMs "perceive" each input variable in relation to the output. Particularly, given concerns that LLMs may perpetuate inaccuracies and societal biases embedded in their training data, our experiments using this framework quantitatively revealed both associations that contradict established medical knowledge and the persistence of scientifically refuted racial assumptions within LLM-encoded knowledge. By disclosing these issues, our framework can act as a red-flag indicator to support the safe and reliable application of these models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a surrogate modeling framework to interpret black-box LLMs by approximating their latent knowledge via input-output pairs generated through extensive prompting over simulated scenarios. Surrogate models (linear or tree-based) are then fit to these pairs, with coefficients or feature importances interpreted as quantitative measures of how the LLM 'perceives' each input variable's relation to the output. Proof-of-concept experiments in medical predictions are presented as demonstrating the framework's ability to uncover associations that contradict established medical knowledge as well as the persistence of refuted racial biases within the LLM's encoded knowledge.

Significance. If the surrogates can be shown to faithfully recover LLM behavior, the framework would offer a practical auditing method for detecting biases and factual inaccuracies in LLMs applied to high-stakes medical tasks, extending existing surrogate-based interpretability techniques to prompted LLM outputs. The quantitative focus on both contradictory associations and societal biases is timely given deployment concerns in healthcare. The approach is conceptually straightforward and leverages observable pairs without requiring internal access, but its current evidential support is limited by missing validation steps.

major comments (2)
  1. [§3] The surrogate construction trains linear or tree-based models on prompted input-output pairs and interprets their parameters as LLM perceptions, yet no quantitative fidelity check is reported comparing surrogate predictions to the original LLM on held-out scenarios outside the prompting distribution. This validation is load-bearing for the central claim that the extracted associations reflect LLM-encoded knowledge rather than surrogate inductive bias or prompt artifacts.
  2. [Abstract] Abstract and experimental description: The claim that experiments 'quantitatively revealed both associations that contradict established medical knowledge and the persistence of scientifically refuted racial assumptions' is asserted without details on the medical hypotheses tested, number of simulated scenarios, method for quantifying contradictions against external benchmarks, or controls for prompt engineering sensitivity. These omissions prevent assessment of whether the data support the reported findings.
minor comments (2)
  1. [Abstract] The abstract and §1 repeat the motivation about LLMs encoding biases from training data; condensing this would improve readability without loss of content.
  2. [§3] Notation distinguishing the LLM's latent function f from the surrogate approximator g could be introduced explicitly in §3 to clarify the approximation step.
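One way the suggested notation could be made explicit, with f the latent input-output behavior of the LLM (observed only through prompted pairs) and g the surrogate approximator — symbols here follow the referee's suggestion, not the paper:

```latex
% f : latent LLM behavior, observed only via prompted pairs (x_i, y_i), y_i = f(x_i)
% g : surrogate approximator fitted to those pairs
\hat{g} = \arg\min_{g \in \mathcal{G}} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(g(x_i),\, f(x_i)\big),
\qquad
\text{fidelity: } \mathbb{E}_{x \sim \mathcal{D}_{\text{held-out}}}\big[\ell\big(\hat{g}(x),\, f(x)\big)\big]
```

The second expression is precisely the held-out check the first major comment asks for: low fidelity loss on scenarios outside the prompting distribution is what would license reading \hat{g}'s parameters as statements about f.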

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. The comments highlight important aspects of validation and transparency that will improve the manuscript. We address each major comment below and will incorporate revisions as indicated.

read point-by-point responses
  1. Referee: [§3] The surrogate construction trains linear or tree-based models on prompted input-output pairs and interprets their parameters as LLM perceptions, yet no quantitative fidelity check is reported comparing surrogate predictions to the original LLM on held-out scenarios outside the prompting distribution. This validation is load-bearing for the central claim that the extracted associations reflect LLM-encoded knowledge rather than surrogate inductive bias or prompt artifacts.

    Authors: We agree that a direct fidelity assessment on held-out scenarios is necessary to substantiate that the surrogate captures LLM behavior rather than its own biases or prompt effects. The current proof-of-concept experiments focus on associations derived within the prompted distribution, but we recognize the gap. In the revised manuscript we will add a dedicated validation subsection that generates new scenarios outside the original prompting set, obtains LLM outputs on them, and reports quantitative agreement metrics (e.g., prediction correlation or classification accuracy) between the surrogate and the LLM. This addition will directly address the load-bearing concern. revision: yes

  2. Referee: [Abstract] Abstract and experimental description: The claim that experiments 'quantitatively revealed both associations that contradict established medical knowledge and the persistence of scientifically refuted racial assumptions' is asserted without details on the medical hypotheses tested, number of simulated scenarios, method for quantifying contradictions against external benchmarks, or controls for prompt engineering sensitivity. These omissions prevent assessment of whether the data support the reported findings.

    Authors: We acknowledge that the abstract and high-level experimental description lack sufficient specifics for independent evaluation. Although the body of the manuscript describes the medical prediction tasks and scenario generation, we will revise the abstract to concisely state the hypotheses examined, the scale of the simulated scenarios, the external medical benchmarks used to identify contradictions, and the prompt-variation controls performed. We will also expand the methods/results sections with explicit quantification procedures and sensitivity results to ensure the claims are fully supported and verifiable. revision: yes
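The agreement metric promised in the rebuttal — prediction correlation between surrogate and LLM on scenarios outside the original prompting set — could be computed along these lines. The oracle, the surrogate coefficients, and the scenario ranges below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

def mock_llm(X):
    """Stand-in for prompting the original LLM on new scenarios."""
    return X @ np.array([0.5, -1.2, 2.0]) + rng.normal(0, 0.1, len(X))

# Surrogate assumed fitted on the original prompting distribution
# (coefficients hypothetical; here close to the oracle's).
surrogate_coef = np.array([0.5, -1.2, 2.0])

# Generate scenarios OUTSIDE the original prompting range to probe fidelity.
X_held_out = rng.uniform(-5, 5, size=(500, 3))

y_llm = mock_llm(X_held_out)
y_surrogate = X_held_out @ surrogate_coef

# Agreement metric from the rebuttal: prediction correlation.
r = np.corrcoef(y_llm, y_surrogate)[0, 1]
print(f"held-out surrogate-LLM correlation: {r:.3f}")
```

For classification-style outputs, accuracy of the surrogate against LLM labels on the same held-out scenarios would play the equivalent role.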

Circularity Check

1 step flagged

Surrogate interpretations of LLM perceptions reduce to fits on prompted input-output pairs

specific steps
  1. fitted input called prediction [Abstract]
    "this framework approximates the latent LLM knowledge space using observable elements (input-output pairs) through extensive prompting across a comprehensive range of simulated scenarios. Through proof-of-concept experiments in medical predictions, we demonstrate our framework's effectiveness in revealing the extent to which LLMs 'perceive' each input variable in relation to the output."

    The latent space is operationally replaced by prompted pairs; a surrogate is then fitted to those pairs and its coefficients/importances are presented as the LLM's perceptions. Because the surrogate is optimized directly on the pairs, any revealed associations (e.g., racial assumptions or contradictions with medical knowledge) are by construction the surrogate's explanation of the prompted data rather than an independently validated recovery of internal LLM structure.

full rationale

The paper defines its core contribution as a surrogate framework that generates input-output pairs via prompting to approximate latent LLM knowledge, then trains simplified models (linear, tree-based) on those pairs to extract variable importances as 'perceptions.' This matches the fitted-input-called-prediction pattern because the extracted associations are statistically forced by the surrogate fit to the very data used to stand in for the LLM; the abstract presents this as revealing LLM-encoded knowledge (including medically contradictory associations) without describing held-out fidelity checks against the original LLM on new scenarios. The derivation chain therefore reduces the claimed explanation of latent space to modeling the prompted observations by construction. No self-citation load-bearing or uniqueness theorems appear; the circularity is limited to the surrogate step itself.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract does not enumerate any free parameters, axioms, or invented entities. The approach implicitly assumes that domain-derived hypotheses and prompting can stand in for internal model states, but no explicit ledger is provided.

pith-pipeline@v0.9.0 · 5513 in / 1294 out tokens · 36582 ms · 2026-05-10T00:25:37.479135+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

79 extracted references · 5 canonical work pages · 1 internal anchor

  1. [1]

    Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023)

  2. [2]

    Thirunavukarasu, A. J. et al. Large language models in medicine. Nat Med 29, 1930–1940 (2023)

  3. [3]

    Marcondes, D., Simonis, A. & Barrera, J. Back to basics to open the black box. Nat. Mach. Intell. 6, 498–501 (2024)

  4. [4]

    ChatGPT is a black box: how AI research can break it open. Nature 619, 671–672 (2023)

  5. [5]

    Nussberger, A.-M., Luo, L., Celis, L. E. & Crockett, M. J. Public attitudes value interpretability but prioritize accuracy in Artificial Intelligence. Nat Commun 13, 5821 (2022)

  6. [6]

    More-powerful AI is coming. Academia and industry must oversee it - together. Nature 636, 273 (2024)

  7. [7]

    Zack, T. et al. Assessing the potential of GPT-4 to perpetuate racial and gender biases in health care: a model evaluation study. Lancet Digit Health 6, e12–e22 (2024)

  8. [8]

    Abid, A., Farooqi, M. & Zou, J. Large language models associate Muslims with violence. Nat. Mach. Intell. 3, 461–463 (2021)

  9. [9]

    Gallifant, J. et al. Ethical debates amidst flawed healthcare artificial intelligence metrics. NPJ Digit Med 7, 243 (2024)

  10. [10]

    Kwong, J. C. C., Wang, S. C. Y., Nickel, G. C., Cacciamani, G. E. & Kvedar, J. C. The long but necessary road to responsible use of large language models in healthcare research. NPJ Digit Med 7, 177 (2024)

  11. [11]

    Chen, S. et al. Cross-Care: Assessing the healthcare implications of pre-training data on language model bias. arXiv [cs.CL] (2024) doi:10.48550/ARXIV.2405.05506

  12. [12]

    Farquhar, S., Kossen, J., Kuhn, L. & Gal, Y. Detecting hallucinations in large language models using semantic entropy. Nature 630, 625–630 (2024)

  13. [13]

    Forrester, A. I. J., Sobester, A. & Keane, A. Engineering Design via Surrogate Modelling: A Practical Guide. (Wiley-Blackwell, Hoboken, NJ, 2008)

  14. [14]

    Hassija, V. et al. Interpreting black-box models: A review on explainable Artificial Intelligence. Cognit. Comput. 16, 45–74 (2024)

  15. [15]

    Han, C. et al. Evaluation of GPT-4 for 10-year cardiovascular risk prediction: Insights from the UK Biobank and KoGES data. iScience 27, 109022 (2024)

  16. [16]

    GBD 2013 Mortality and Causes of Death Collaborators. Global, regional, and national age-sex specific all-cause and cause-specific mortality for 240 causes of death, 1990-2013: a systematic analysis for the Global Burden of Disease Study 2013. Lancet 385, 117–171 (2015)

  17. [18]

    Arnett, D. K. et al. 2019 ACC/AHA Guideline on the Primary Prevention of Cardiovascular Disease: A Report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines. Circulation 140, e596–e646 (2019)

  18. [19]

    Klawonn, F., Hoffmann, G. & Orth, M. Quantitative laboratory results: normal or lognormal distribution? Journal of Laboratory Medicine 44, 143–150 (2020)

  19. [20]

    Makuch, R. W., Freeman, D. H., Jr & Johnson, M. F. Justification for the lognormal distribution as a model for blood pressure. J Chronic Dis 32, 245–250 (1979)

  20. [21]

    Konishi, T. et al. Human lipoproteins comprise at least 12 different classes that are lognormally distributed. PLoS One 17, e0275066 (2022)

  21. [22]

    Website. https://openai.com/chatgpt

  22. [23]

    Claude 3.5 Sonnet. https://www.anthropic.com/news/claude-3-5-sonnet

  23. [24]

    Gemini Pro. Google DeepMind https://deepmind.google/technologies/gemini/pro/

  24. [25]

    Sudlow, C. et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med 12, e1001779 (2015)

  25. [26]

    D’Agostino, R. B., Sr et al. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation 117, 743–753 (2008)

  26. [27]

    Lundberg, S. & Lee, S.-I. A unified approach to interpreting model predictions. arXiv [cs.AI] (2017) doi:10.48550/ARXIV.1705.07874

  27. [28]

    LLM-surrogate. https://llm-surrogate.com/

  28. [29]

    Omiye, J. A., Lester, J. C., Spichak, S., Rotemberg, V. & Daneshjou, R. Large language models propagate race-based medicine. NPJ Digit Med 6, 195 (2023)

  29. [30]

    Inker, L. A. et al. New Creatinine- and Cystatin C-Based Equations to Estimate GFR without Race. N Engl J Med 385, 1737–1749 (2021)

  30. [31]

    Levey, A. S. et al. A new equation to estimate glomerular filtration rate. Ann Intern Med 150, 604–612 (2009)

  31. [32]

    Delgado, C. et al. A Unifying Approach for GFR Estimation: Recommendations of the NKF- ASN Task Force on Reassessing the Inclusion of Race in Diagnosing Kidney Disease. Am J Kidney Dis 79, 268–288.e1 (2022)

  32. [34]

    de Koning, L., Merchant, A. T., Pogue, J. & Anand, S. S. Waist circumference and waist-to-hip ratio as predictors of cardiovascular events: meta-regression analysis of prospective studies. Eur Heart J 28, 850–856 (2007)

  33. [37]

    Gill, D. et al. Urate, Blood Pressure, and Cardiovascular Disease: Evidence From Mendelian Randomization and Meta-Analysis of Clinical Trials. Hypertension 77, 383–392 (2021)

  34. [39]

    Liu, G. K.-M. Perspectives on the social impacts of reinforcement learning with human feedback. arXiv [cs.CY] (2023) doi:10.48550/ARXIV.2303.02891

  35. [40]

    Ganguli, D. et al. Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned. arXiv [cs.CL] (2022) doi:10.48550/ARXIV.2209.07858

  36. [41]

    Krishna, S., Bhambra, N., Bleakney, R. & Bhayana, R. Evaluation of Reliability, Repeatability, Robustness, and Confidence of GPT-3.5 and GPT-4 on a Radiology Board-style Examination. Radiology 311, e232715 (2024)

  37. [42]

    Ballard, D. H. Inconsistently Accurate: Repeatability of GPT-3.5 and GPT-4 in Answering Radiology Board-style Multiple Choice Questions. Radiology 311, e241173 (2024)

  38. [43]

    Koga, S. Exploring the pitfalls of large language models: Inconsistency and inaccuracy in answering pathology board examination-style questions. Pathol Int 73, 618–620 (2023)

  39. [44]

    Liu, J., Wang, Y., Du, J., Zhou, J. T. & Liu, Z. MedCoT: Medical chain of thought via hierarchical expert. in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing 17371–17389 (Association for Computational Linguistics, Stroudsburg, PA, USA, 2024)

  40. [45]

    Kwon, T. et al. Large language models are clinical reasoners: Reasoning-aware diagnosis framework with prompt-generated rationales. Proc. Conf. AAAI Artif. Intell. 38, 18417–18425 (2024)

  41. [46]

    Lăzăroiu, G. et al. The economics of deep and machine learning-based algorithms for COVID-19 prediction, detection, and diagnosis shaping the organizational management of hospitals. Oecon. Copernic. 15, 27–58 (2024)

  42. [47]

    Steyerberg, E. W. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. (Springer Nature, Cham, Switzerland, 2019)

  43. [48]

    Jenkins, D. A. et al. Continual updating and monitoring of clinical prediction models: time for dynamic prediction systems? Diagn Progn Res 5, 1 (2021)

  44. [49]

    Martínez, E. Re-evaluating GPT-4’s bar exam performance. Artif. Intell. Law (2024) doi:10.1007/s10506-024-09396-9

  45. [50]

    Singhal, K. et al. Toward expert-level medical question answering with large language models. Nat Med 31, 943–950 (2025)

  46. [51]

    Yang, X. et al. A large language model for electronic health records. NPJ Digit Med 5, 194 (2022)

  47. [52]

    Dufouil, C. et al. Revised Framingham Stroke Risk Profile to Reflect Temporal Trends. Circulation 135, 1145–1159 (2017)

  48. [53]

    Goff, D. C., Jr et al. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Circulation 129, S49–73 (2014)

  49. [54]

    Pirillo, A., Casula, M., Olmastroni, E., Norata, G. D. & Catapano, A. L. Global epidemiology of dyslipidaemias. Nat. Rev. Cardiol. 18, 689–700 (2021)

  50. [55]

    Yusuf, S. et al. Effect of potentially modifiable risk factors associated with myocardial infarction in 52 countries (the INTERHEART study): case-control study. Lancet 364, 937–952 (2004)

  51. [56]

    Lip, G. Y. H., Tse, H. F. & Lane, D. A. Atrial fibrillation. Lancet 379, 648–661 (2012)

  52. [57]

    Odutayo, A. et al. Atrial fibrillation and risks of cardiovascular disease, renal disease, and death: systematic review and meta-analysis. BMJ 354, i4482 (2016)

  53. [58]

    Jankowski, J., Floege, J., Fliser, D., Böhm, M. & Marx, N. Cardiovascular Disease in Chronic Kidney Disease: Pathophysiological Insights and Therapeutic Options. Circulation 143, 1157–1172 (2021)

  54. [59]

    Kalantar-Zadeh, K., Jafar, T. H., Nitsch, D., Neuen, B. L. & Perkovic, V. Chronic kidney disease. Lancet 398, 786–802 (2021)

  55. [60]

    McCusker, M. E. et al. Family history of heart disease and cardiovascular disease risk-reducing behaviors. Genet. Med. 6, 153–158 (2004)

  56. [61]

    Lloyd-Jones, D. M. et al. Parental cardiovascular disease as a risk factor for cardiovascular disease in middle-aged adults: a prospective study of parents and offspring. JAMA 291, 2204–2211 (2004)

  57. [62]

    Flint, A. C. et al. Effect of systolic and diastolic blood pressure on cardiovascular outcomes. N. Engl. J. Med. 381, 243–251 (2019)

  58. [63]

    Khan, S. S. et al. Association of body mass index with lifetime risk of cardiovascular disease and compression of morbidity. JAMA Cardiol. 3, 280–287 (2018)

  59. [64]

    Powell-Wiley, T. M. et al. Obesity and Cardiovascular Disease: A Scientific Statement From the American Heart Association. Circulation 143, e984–e1010 (2021)

  60. [65]

    de Koning, L., Merchant, A. T., Pogue, J. & Anand, S. S. Waist circumference and waist-to-hip ratio as predictors of cardiovascular events: meta-regression analysis of prospective studies. Eur Heart J 28, 850–856 (2007)

  61. [66]

    Prospective Studies Collaboration et al. Blood cholesterol and vascular mortality by age, sex, and blood pressure: a meta-analysis of individual data from 61 prospective studies with 55,000 vascular deaths. Lancet 370, 1829–1839 (2007)

  62. [67]

    Emerging Risk Factors Collaboration et al. Lipid-related markers and cardiovascular disease prediction. JAMA 307, 2499–2506 (2012)

  63. [68]

    Nordestgaard, B. G. & Varbo, A. Triglycerides and cardiovascular disease. Lancet 384, 626–635 (2014)

  64. [69]

    Selvin, E. et al. Glycated hemoglobin, diabetes, and cardiovascular risk in nondiabetic adults. N. Engl. J. Med. 362, 800–811 (2010)

  65. [70]

    Hall, W. D. Abnormalities of kidney function as a cause and a consequence of cardiovascular disease. Am. J. Med. Sci. 317, 176–182 (1999)

  66. [71]

    Sibilitz, K. L., Benn, M. & Nordestgaard, B. G. Creatinine, eGFR and association with myocardial infarction, ischemic heart disease and early death in the general population. Atherosclerosis 237, 67–75 (2014)

  67. [72]

    Borghi, C. et al. Hyperuricaemia and gout in cardiovascular, metabolic and kidney disease. Eur. J. Intern. Med. 80, 1–11 (2020)

  68. [73]

    Yu, W. & Cheng, J.-D. Uric Acid and Cardiovascular Disease: An Update From Molecular Mechanism to Clinical Perspective. Front Pharmacol 11, 582680 (2020)

  69. [74]

    Emerging Risk Factors Collaboration et al. C-reactive protein concentration and risk of coronary heart disease, stroke, and mortality: an individual participant meta-analysis. Lancet 375, 132–140 (2010)

  70. [75]

    Lagrand, W. K. et al. C-reactive protein as a cardiovascular risk factor: more than an epiphenomenon? Circulation 100, 96–102 (1999)

  71. [76]

    Friedewald, W. T., Levy, R. I. & Fredrickson, D. S. Estimation of the concentration of low-density lipoprotein cholesterol in plasma, without use of the preparative ultracentrifuge. Clin Chem 18, 499–502 (1972)

  72. [77]

    Website. https://platform.openai.com/docs/api-reference/chat

  73. [78]

    Dufouil, C. et al. Revised Framingham Stroke Risk Profile to Reflect Temporal Trends. Circulation 135, 1145–1159 (2017)

  74. [79]

    Knudson, R. J., Lebowitz, M. D., Holberg, C. J. & Burrows, B. Changes in the normal maximal expiratory flow-volume curve with growth and aging. Am Rev Respir Dis 127, 725–734 (1983)

  75. [80]

    Quanjer, P. H. et al. Multi-ethnic reference values for spirometry for the 3-95-yr age range: the global lung function 2012 equations. Eur Respir J 40, 1324–1343 (2012)

  76. [81]

    Quanjer, P. H. et al. Lung volumes and forced ventilatory flows. Eur Respir J 6 Suppl 16, 5–40 (1993)

  77. [82]

    Bhakta, N. R. et al. Race and Ethnicity in Pulmonary Function Test Interpretation: An Official American Thoracic Society Statement. Am J Respir Crit Care Med 207, 978–995 (2023)

  78. [83]

    Gaffney, A. Pulmonary Function Prediction Equations-Clinical Ramifications of a Universal Standard. JAMA Netw Open 6, e2316129 (2023)

  79. [84]

    Bowerman, C. et al. A Race-neutral Approach to the Interpretation of Lung Function Measurements. Am J Respir Crit Care Med 207, 768–774 (2023)