Shapley Regression for Rare Disease Diagnosis Support: a case study on APDS
Pith reviewed 2026-05-12 00:54 UTC · model grok-4.3
The pith
Shapley regression replaces linear predictors with k-additive games to model symptom co-occurrences for APDS diagnosis.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Shapley regression is a game-theoretic model that replaces the linear predictor with a k-additive cooperative game, explicitly modeling co-occurrence of symptoms while retaining the transparency and convexity of logistic regression. On eight biomedical datasets a 2-additive model with l2 regularization achieves the optimal trade-off between predictive power and noise robustness. Applied to a real-world cohort of 222 patients, the approach accurately distinguishes APDS cases from matched controls, confirms known associated phenotypes, and enables exploration of pairwise symptom interactions validated by clinical experts.
What carries the argument
Shapley regression: a k-additive cooperative game that substitutes for the linear term inside logistic regression to encode symptom co-occurrences.
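A minimal sketch of such a predictor for k = 2, assuming the Möbius form of the game: the score is a weighted sum of single-symptom terms and pairwise minimum terms (for binary symptoms, the minimum of two entries is 1 exactly when both symptoms are present). The function names and example weights are hypothetical and not taken from the paper, whose own parametrization via Shapley interaction indices should be a linear re-expression of the same coefficients.

```python
from itertools import combinations
from math import exp


def two_additive_features(x):
    """Map a symptom vector to singleton terms followed by pairwise min terms."""
    singles = list(x)
    pairs = [min(x[i], x[j]) for i, j in combinations(range(len(x)), 2)]
    return singles + pairs


def predict_proba(x, weights, bias):
    """Logistic link applied to a score that is linear in the weights."""
    z = bias + sum(w * f for w, f in zip(weights, two_additive_features(x)))
    return 1.0 / (1.0 + exp(-z))


def shapley_values(n, weights):
    """Shapley value per symptom for a 2-additive game in Mobius form:
    phi_i = m_i + 0.5 * (sum of the pairwise coefficients involving i)."""
    phi = list(weights[:n])
    for w, (i, j) in zip(weights[n:], combinations(range(n), 2)):
        phi[i] += 0.5 * w
        phi[j] += 0.5 * w
    return phi


# Hypothetical weights for three symptoms, ordered m_1, m_2, m_3, m_12, m_13, m_23.
weights = [0.5, 0.2, -0.1, 1.0, 0.0, -0.4]
features = two_additive_features([1, 1, 0])  # symptoms 1 and 2 present
phi = shapley_values(3, weights)
```

Because the score stays linear in the weights, fitting reduces to ordinary logistic regression on the expanded features, which is where the transparency claim comes from.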
If this is right
- The model can flag potential APDS patients earlier from standard electronic records without requiring specialist input.
- Pairwise symptom interactions become directly inspectable and can be checked against clinical knowledge.
- The same lightweight approach applies to other rare diseases whose symptoms overlap with common conditions.
- Interpretability is preserved so clinicians can trace which symptom combinations drive each prediction.
Where Pith is reading between the lines
- Similar game-based replacements could be tested in other medical domains where interactions among binary features matter but full deep models are too opaque.
- Deploying the method inside hospital record systems might shorten the typical multi-year diagnostic delay for APDS and comparable disorders.
- Larger or multi-center cohorts would reveal whether the 2-additive limit remains sufficient or whether certain patients require higher-order terms.
- The convexity property may allow efficient integration with existing clinical decision-support tools that already use logistic scores.
Load-bearing premise
Truncating the game to pairs of symptoms plus l2 regularization is enough to capture the clinically important interactions without missing higher-order effects or inheriting bias from the control-matching process.
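One reason the truncation matters is the number of coefficients to estimate: a k-additive game on n symptoms has one coefficient per coalition of size 1 to k. The short sketch below counts them. The symptom count of 20 is a hypothetical figure chosen for illustration, since this page does not state how many symptoms the cohort records.

```python
from math import comb


def n_coefficients(n_symptoms, k):
    """Number of coalition coefficients in a k-additive game (sizes 1..k)."""
    return sum(comb(n_symptoms, size) for size in range(1, k + 1))


# Hypothetical example with 20 symptoms.
counts = {k: n_coefficients(20, k) for k in (1, 2, 3)}
# counts == {1: 20, 2: 210, 3: 1350}
```

Under that assumption, a 3-additive model would have several times more coefficients than the cohort has patients (222), which suggests why regularization and a low additivity order go together here.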
What would settle it
An independent APDS cohort where adding triple-wise symptom terms raises predictive accuracy by more than 5 percent or where the 2-additive model fails to separate cases from controls at the reported level.
Original abstract
Activated PI3Kδ Syndrome (APDS) is a rare genetic immune disorder caused by variants in PIK3CD or PIK3R1, with highly heterogeneous symptoms that often delay diagnosis. Early recognition is hampered by overlapping clinical presentations and limited clinician awareness, motivating systematic, data-driven approaches to detect APDS-associated phenotypic patterns in routine electronic health records. Traditional linear scoring systems cannot capture complex symptom interactions, while deep learning models, though expressive, often lack interpretability. To bridge this gap, we propose Shapley regression, a novel game-theoretic model replacing the linear predictor with a k-additive cooperative game, explicitly modeling co-occurrence of symptoms while maintaining the transparency and convexity of logistic regression. We carry out an empirical study of our lightweight method on eight public biomedical datasets, showing that a 2-additive model with $\ell_{2}$ regularization achieves an optimal trade-off between predictive power and noise robustness. We also apply it to a real-world cohort of 222 patients, on which Shapley regression accurately distinguished APDS cases from matched controls, confirming and validating phenotypes known to be associated with APDS, and facilitating the exploration of pairwise interactions between symptoms, validated by clinical experts.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Shapley regression, a k-additive cooperative game extension to logistic regression that explicitly models symptom co-occurrences while preserving convexity and interpretability. It reports that a 2-additive model with L2 regularization achieves an optimal trade-off on eight public biomedical datasets. The method is then applied to a 222-patient APDS cohort, where it distinguishes cases from matched controls, validates known phenotypes, and identifies expert-confirmed pairwise symptom interactions.
Significance. If the quantitative claims hold, the work provides a transparent, convex alternative to linear models or black-box approaches for phenotyping rare diseases from EHR data, with explicit handling of pairwise interactions. The expert validation step adds clinical utility, and the focus on a real-world rare-disease cohort demonstrates practical applicability. No machine-checked proofs or fully reproducible code artifacts are described, but the game-theoretic framing offers a clear path for falsifiable follow-up studies.
major comments (2)
- [Abstract and Empirical Evaluation] Abstract and Empirical Evaluation sections: the claim that a 2-additive model with L2 regularization achieves an optimal trade-off between predictive power and noise robustness is unsupported by any reported metrics (AUC, accuracy, F1), cross-validation details, confidence intervals, or ablation tables comparing k values and regularization strengths. This is load-bearing for the central empirical claim and must be addressed with specific numbers and baselines.
- [APDS Cohort Application] APDS Cohort Application section: the matching procedure used to select controls in the 222-patient cohort is unspecified (variables, criteria, or algorithm). This is load-bearing for the claim that Shapley regression 'accurately distinguished APDS cases from matched controls' and validated phenotypes, because matching on age, sex, or correlated factors can alter symptom co-occurrence statistics and introduce spurious signals, as highlighted by the study design critique. Sensitivity checks (unmatched controls or alternative schemes) are required.
minor comments (1)
- [Abstract] The abstract lists 'eight public biomedical datasets' without naming them or providing accession details, which hinders immediate reproducibility assessment.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address each major comment point by point below. Where the comments identify gaps in reporting or description, we have revised the manuscript to incorporate the requested details and analyses.
Point-by-point responses
- Referee: [Abstract and Empirical Evaluation] Abstract and Empirical Evaluation sections: the claim that a 2-additive model with L2 regularization achieves an optimal trade-off between predictive power and noise robustness is unsupported by any reported metrics (AUC, accuracy, F1), cross-validation details, confidence intervals, or ablation tables comparing k values and regularization strengths. This is load-bearing for the central empirical claim and must be addressed with specific numbers and baselines.
  Authors: We agree that the central empirical claim requires explicit quantitative support. The manuscript reports results from an empirical study across eight public biomedical datasets but does not include the specific performance metrics, cross-validation details, confidence intervals, or ablation tables in the main text. In the revised manuscript we will add a table in the Empirical Evaluation section reporting AUC, accuracy, and F1 scores for k=1, 2, and 3 under no regularization, L1, and L2 regularization, together with 5-fold cross-validation results and 95% confidence intervals. Standard logistic regression will be included as an explicit baseline to demonstrate the claimed trade-off.
  Revision: yes
- Referee: [APDS Cohort Application] APDS Cohort Application section: the matching procedure used to select controls in the 222-patient cohort is unspecified (variables, criteria, or algorithm). This is load-bearing for the claim that Shapley regression 'accurately distinguished APDS cases from matched controls' and validated phenotypes, because matching on age, sex, or correlated factors can alter symptom co-occurrence statistics and introduce spurious signals, as highlighted by the study design critique. Sensitivity checks (unmatched controls or alternative schemes) are required.
  Authors: We agree that the matching procedure must be fully specified and that sensitivity checks are necessary to support the claims. In the revised manuscript we will expand the APDS Cohort Application section to describe the matching variables, criteria, and algorithm in detail. We will also add sensitivity analyses using unmatched controls and at least one alternative matching scheme, reporting the resulting performance and phenotype validation outcomes to assess robustness against potential confounding.
  Revision: yes
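The metrics requested in the first exchange can be illustrated with a small sketch. It computes AUC as the probability that a randomly chosen case scores above a randomly chosen control, and attaches a percentile bootstrap interval. The bootstrap is an illustrative choice swapped in here, not the procedure the authors describe (they mention 5-fold cross-validation), and the function names and toy data are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as P(case score > control score), counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data: two controls (0) and two cases (1) with model scores.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)
```

Reporting an interval alongside each point estimate, for each k and each regularization setting, is what would let a reader judge whether the 2-additive model's advantage exceeds sampling noise.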
Circularity Check
No circularity: model definition and empirical results are independent
Full rationale
The paper defines Shapley regression by replacing the linear term in logistic regression with a k-additive cooperative game (standard axioms plus convexity), then reports empirical performance on public datasets and a 222-patient cohort. No equation reduces a reported prediction or validation result to a fitted parameter by construction, no self-citation chain is load-bearing for the central claims, and the expert-validated phenotype interactions are external to the model's equations. The matching-procedure concern is a potential validity issue, not a circularity reduction.
Axiom & Free-Parameter Ledger
free parameters (2)
- additivity order k = 2
- l2 regularization coefficient
axioms (2)
- domain assumption: Symptom interactions are adequately captured by coalitions of size at most k.
- standard math: The resulting game value function remains convex and therefore compatible with logistic regression optimization.
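The convexity entry can be made concrete with a short derivation. The notation is assumed for illustration and may differ from the paper's: $\theta$ stacks the game coefficients, $\phi(x)$ is the fixed feature map built from coalitions of size at most k, $y_n \in \{0, 1\}$ is the label, and $\lambda \ge 0$ is the regularization weight.

```latex
\begin{aligned}
p_\theta(x) &= \sigma\big(\theta^\top \phi(x)\big), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}, \\
L(\theta) &= \sum_{n=1}^{N} \Big[ \log\big(1 + e^{\theta^\top \phi(x_n)}\big) - y_n\, \theta^\top \phi(x_n) \Big] + \lambda \lVert \theta \rVert_2^2, \\
\nabla^2 L(\theta) &= \sum_{n=1}^{N} p_\theta(x_n)\big(1 - p_\theta(x_n)\big)\, \phi(x_n)\phi(x_n)^\top + 2\lambda I \;\succeq\; 0 .
\end{aligned}
```

The Hessian is positive semidefinite because each summand is a nonnegative multiple of an outer product, so the loss is convex in $\theta$, and strictly convex when $\lambda > 0$. The argument relies only on the score being linear in the coefficients, not on the particular form of $\phi$.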
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: We propose Shapley regression, a novel game-theoretic model replacing the linear predictor with a k-additive cooperative game... 2-additive model with ℓ2 regularization achieves an optimal trade-off between predictive power and noise robustness.
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean: LogicNat recovery and orbit embedding (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: The learnable coefficients I(A) are exactly the Shapley Interaction Indices... ϕ_A(x) = sum ... min_i∈C x_i
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.