Social Determinants of Health and Fentanyl Overdose Mortality Across US Counties: An XGBoost and SHAP Analysis Identifying Silent Risk Counties and Treatment Deserts
Pith reviewed 2026-05-12 01:00 UTC · model grok-4.3
The pith
County-level social factors predict fentanyl overdose deaths and flag hidden high-risk areas.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
An XGBoost model trained on county-level social vulnerability, health behavior, and health resource data predicts overdose standardized mortality ratios (SMR) with a Spearman correlation of 0.67. Treatment desert counties show 52.6% higher SMR than other counties, and K-means clustering flags 143 silent risk counties.
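For readers unfamiliar with the headline metric, Spearman's rho is the Pearson correlation computed on ranks rather than raw values. The sketch below is a minimal plain-Python illustration; the function names and the toy SMR values are hypothetical and are not taken from the paper.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Toy values (hypothetical, not from the paper).
observed_smr = [0.8, 1.1, 1.9, 1.4, 2.3]
predicted_smr = [0.9, 1.0, 1.5, 1.6, 2.0]
rho = spearman_rho(observed_smr, predicted_smr)
```

Because only the ordering matters, a rho of 0.67 says the model ranks counties reasonably well without implying that the predicted SMR values themselves are close to the observed ones.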
What carries the argument
XGBoost model with SHAP attribution on social determinants of health, followed by K-means clustering to label county categories such as treatment deserts and silent risks.
Load-bearing premise
The 975 counties with non-suppressed overdose data represent patterns across all US counties despite many rural and treatment-desert counties having suppressed records.
What would settle it
Re-running the model after incorporating data from previously suppressed counties would show whether the top predictors and the 52.6 percent mortality gap between treatment deserts and other counties remain stable.
Original abstract
Background: Fentanyl overdose deaths are still increasing across the U.S. We do not fully understand which county-level social and structural conditions lead to higher overdose death rates. Social determinants of health, including disability, treatment access, and behavioral health issues, may help identify vulnerable counties before deaths become severe. No earlier study has used explainable machine learning with SHAP attribution on 2022 CDC WONDER data to study treatment access gaps and silent risk counties.
Methods: We combined data from four government sources for 975 U.S. counties, including CDC WONDER (2022) overdose mortality data, CDC Social Vulnerability Index (SVI), CDC PLACES health behavior data, and Area Health Resources Files. An XGBoost model was used to predict overdose mortality risk using Standardized Mortality Ratio (SMR). Five-fold cross-validation was used to test model accuracy, and SHAP values were used to show which factors increase or decrease risk.
Results: XGBoost outperformed all tested models (Spearman rho=0.67, R2=0.457, MAE=0.409, high-risk recall=71.1%). Top predictors were disability rate, hypertension, smoking, and lack of vehicle access. Treatment desert counties had 52.6% higher overdose mortality (SMR 1.786 vs 1.170; p<0.0001). K-means identified 143 silent risk counties. Overdose deaths were spatially clustered (Moran's I=0.505, p=0.001) with 75 hotspots and 136 coldspots. Suppressed counties were 58.2% of WONDER counties, mostly rural (72%) and treatment deserts (65%).
Conclusions: County-level SDOH factors predict overdose deaths, especially disability, treatment access, and behavioral health burden. MOUD expansion should prioritize treatment desert counties, and silent risk counties need early intervention before mortality worsens.
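The 52.6% figure in the abstract can be checked directly from the two reported SMR values. The sketch below does that arithmetic and, as an assumption, uses the standard definition of SMR as observed deaths divided by expected deaths; how the paper computes expected deaths is not stated, and the function names are illustrative.

```python
def smr(observed_deaths: float, expected_deaths: float) -> float:
    """Standardized Mortality Ratio: observed deaths divided by expected deaths."""
    if expected_deaths <= 0:
        raise ValueError("expected_deaths must be positive")
    return observed_deaths / expected_deaths

def percent_higher(a: float, b: float) -> float:
    """How much larger a is than b, expressed as a percentage of b."""
    return (a / b - 1.0) * 100.0

# SMR values reported in the abstract: treatment deserts vs other counties.
gap = percent_higher(1.786, 1.170)
print(round(gap, 1))  # 52.6
```

The gap is a relative difference between two group means within the 975 analyzed counties, so it inherits whatever selection the suppression rule introduces.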
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript applies XGBoost with SHAP explainability to county-level SDOH data from 975 US counties (CDC WONDER 2022 overdose SMR, SVI, PLACES, AHRF) to predict fentanyl overdose mortality. It reports Spearman rho=0.67, R2=0.457, top SHAP predictors (disability rate, hypertension, smoking, vehicle access), 52.6% higher SMR in treatment desert counties (1.786 vs 1.170, p<0.0001), K-means identification of 143 silent risk counties, and spatial clustering (Moran's I=0.505). The study notes 58.2% WONDER suppression (mostly rural/treatment deserts) and recommends prioritizing MOUD expansion and early intervention in high-risk areas.
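The spatial clustering statistic cited above, global Moran's I, compares each county's deviation from the mean with that of its neighbours. The sketch below is a minimal plain-Python version under stated assumptions: the binary neighbour matrix and the four-county toy layout are hypothetical, and the paper's actual spatial weighting scheme is not described in this summary.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w_ij."""
    n = len(values)
    mean_x = sum(values) / n
    dev = [v - mean_x for v in values]
    total_w = sum(sum(row) for row in weights)
    # Weighted sum of cross-products of deviations over all pairs (i, j).
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    denom = sum(d * d for d in dev)
    return (n / total_w) * cross / denom

# Four hypothetical counties in a row; adjacent counties are neighbours.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1.0, 1.0, 3.0, 3.0], w)    # similar values sit together
alternating = morans_i([1.0, 3.0, 1.0, 3.0], w)  # neighbours always differ
```

Positive values indicate that similar SMRs sit next to each other, which is the pattern the reported I=0.505 describes; negative values indicate a checkerboard pattern.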
Significance. If the reported associations prove robust, the work offers a practical, interpretable ML framework for identifying county-level overdose risk using public data, with direct relevance to targeting interventions. Strengths include 5-fold CV, SHAP rankings, and explicit acknowledgment of suppression rates. The central predictive claim remains plausible but requires stronger evidence against selection bias to support policy recommendations.
major comments (3)
- [Methods] Methods (data inclusion): Analysis is restricted to 975 counties with non-suppressed WONDER data, yet 58.2% of counties are suppressed (72% rural, 65% treatment deserts per Results). No imputation, sensitivity analysis, or explicit handling of this exclusion is described, so the XGBoost fit (rho=0.67) and SHAP attributions are trained on a non-representative urban/non-desert subsample. This selection bias directly undermines generalizability of the SDOH prediction claim and the treatment-desert SMR comparison.
- [Results] Results (SMR comparison): The reported 52.6% higher SMR in treatment desert counties (1.786 vs 1.170, p<0.0001) and the K-means silent-risk clusters are derived entirely from the 975-county subset. Because suppression disproportionately removes treatment deserts, the observed elevation and cluster assignments cannot be assumed to hold for the full set of US counties without a robustness check or alternative data source.
- [Results] Results (model details): Hyperparameter tuning for XGBoost, exact feature preprocessing, and the operational definition of 'treatment desert' and 'silent risk' counties are not specified. SHAP attributions are presented as identifying risk factors for intervention, but the manuscript should explicitly state that these are correlational (not causal) and note the high-risk recall threshold used for the 71.1% figure.
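One simple way to quantify the selection concern raised in the first major comment is a standardized mean difference (SMD) on each SDOH covariate between included and suppressed counties. The sketch below is an illustration only: the disability-rate values are invented, the function name is hypothetical, and nothing here reflects an analysis the authors have reported.

```python
from statistics import mean, variance

def standardized_mean_difference(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((variance(group_a) + variance(group_b)) / 2) ** 0.5
    return (mean(group_a) - mean(group_b)) / pooled_sd

# Hypothetical disability rates (%) for a handful of counties.
included_counties = [12.0, 14.0, 16.0]
suppressed_counties = [16.0, 18.0, 20.0]

smd = standardized_mean_difference(included_counties, suppressed_counties)
print(smd)  # -2.0
```

A common rule of thumb treats an absolute SMD above roughly 0.1 as a meaningful imbalance, so large values on the top SHAP predictors would support the referee's concern about generalizability.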
minor comments (3)
- [Abstract] Abstract: Define 'treatment desert counties' and 'silent risk counties' operationally, as these terms are central to the conclusions but undefined here.
- [Methods] Methods: Add citations with exact years and access dates for all four government data sources.
- [Results] Results: Report the exact risk threshold or percentile used to compute high-risk recall (71.1%).
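The last minor comment matters because recall depends entirely on where the high-risk cutoff is placed. The sketch below shows one plausible reading, in which a county counts as high-risk when its SMR is at or above a threshold and the same threshold is applied to predictions. The threshold of 1.5, the toy values, and the function name are all assumptions; the manuscript's actual definition is not given.

```python
def high_risk_recall(observed, predicted, threshold):
    """Share of truly high-risk counties that the model also flags as high-risk."""
    truly_high = [i for i, o in enumerate(observed) if o >= threshold]
    if not truly_high:
        raise ValueError("no county meets the high-risk threshold")
    caught = sum(1 for i in truly_high if predicted[i] >= threshold)
    return caught / len(truly_high)

# Hypothetical observed and predicted SMR values for five counties.
observed_smr = [0.8, 1.6, 2.1, 1.2, 1.9]
predicted_smr = [0.9, 1.7, 1.4, 1.6, 2.0]

recall = high_risk_recall(observed_smr, predicted_smr, threshold=1.5)
```

Moving the threshold changes both which counties count as high-risk and which predictions count as hits, so the 71.1% figure cannot be interpreted without it.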
Simulated Authors' Rebuttal
We thank the referee for the detailed and constructive review. The comments highlight important issues around data selection, generalizability, and methodological transparency. We address each major comment below and will revise the manuscript to incorporate clarifications and additional analyses where feasible.
Point-by-point responses
- Referee: [Methods] Methods (data inclusion): Analysis is restricted to 975 counties with non-suppressed WONDER data, yet 58.2% of counties are suppressed (72% rural, 65% treatment deserts per Results). No imputation, sensitivity analysis, or explicit handling of this exclusion is described, so the XGBoost fit (rho=0.67) and SHAP attributions are trained on a non-representative urban/non-desert subsample. This selection bias directly undermines generalizability of the SDOH prediction claim and the treatment-desert SMR comparison.
Authors: We agree this is a substantive limitation. The manuscript already reports the 58.2% suppression rate and notes that suppressed counties are disproportionately rural (72%) and treatment deserts (65%). In the revision, we will expand the Methods section to explicitly state the exclusion criteria and add a Limitations subsection that discusses the resulting selection bias and implications for generalizability. We will also add a sensitivity analysis comparing available SDOH characteristics between included and suppressed counties to quantify differences. These changes will improve transparency without claiming the model applies to suppressed counties.
Revision: yes
- Referee: [Results] Results (SMR comparison): The reported 52.6% higher SMR in treatment desert counties (1.786 vs 1.170, p<0.0001) and the K-means silent-risk clusters are derived entirely from the 975-county subset. Because suppression disproportionately removes treatment deserts, the observed elevation and cluster assignments cannot be assumed to hold for the full set of US counties without a robustness check or alternative data source.
Authors: The SMR comparison and K-means clustering are performed only on the 975 counties with non-suppressed mortality data, as stated in the Results. We will revise the text to explicitly qualify these findings as applying to counties with available data and add a robustness discussion noting that suppression removes many treatment deserts. While we cannot impute suppressed SMR values without strong assumptions, the observed elevation within the analyzed sample supports the policy relevance for similar counties. We will acknowledge that extrapolation to the full U.S. would require alternative data sources and treat this as a limitation.
Revision: partial
- Referee: [Results] Results (model details): Hyperparameter tuning for XGBoost, exact feature preprocessing, and the operational definition of 'treatment desert' and 'silent risk' counties are not specified. SHAP attributions are presented as identifying risk factors for intervention, but the manuscript should explicitly state that these are correlational (not causal) and note the high-risk recall threshold used for the 71.1% figure.
Authors: We will revise the Methods section to specify hyperparameter tuning details (including the grid search ranges and selected values), feature preprocessing steps (e.g., handling of missing values, scaling), and operational definitions: treatment deserts as counties lacking MOUD providers within a defined geographic threshold, and silent-risk counties as those flagged by K-means clustering on high predicted risk but low observed SMR. We will also add explicit language that SHAP values reflect associations rather than causal effects and state the probability threshold used to compute the 71.1% high-risk recall. These additions will be included in the revised manuscript.
Revision: yes
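The rebuttal's description of silent-risk counties (high predicted risk, low observed SMR) can be illustrated with a fixed-threshold rule. This is deliberately a simpler stand-in for the K-means clustering the authors describe, not a reconstruction of it; the cutoffs, the `County` record, and the placeholder FIPS codes are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class County:
    fips: str            # placeholder identifier, not a real county code
    observed_smr: float
    predicted_smr: float

def flag_silent_risk(counties, high_pred=1.5, low_obs=1.0):
    """Return FIPS codes where predicted risk is high but observed SMR is low."""
    return [
        c.fips
        for c in counties
        if c.predicted_smr >= high_pred and c.observed_smr <= low_obs
    ]

# Hypothetical records for illustration only.
counties = [
    County("00001", observed_smr=1.8, predicted_smr=1.9),  # high and already visible
    County("00002", observed_smr=0.9, predicted_smr=1.7),  # silent risk
    County("00003", observed_smr=0.7, predicted_smr=0.8),  # low on both
]
flagged = flag_silent_risk(counties)
```

A threshold rule makes the definition auditable, whereas K-means cluster boundaries depend on the chosen K and on feature scaling, which is why the referee asks for both to be reported.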
Circularity Check
No circularity: standard ML pipeline on external data with cross-validation
The paper trains XGBoost on 975 counties drawn from independent government sources (CDC WONDER, SVI, PLACES, AHRF) to predict observed SMR, reports 5-fold CV performance (rho=0.67), SHAP attributions, direct SMR group comparisons, and K-means clustering. None of these steps reduce a reported result to its own inputs by definition, fitted-parameter renaming, or self-citation chain; the target SMR is an external count-based ratio, the model is evaluated out-of-fold, and all comparisons are computed on the held-in sample without tautological equivalence. The analysis is therefore self-contained against external benchmarks.
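The out-of-fold evaluation mentioned in this rationale can be sketched as follows: every county is predicted by a model that never saw it during fitting. To keep the example dependency-free, a one-feature least-squares line stands in for XGBoost, which is an explicit substitution; the helper names and the synthetic data are hypothetical.

```python
import random

def kfold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and split into k nearly equal test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def out_of_fold_predictions(x, y, fit, k=5, seed=0):
    """Predict each point with a model fitted on the other k-1 folds."""
    preds = [None] * len(x)
    for test in kfold_indices(len(x), k, seed):
        held_out = set(test)
        train = [i for i in range(len(x)) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = model(x[i])
    return preds

def fit_line(xs, ys):
    """Ordinary least squares on one feature; a stand-in for XGBoost here."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return lambda v: my + slope * (v - mx)

# Synthetic, noise-free data so the held-out predictions are easy to check.
x = [float(i) for i in range(20)]
y = [2.0 * v + 1.0 for v in x]
oof = out_of_fold_predictions(x, y, fit_line)
```

Metrics such as the reported rho=0.67 are only free of leakage if they are computed on predictions produced this way, and if any hyperparameter tuning also happens inside each training fold.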
Axiom & Free-Parameter Ledger
free parameters (2)
- XGBoost hyperparameters
- K in K-means clustering
axioms (2)
- domain assumption: CDC WONDER, SVI, PLACES, and AHRF data are accurate and complete for the analyzed counties
- domain assumption: Standardized Mortality Ratio is a valid and unbiased measure of county-level overdose risk
invented entities (2)
- silent risk counties: no independent evidence
- treatment desert counties: no independent evidence