pith. machine review for the scientific record.

arxiv: 2605.08230 · v1 · submitted 2026-05-06 · 💻 cs.LG · stat.AP

Social Determinants of Health and Fentanyl Overdose Mortality Across US Counties: An XGBoost and SHAP Analysis Identifying Silent Risk Counties and Treatment Deserts

Kabi Raj Tiruwa (Clark University) , Abhisan Ghimire (Clark University) , Anuj Kumar Shah (Yeshiva University)

Pith reviewed 2026-05-12 01:00 UTC · model grok-4.3

classification 💻 cs.LG stat.AP
keywords fentanyl overdose mortality · social determinants of health · XGBoost · SHAP analysis · treatment deserts · silent risk counties · US counties · machine learning prediction

The pith

County-level social factors predict fentanyl overdose deaths and flag hidden high-risk areas.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper trains an XGBoost model on data from 975 US counties to show that social determinants of health forecast fentanyl overdose mortality rates measured as standardized mortality ratios. Disability rate, hypertension, smoking, and lack of vehicle access emerge as the strongest predictors, while treatment desert counties record 52.6 percent higher death rates than other counties. The analysis also uses clustering to isolate 143 silent risk counties that show vulnerability factors but not yet elevated mortality, and it documents spatial clustering of deaths.

Core claim

An XGBoost model using county social vulnerability, health behavior, and resource data predicts overdose standardized mortality ratios with Spearman correlation 0.67 and identifies treatment deserts with markedly elevated rates plus 143 silent risk counties via K-means clustering.
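
The headline metric here is a Spearman rank correlation between observed and predicted SMR. As a minimal sketch of what that statistic measures (not the authors' evaluation code), the following pure-Python version ranks both series, averaging tied ranks, and takes the Pearson correlation of the ranks. The toy values and all function names are hypothetical and are not data from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values for five counties (not from the paper).
observed_smr = [0.6, 1.1, 0.9, 2.3, 1.7]
predicted_smr = [0.8, 1.0, 1.2, 1.9, 1.5]
rho = spearman_rho(observed_smr, predicted_smr)
print(round(rho, 3))  # 0.9
```

Because rho depends only on rank order, a model can score well on it while still misestimating SMR magnitudes, which is why the abstract also reports R2 and MAE.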

What carries the argument

XGBoost model with SHAP attribution on social determinants of health, followed by K-means clustering to label county categories such as treatment deserts and silent risks.

Load-bearing premise

The 975 counties with non-suppressed overdose data represent patterns across all US counties despite many rural and treatment-desert counties having suppressed records.

What would settle it

Re-running the model after incorporating data from previously suppressed counties would show whether the top predictors and the 52.6 percent mortality gap between treatment deserts and other counties remain stable.

Original abstract

Background: Fentanyl overdose deaths are still increasing across the U.S. We do not fully understand which county-level social and structural conditions lead to higher overdose death rates. Social determinants of health, including disability, treatment access, and behavioral health issues, may help identify vulnerable counties before deaths become severe. No earlier study has used explainable machine learning with SHAP attribution on 2022 CDC WONDER data to study treatment access gaps and silent risk counties.

Methods: We combined data from four government sources for 975 U.S. counties, including CDC WONDER (2022) overdose mortality data, CDC Social Vulnerability Index (SVI), CDC PLACES health behavior data, and Area Health Resources Files. An XGBoost model was used to predict overdose mortality risk using Standardized Mortality Ratio (SMR). Five-fold cross-validation was used to test model accuracy, and SHAP values were used to show which factors increase or decrease risk.

Results: XGBoost outperformed all tested models (Spearman rho=0.67, R2=0.457, MAE=0.409, high-risk recall=71.1%). Top predictors were disability rate, hypertension, smoking, and lack of vehicle access. Treatment desert counties had 52.6% higher overdose mortality (SMR 1.786 vs 1.170; p<0.0001). K-means identified 143 silent risk counties. Overdose deaths were spatially clustered (Moran's I=0.505, p=0.001) with 75 hotspots and 136 coldspots. Suppressed counties were 58.2% of WONDER counties, mostly rural (72%) and treatment deserts (65%).

Conclusions: County-level SDOH factors predict overdose deaths, especially disability, treatment access, and behavioral health burden. MOUD expansion should prioritize treatment desert counties, and silent risk counties need early intervention before mortality worsens.
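
The 52.6% figure can be checked directly from the two SMR values quoted in the abstract. The short sketch below does that arithmetic. It also includes the conventional definition of an SMR as observed deaths divided by expected deaths, which is an assumption here because the abstract does not spell out how the ratio was built. The function names are hypothetical.

```python
def smr(observed_deaths, expected_deaths):
    """Conventional SMR: observed deaths over expected deaths (assumed definition)."""
    return observed_deaths / expected_deaths


def relative_excess_pct(group_value, reference_value):
    """Percent by which group_value exceeds reference_value."""
    return (group_value / reference_value - 1.0) * 100.0


# Mean SMR values as reported in the abstract.
desert_smr = 1.786
other_smr = 1.170

excess = relative_excess_pct(desert_smr, other_smr)
print(f"{excess:.1f}%")  # 52.6%
```

The check confirms that the reported gap is a ratio of group means (1.786 / 1.170), not a difference in SMR points.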

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The manuscript applies XGBoost with SHAP explainability to county-level SDOH data from 975 US counties (CDC WONDER 2022 overdose SMR, SVI, PLACES, AHRF) to predict fentanyl overdose mortality. It reports Spearman rho=0.67, R2=0.457, top SHAP predictors (disability rate, hypertension, smoking, vehicle access), 52.6% higher SMR in treatment desert counties (1.786 vs 1.170, p<0.0001), K-means identification of 143 silent risk counties, and spatial clustering (Moran's I=0.505). The study notes 58.2% WONDER suppression (mostly rural/treatment deserts) and recommends prioritizing MOUD expansion and early intervention in high-risk areas.
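
For readers unfamiliar with the spatial statistic cited above, global Moran's I can be sketched in a few lines. This is an illustration only: the four-county chain, the binary adjacency weights, and the values are hypothetical, and the paper's actual weights matrix and permutation-based p-value are not reproduced here.

```python
def morans_i(values, weights):
    """Global Moran's I for values under a square spatial weights matrix."""
    n = len(values)
    m = sum(values) / n
    dev = [v - m for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Weighted sum of cross-products of deviations over all ordered pairs.
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    denom = sum(d * d for d in dev)
    return (n / w_sum) * cross / denom


# Hypothetical layout: four counties in a row, each adjacent to its neighbors.
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = [2.0, 2.0, 0.0, 0.0]    # similar values sit next to each other
alternating = [2.0, 0.0, 2.0, 0.0]  # neighbors always differ

print(round(morans_i(clustered, chain_weights), 3))    # 0.333
print(round(morans_i(alternating, chain_weights), 3))  # -1.0
```

Positive values indicate that neighboring counties tend to have similar mortality, so the reported I=0.505 points to substantial clustering, while values near zero would indicate no spatial pattern.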

Significance. If the reported associations prove robust, the work offers a practical, interpretable ML framework for identifying county-level overdose risk using public data, with direct relevance to targeting interventions. Strengths include 5-fold CV, SHAP rankings, and explicit acknowledgment of suppression rates. The central predictive claim remains plausible but requires stronger evidence against selection bias to support policy recommendations.

major comments (3)
  1. [Methods] Methods (data inclusion): Analysis is restricted to 975 counties with non-suppressed WONDER data, yet 58.2% of counties are suppressed (72% rural, 65% treatment deserts per Results). No imputation, sensitivity analysis, or explicit handling of this exclusion is described, so the XGBoost fit (rho=0.67) and SHAP attributions are trained on a non-representative urban/non-desert subsample. This selection bias directly undermines generalizability of the SDOH prediction claim and the treatment-desert SMR comparison.
  2. [Results] Results (SMR comparison): The reported 52.6% higher SMR in treatment desert counties (1.786 vs 1.170, p<0.0001) and the K-means silent-risk clusters are derived entirely from the 975-county subset. Because suppression disproportionately removes treatment deserts, the observed elevation and cluster assignments cannot be assumed to hold for the full set of US counties without a robustness check or alternative data source.
  3. [Results] Results (model details): Hyperparameter tuning for XGBoost, exact feature preprocessing, and the operational definition of 'treatment desert' and 'silent risk' counties are not specified. SHAP attributions are presented as identifying risk factors for intervention, but the manuscript should explicitly state that these are correlational (not causal) and note the high-risk recall threshold used for the 71.1% figure.
minor comments (3)
  1. [Abstract] Abstract: Define 'treatment desert counties' and 'silent risk counties' operationally, as these terms are central to the conclusions but undefined here.
  2. [Methods] Methods: Add citations with exact years and access dates for all four government data sources.
  3. [Results] Results: Report the exact risk threshold or percentile used to compute high-risk recall (71.1%).
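
The request for the recall threshold matters because recall on a continuous target changes with the cutoff. The sketch below assumes one plausible definition, a county counts as high-risk when its SMR is at or above a cutoff, and shows the same predictions giving different recall at two cutoffs. The definition, the cutoffs, and the toy values are assumptions, not the paper's.

```python
def high_risk_recall(observed, predicted, threshold):
    """Share of counties with observed SMR >= threshold that are also predicted >= threshold."""
    actual = [o >= threshold for o in observed]
    flagged = [p >= threshold for p in predicted]
    true_positives = sum(a and f for a, f in zip(actual, flagged))
    n_actual = sum(actual)
    if n_actual == 0:
        raise ValueError("no county meets the high-risk threshold")
    return true_positives / n_actual


# Hypothetical toy values for six counties (not from the paper).
observed_smr = [0.6, 1.1, 0.9, 2.3, 1.7, 1.4]
predicted_smr = [0.8, 1.0, 1.2, 1.9, 1.2, 1.6]

recall_at_1_0 = high_risk_recall(observed_smr, predicted_smr, 1.0)
recall_at_1_5 = high_risk_recall(observed_smr, predicted_smr, 1.5)
print(recall_at_1_0, recall_at_1_5)  # 1.0 0.5
```

Without the cutoff, the 71.1% figure cannot be compared across studies or reproduced from the published predictions.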

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive review. The comments highlight important issues around data selection, generalizability, and methodological transparency. We address each major comment below and will revise the manuscript to incorporate clarifications and additional analyses where feasible.

Point-by-point responses
  1. Referee: [Methods] Methods (data inclusion): Analysis is restricted to 975 counties with non-suppressed WONDER data, yet 58.2% of counties are suppressed (72% rural, 65% treatment deserts per Results). No imputation, sensitivity analysis, or explicit handling of this exclusion is described, so the XGBoost fit (rho=0.67) and SHAP attributions are trained on a non-representative urban/non-desert subsample. This selection bias directly undermines generalizability of the SDOH prediction claim and the treatment-desert SMR comparison.

    Authors: We agree this is a substantive limitation. The manuscript already reports the 58.2% suppression rate and notes that suppressed counties are disproportionately rural (72%) and treatment deserts (65%). In the revision, we will expand the Methods section to explicitly state the exclusion criteria and add a Limitations subsection that discusses the resulting selection bias and implications for generalizability. We will also add a sensitivity analysis comparing available SDOH characteristics between included and suppressed counties to quantify differences. These changes will improve transparency without claiming the model applies to suppressed counties.
    revision: yes

  2. Referee: [Results] Results (SMR comparison): The reported 52.6% higher SMR in treatment desert counties (1.786 vs 1.170, p<0.0001) and the K-means silent-risk clusters are derived entirely from the 975-county subset. Because suppression disproportionately removes treatment deserts, the observed elevation and cluster assignments cannot be assumed to hold for the full set of US counties without a robustness check or alternative data source.

    Authors: The SMR comparison and K-means clustering are performed only on the 975 counties with non-suppressed mortality data, as stated in the Results. We will revise the text to explicitly qualify these findings as applying to counties with available data and add a robustness discussion noting that suppression removes many treatment deserts. While we cannot impute suppressed SMR values without strong assumptions, the observed elevation within the analyzed sample supports the policy relevance for similar counties. We will acknowledge that extrapolation to the full U.S. would require alternative data sources and treat this as a limitation.
    revision: partial

  3. Referee: [Results] Results (model details): Hyperparameter tuning for XGBoost, exact feature preprocessing, and the operational definition of 'treatment desert' and 'silent risk' counties are not specified. SHAP attributions are presented as identifying risk factors for intervention, but the manuscript should explicitly state that these are correlational (not causal) and note the high-risk recall threshold used for the 71.1% figure.

    Authors: We will revise the Methods section to specify hyperparameter tuning details (including the grid search ranges and selected values), feature preprocessing steps (e.g., handling of missing values, scaling), and operational definitions: treatment deserts as counties lacking MOUD providers within a defined geographic threshold, and silent-risk counties as those flagged by K-means clustering on high predicted risk but low observed SMR. We will also add explicit language that SHAP values reflect associations rather than causal effects and state the risk threshold or percentile used to compute the 71.1% high-risk recall. These additions will be included in the revised manuscript.
    revision: yes
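
The included-versus-suppressed comparison promised in response 1 could take the form of a standardized mean difference (SMD) per SDOH feature. The sketch below is one way to compute it, assuming the feature is available for both groups. The choice of SMD, the function name, and the disability-rate values are hypothetical and are not taken from the paper or the rebuttal.

```python
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Difference in means divided by the pooled sample standard deviation."""
    pooled_sd = ((variance(group_a) + variance(group_b)) / 2) ** 0.5
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Hypothetical disability rates (%) for a handful of counties in each group.
included_counties = [12.0, 14.0, 13.0, 15.0]
suppressed_counties = [16.0, 18.0, 17.0, 19.0]

smd = standardized_mean_difference(included_counties, suppressed_counties)
print(round(smd, 3))  # -3.098
```

A common rule of thumb treats an absolute SMD above roughly 0.1 as a meaningful imbalance, so reporting this value for each top SHAP predictor would show how far the 975 analyzed counties sit from the suppressed ones.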

Circularity Check

0 steps flagged

No circularity: standard ML pipeline on external data with cross-validation

Full rationale

The paper trains XGBoost on 975 counties drawn from independent government sources (CDC WONDER, SVI, PLACES, AHRF) to predict observed SMR, and reports 5-fold CV performance (rho=0.67), SHAP attributions, direct SMR group comparisons, and K-means clustering. None of these steps reduces a reported result to its own inputs by definition, fitted-parameter renaming, or a self-citation chain. The target SMR is an external count-based ratio, the model is evaluated out-of-fold, and the group comparisons are computed directly on observed data for the analyzed counties, so no result is tautologically equivalent to its inputs. The analysis is therefore free of circularity, although that does not address the selection bias raised in the referee report.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 2 invented entities

The analysis rests on standard public-health data assumptions and operational definitions for risk categories; no new physical entities are postulated, but several fitted modeling choices and data-selection rules are implicit.

free parameters (2)
  • XGBoost hyperparameters
    Model training involves numerous fitted parameters whose values are not reported.
  • K in K-means clustering
    Number of clusters used to define silent risk counties is unspecified.
axioms (2)
  • domain assumption: CDC WONDER, SVI, PLACES, and AHRF data are accurate and complete for the analyzed counties
    All inputs and the target SMR derive from these sources without independent validation.
  • domain assumption: Standardized Mortality Ratio is a valid and unbiased measure of county-level overdose risk
    SMR is used as the prediction target throughout.
invented entities (2)
  • silent risk counties (no independent evidence)
    purpose: Counties with elevated predicted risk but currently low observed mortality
    Defined post-hoc via K-means on model outputs and observed SMR.
  • treatment desert counties (no independent evidence)
    purpose: Counties lacking adequate MOUD access
    Defined from Area Health Resources Files treatment access variables.

pith-pipeline@v0.9.0 · 5695 in / 1753 out tokens · 62527 ms · 2026-05-12T01:00:26.499352+00:00 · methodology
