Challenges in Explaining Pretrained Clinical Text Classifiers
Pith reviewed 2026-06-29 12:34 UTC · model grok-4.3
The pith
Token-level explanation methods like LIME and SHAP overemphasize non-informative tokens and produce unstable attributions on pretrained clinical text classifiers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Token-level and perturbation-based methods exhibit core limitations on pretrained clinical text classifiers, specifically overemphasis on non-informative tokens, instability in attributions, and high-confidence predictions for incoherent input variants, as shown through demonstrations on length-of-stay prediction.
What carries the argument
Targeted demonstrations on a single hospital length-of-stay prediction task that expose flaws in LIME and SHAP when applied to clinical narratives.
If this is right
- Explanation methods for clinical NLP must become clinically meaningful and semantically grounded.
- Current token-level techniques require greater robustness to linguistic noise in medical texts.
- High-confidence predictions on incoherent inputs indicate a need to verify input coherence before trusting attributions.
- New strategies beyond standard perturbation-based tools are needed to support reliable use in medical settings.
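The third consequence above, checking input coherence before trusting attributions, can be sketched as a simple probe: shuffle the tokens of a note and see how far the predicted probability moves. This is a minimal illustration, not the paper's procedure. The names `toy_predict_proba`, `TOY_WEIGHTS`, `shuffled_variant`, and `coherence_gap` are hypothetical, and the toy bag-of-words scorer only stands in for a real pretrained clinical classifier.

```python
import math
import random

# Hypothetical weights for an illustrative bag-of-words scorer.
# A real pretrained clinical classifier would replace this entirely.
TOY_WEIGHTS = {"sepsis": 2.0, "ventilator": 1.5, "stable": -1.0, "discharged": -1.5}


def toy_predict_proba(text):
    """Order-insensitive stand-in model: P(long stay) from summed token weights."""
    score = sum(TOY_WEIGHTS.get(tok, 0.0) for tok in text.lower().split())
    return 1.0 / (1.0 + math.exp(-score))


def shuffled_variant(text, seed=0):
    """Return the same tokens in a random order (an incoherent input variant)."""
    tokens = text.split()
    random.Random(seed).shuffle(tokens)
    return " ".join(tokens)


def coherence_gap(predict_proba, text, n_variants=20):
    """Largest absolute change in predicted probability across shuffled variants."""
    base = predict_proba(text)
    return max(
        abs(predict_proba(shuffled_variant(text, seed=s)) - base)
        for s in range(n_variants)
    )


note = "patient with sepsis on ventilator not stable"
print(toy_predict_proba(note), coherence_gap(toy_predict_proba, note))
```

A gap near zero means the model is as confident on scrambled text as on the original note, so any token attributions for that note say little about whether the model used the narrative's meaning.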
Where Pith is reading between the lines
- The same attribution problems may appear in other clinical tasks such as diagnosis coding or discharge summarization.
- Clinicians interpreting these explanations risk focusing on non-causal features when deciding on patient care.
- Methods that incorporate domain knowledge or sentence-level structure could mitigate the observed instabilities.
Load-bearing premise
Limitations seen on one length-of-stay task are representative of the same methods across all pretrained clinical text classifiers.
What would settle it
Finding stable attributions that consistently highlight medically relevant tokens across multiple clinical classification tasks using LIME or SHAP would contradict the reported limitations.
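One way to make "stable attributions" in this test measurable is the mean pairwise Jaccard overlap of the top-k tokens across repeated explanation runs. The sketch below assumes each run has already been reduced to a token-to-score dictionary (for example, from repeated LIME or SHAP runs with different seeds). The function names are hypothetical and the metric is an illustrative choice, not one reported in the paper.

```python
from itertools import combinations


def top_k_tokens(attribution, k):
    """Tokens with the k largest absolute scores (ties broken alphabetically)."""
    ranked = sorted(attribution.items(), key=lambda kv: (-abs(kv[1]), kv[0]))
    return {tok for tok, _ in ranked[:k]}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_topk_stability(runs, k=3):
    """Mean pairwise Jaccard overlap of top-k token sets across runs."""
    sets = [top_k_tokens(run, k) for run in runs]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Illustrative, made-up attribution scores for two runs on the same note.
runs = [
    {"sepsis": 0.9, "the": 0.1, "of": 0.05},
    {"sepsis": 0.8, "the": 0.2, "of": 0.01},
]
print(mean_topk_stability(runs, k=2))
```

Values near 1 across several clinical tasks, with medically relevant tokens in the top-k sets, would be evidence against the reported instability. Values near 0 would support it.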
Original abstract
Explaining the predictions of neural models in clinical NLP remains a significant challenge, especially for complex tasks involving long, unstructured medical texts. While post-hoc methods like LIME and SHAP are widely used, they often fall short when applied to clinical narratives. In this paper, we identify core limitations of token-level and perturbation-based explanation techniques through targeted demonstrations on a hospital length-of-stay prediction task. Our findings reveal issues such as overemphasis on non-informative tokens, instability in attributions, and high-confidence predictions for incoherent input variants. These results underscore the need for explanation strategies that are clinically meaningful, semantically grounded, and robust to linguistic noise.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that token-level and perturbation-based post-hoc explanation methods (e.g., LIME and SHAP) exhibit core limitations when applied to pretrained clinical text classifiers. These limitations—overemphasis on non-informative tokens, attribution instability, and high-confidence predictions on incoherent input variants—are identified via targeted demonstrations on a single hospital length-of-stay prediction task, motivating the need for clinically meaningful, semantically grounded, and robust explanation strategies.
Significance. If the demonstrated issues prove generalizable beyond the chosen task and model, the work would usefully flag practical shortcomings in widely adopted explainability tools for clinical NLP. The absence of cross-task replication or quantitative controls, however, leaves the scope of the claimed 'core limitations' open to question.
major comments (2)
- [Abstract] Abstract and the demonstrations section: the claim that the observed issues constitute 'core limitations' of token-level and perturbation-based methods 'across pretrained clinical text classifiers' is load-bearing for the paper's central thesis, yet rests exclusively on targeted demonstrations for one length-of-stay task without cross-task replication, ablation on other clinical NLP benchmarks, or an explicit argument that length-of-stay is paradigmatic.
- [Abstract] Abstract: the description of the demonstrations supplies no quantitative metrics, baseline comparisons, model architecture details, dataset statistics, or statistical controls, preventing verification that the reported issues are systematic rather than anecdotal.
minor comments (1)
- The manuscript would benefit from an explicit limitations subsection that directly addresses the single-task scope and the conditions under which the findings might or might not generalize.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address the two major points below, providing our response and indicating planned revisions where appropriate.
- Referee: [Abstract] Abstract and the demonstrations section: the claim that the observed issues constitute 'core limitations' of token-level and perturbation-based methods 'across pretrained clinical text classifiers' is load-bearing for the paper's central thesis, yet rests exclusively on targeted demonstrations for one length-of-stay task without cross-task replication, ablation on other clinical NLP benchmarks, or an explicit argument that length-of-stay is paradigmatic.
  Authors: We agree that the empirical demonstrations are confined to a single task. Length-of-stay prediction was deliberately chosen as it involves long, unstructured clinical narratives that are characteristic of many clinical NLP problems; the failures we document (overemphasis on non-informative tokens, attribution instability, and confident predictions on incoherent inputs) stem directly from the local, token-level perturbation mechanics of LIME/SHAP rather than from task idiosyncrasies. We will revise the manuscript to (a) add an explicit paragraph arguing why this task is representative and (b) qualify the scope of the 'core limitations' claim to reflect the single-task evidence base, thereby avoiding overgeneralization.
  revision: partial
- Referee: [Abstract] Abstract: the description of the demonstrations supplies no quantitative metrics, baseline comparisons, model architecture details, dataset statistics, or statistical controls, preventing verification that the reported issues are systematic rather than anecdotal.
  Authors: Abstracts are concise summaries; the quantitative metrics, baseline comparisons, model details (pretrained clinical transformers), dataset statistics, and controls are all reported in the demonstrations section of the full manuscript. We can add a brief pointer in the abstract to the relevant section if that improves verifiability, but we do not believe the abstract itself must contain these details.
  revision: no
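The first response attributes the failures to the "local, token-level perturbation mechanics" of LIME/SHAP. A minimal sketch of that kind of perturbation follows, assuming plain random token dropping. This is an illustrative simplification, not the exact sampling scheme of either library, and `drop_token_perturbations` is a hypothetical helper.

```python
import random


def drop_token_perturbations(text, n_samples=5, keep_prob=0.5, seed=0):
    """Generate perturbed variants of `text` by randomly dropping tokens.

    Returns a list of (mask, perturbed_text) pairs, where mask[i] is True
    if the i-th whitespace token was kept.
    """
    rng = random.Random(seed)
    tokens = text.split()
    samples = []
    for _ in range(n_samples):
        mask = [rng.random() < keep_prob for _ in tokens]
        kept = [tok for tok, keep in zip(tokens, mask) if keep]
        samples.append((mask, " ".join(kept)))
    return samples


note = "patient admitted with sepsis and placed on ventilator"
for mask, variant in drop_token_perturbations(note, n_samples=3, seed=1):
    print(variant)
```

Variants produced this way are often ungrammatical fragments of the original note. An explainer that fits a local surrogate to the model's outputs on such fragments therefore depends on how the model behaves on incoherent text, which is the behavior the paper calls into question.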
Circularity Check
No circularity: empirical demonstrations only
The paper reports targeted empirical demonstrations on a single length-of-stay task to surface observed behaviors of token-level and perturbation-based explainers. No derivation chain, fitted parameters, equations, or self-citation load-bearing premises exist; the central claims are direct observations rather than reductions of outputs to inputs by construction. The analysis is therefore self-contained against external benchmarks with no circular steps.