Challenges in Explaining Pretrained Clinical Text Classifiers

Blaz Škrlj; Kristian Miok; Marko Robnik Šikonja; Matej Klemen

arxiv: 2605.28060 · v1 · pith:VL643CIQnew · submitted 2026-05-27 · 💻 cs.CL

Pith reviewed 2026-06-29 12:34 UTC · model grok-4.3

classification 💻 cs.CL

keywords clinical NLP · explainable AI · LIME · SHAP · pretrained models · length of stay prediction · token attributions · model interpretability

The pith

Token-level explanation methods like LIME and SHAP overemphasize non-informative tokens and produce unstable attributions on pretrained clinical text classifiers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines why standard post-hoc explanation techniques fall short for neural models handling long, unstructured medical texts. It conducts targeted tests on a hospital length-of-stay prediction task and documents three recurring problems: heavy weighting of irrelevant tokens, inconsistent results across similar inputs, and confident outputs even when the input is semantically broken. A reader would care because clinical decisions require explanations that stay grounded in medical meaning rather than surface artifacts. The work concludes that new explanation approaches must prioritize clinical relevance and robustness to noise.

Core claim

Token-level and perturbation-based methods exhibit core limitations on pretrained clinical text classifiers, specifically overemphasis on non-informative tokens, instability in attributions, and high-confidence predictions for incoherent input variants, as shown through demonstrations on length-of-stay prediction.

What carries the argument

Targeted demonstrations on a single hospital length-of-stay prediction task that expose flaws in LIME and SHAP when applied to clinical narratives.
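
To make the mechanism under test concrete, the following is a minimal sketch of a perturbation-based token attribution, assuming a hypothetical bag-of-words scorer in place of a pretrained clinical transformer. It is a simplified stand-in, not LIME or SHAP themselves and not the paper's setup: `toy_predict`, `TOY_WEIGHTS`, and `perturbation_attributions` are illustrative names, and the weights are invented.

```python
import math
import random

# Hypothetical bag-of-words weights, invented for illustration.
# They do NOT come from the paper or from any trained model.
TOY_WEIGHTS = {"sepsis": 2.0, "ventilator": 1.5, "stable": -1.0, "the": 0.3}


def toy_predict(tokens):
    """Toy probability of a 'long stay' label from summed token weights."""
    score = sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def perturbation_attributions(tokens, predict, n_samples=500, seed=0):
    """Estimate each token's effect as mean(pred | kept) - mean(pred | dropped).

    Each sample keeps every token independently with probability 0.5,
    which mirrors the random-masking idea behind perturbation explainers.
    """
    rng = random.Random(seed)
    n = len(tokens)
    kept_sum, kept_n = [0.0] * n, [0] * n
    drop_sum, drop_n = [0.0] * n, [0] * n
    for _ in range(n_samples):
        mask = [rng.random() < 0.5 for _ in tokens]
        p = predict([t for t, keep in zip(tokens, mask) if keep])
        for i, keep in enumerate(mask):
            if keep:
                kept_sum[i] += p
                kept_n[i] += 1
            else:
                drop_sum[i] += p
                drop_n[i] += 1
    # max(..., 1) guards against a token that was never kept or never dropped.
    return [
        kept_sum[i] / max(kept_n[i], 1) - drop_sum[i] / max(drop_n[i], 1)
        for i in range(n)
    ]


if __name__ == "__main__":
    note = ["the", "patient", "has", "sepsis"]
    for token, score in zip(note, perturbation_attributions(note, toy_predict)):
        print(f"{token:>8s}  {score:+.3f}")
```

Because the toy scorer gives the filler token "the" a small positive weight, the sketch credits that token as well. An explainer of this kind reports whatever the model responds to, whether or not the token carries clinical meaning.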

If this is right

Explanation methods for clinical NLP must become clinically meaningful and semantically grounded.
Current token-level techniques require greater robustness to linguistic noise in medical texts.
High-confidence predictions on incoherent inputs indicate a need to verify input coherence before trusting attributions.
New strategies beyond standard perturbation-based tools are needed to support reliable use in medical settings.
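
The third point above, checking input coherence before trusting attributions, can be sketched as a shuffle test: compare the model's confidence on the original token order with its confidence on random permutations. This is an illustration under stated assumptions. `order_blind_predict` is a hypothetical keyword scorer, not a clinical model, and `shuffle_confidence_gap` is a name introduced here.

```python
import math
import random

# Hypothetical keyword weights, invented for illustration only.
KEYWORD_WEIGHTS = {"sepsis": 2.0, "intubated": 1.5, "discharged": -1.0}


def order_blind_predict(tokens):
    """Toy scorer that ignores word order entirely."""
    score = sum(KEYWORD_WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def shuffle_confidence_gap(tokens, predict, n_shuffles=20, seed=0):
    """Return (confidence on original order, largest change under shuffling)."""
    rng = random.Random(seed)
    base = predict(tokens)
    largest_gap = 0.0
    for _ in range(n_shuffles):
        shuffled = list(tokens)
        rng.shuffle(shuffled)  # destroys syntax, keeps the same bag of tokens
        largest_gap = max(largest_gap, abs(base - predict(shuffled)))
    return base, largest_gap


if __name__ == "__main__":
    note = "patient intubated for sepsis on arrival".split()
    base, gap = shuffle_confidence_gap(note, order_blind_predict)
    print(f"confidence={base:.3f}  max change after shuffling={gap:.3f}")
```

A gap near zero means the scorer is just as confident on scrambled text as on the original, so its token attributions say nothing about whether the narrative was coherent.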

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same attribution problems may appear in other clinical tasks such as diagnosis coding or discharge summarization.
Clinicians interpreting these explanations risk focusing on non-causal features when deciding on patient care.
Methods that incorporate domain knowledge or sentence-level structure could mitigate the observed instabilities.

Load-bearing premise

Limitations seen on one length-of-stay task are representative of the same methods across all pretrained clinical text classifiers.

What would settle it

Finding stable attributions that consistently highlight medically relevant tokens across multiple clinical classification tasks using LIME or SHAP would contradict the reported limitations.
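
One way to operationalise such a check, offered as a sketch rather than the paper's protocol, is to compare the top-k attributed tokens from two explainer runs using Jaccard overlap and to measure how many of them are stopwords. The scores below are made-up numbers for illustration, not results from the paper, and the stopword set is an arbitrary assumption.

```python
def top_k_tokens(tokens, scores, k):
    """Return the set of k tokens with the largest absolute attribution."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: abs(pair[1]), reverse=True)
    return {token for token, _ in ranked[:k]}


def jaccard(a, b):
    """Jaccard overlap of two sets; 1.0 means identical, 0.0 means disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def stopword_share(top_tokens, stopwords):
    """Fraction of the top-attributed tokens that are stopwords."""
    if not top_tokens:
        return 0.0
    return len(top_tokens & stopwords) / len(top_tokens)


if __name__ == "__main__":
    tokens = ["the", "patient", "has", "sepsis", "and", "fever"]
    # Illustrative attribution scores from two hypothetical explainer runs.
    run_a = [0.30, 0.02, 0.01, 0.40, 0.05, 0.20]
    run_b = [0.05, 0.25, 0.01, 0.38, 0.02, 0.22]
    stopwords = {"the", "and", "has"}  # assumed list, not a standard one

    top_a = top_k_tokens(tokens, run_a, k=3)
    top_b = top_k_tokens(tokens, run_b, k=3)
    print("top-3 overlap:", jaccard(top_a, top_b))
    print("stopword share in run A:", stopword_share(top_a, stopwords))
```

High overlap across runs together with a low stopword share, repeated over several clinical tasks, would be the kind of evidence that contradicts the reported limitations.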

Figures

Figures reproduced from arXiv: 2605.28060 by Blaz Škrlj, Kristian Miok, Marko Robnik Šikonja, Matej Klemen.

**Figure 2.** Distribution of common LIME attribution tokens across 20 clinical dis…

Original abstract

Explaining the predictions of neural models in clinical NLP remains a significant challenge, especially for complex tasks involving long, unstructured medical texts. While post-hoc methods like LIME and SHAP are widely used, they often fall short when applied to clinical narratives. In this paper, we identify core limitations of token-level and perturbation-based explanation techniques through targeted demonstrations on a hospital length-of-stay prediction task. Our findings reveal issues such as overemphasis on non-informative tokens, instability in attributions, and high-confidence predictions for incoherent input variants. These results underscore the need for explanation strategies that are clinically meaningful, semantically grounded, and robust to linguistic noise.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper shows familiar LIME/SHAP problems on one length-of-stay task but the single demonstration does not support calling them core limitations for clinical classifiers.

The paper takes known weaknesses of token-level and perturbation-based explainers and illustrates them on a hospital length-of-stay prediction task. The examples cover overemphasis on non-informative tokens, attribution instability, and confident predictions on incoherent inputs. That part is straightforward and points to real deployment risks in clinical NLP.

It does a reasonable job of making those issues concrete for a medical audience. The demonstrations are targeted and the abstract states the problems clearly without overclaiming new methods.

The main limitation is scope. Everything rests on one task with no cross-task checks, no other clinical benchmarks, and no quantitative metrics or controls mentioned. The stress-test note is right: a single demonstration does not establish these as core limitations across pretrained clinical text classifiers. Without that breadth the "core" framing does not hold up.

The work is aimed at practitioners already applying post-hoc explainers in healthcare settings who want to see the pitfalls spelled out. Readers looking for new techniques or systematic comparisons will find little here.

I would send it for peer review. The topic is practically relevant even if the evidence is narrow, and referees could push for more tasks or metrics to strengthen the case.

Referee Report

2 major / 1 minor

Summary. The paper claims that token-level and perturbation-based post-hoc explanation methods (e.g., LIME and SHAP) exhibit core limitations when applied to pretrained clinical text classifiers. These limitations—overemphasis on non-informative tokens, attribution instability, and high-confidence predictions on incoherent input variants—are identified via targeted demonstrations on a single hospital length-of-stay prediction task, motivating the need for clinically meaningful, semantically grounded, and robust explanation strategies.

Significance. If the demonstrated issues prove generalizable beyond the chosen task and model, the work would usefully flag practical shortcomings in widely adopted explainability tools for clinical NLP. The absence of cross-task replication or quantitative controls, however, leaves the scope of the claimed 'core limitations' open to question.

major comments (2)

[Abstract] Abstract and the demonstrations section: the claim that the observed issues constitute 'core limitations' of token-level and perturbation-based methods 'across pretrained clinical text classifiers' is load-bearing for the paper's central thesis, yet rests exclusively on targeted demonstrations for one length-of-stay task without cross-task replication, ablation on other clinical NLP benchmarks, or an explicit argument that length-of-stay is paradigmatic.
[Abstract] Abstract: the description of the demonstrations supplies no quantitative metrics, baseline comparisons, model architecture details, dataset statistics, or statistical controls, preventing verification that the reported issues are systematic rather than anecdotal.

minor comments (1)

The manuscript would benefit from an explicit limitations subsection that directly addresses the single-task scope and the conditions under which the findings might or might not generalize.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address the two major points below, providing our response and indicating planned revisions where appropriate.

Referee: [Abstract] Abstract and the demonstrations section: the claim that the observed issues constitute 'core limitations' of token-level and perturbation-based methods 'across pretrained clinical text classifiers' is load-bearing for the paper's central thesis, yet rests exclusively on targeted demonstrations for one length-of-stay task without cross-task replication, ablation on other clinical NLP benchmarks, or an explicit argument that length-of-stay is paradigmatic.

Authors: We agree that the empirical demonstrations are confined to a single task. Length-of-stay prediction was deliberately chosen because it involves long, unstructured clinical narratives that are characteristic of many clinical NLP problems; the failures we document (overemphasis on non-informative tokens, attribution instability, and confident predictions on incoherent inputs) stem directly from the local, token-level perturbation mechanics of LIME/SHAP rather than from task idiosyncrasies. We will revise the manuscript to (a) add an explicit paragraph arguing why this task is representative and (b) qualify the scope of the 'core limitations' claim to reflect the single-task evidence base, thereby avoiding overgeneralization. revision: partial

Referee: [Abstract] Abstract: the description of the demonstrations supplies no quantitative metrics, baseline comparisons, model architecture details, dataset statistics, or statistical controls, preventing verification that the reported issues are systematic rather than anecdotal.

Authors: Abstracts are concise summaries; the quantitative metrics, baseline comparisons, model details (pretrained clinical transformers), dataset statistics, and controls are all reported in the demonstrations section of the full manuscript. We can add a brief pointer in the abstract to the relevant section if that improves verifiability, but we do not believe the abstract itself must contain these details. revision: no

Circularity Check

0 steps flagged

No circularity: empirical demonstrations only

The paper reports targeted empirical demonstrations on a single length-of-stay task to surface observed behaviors of token-level and perturbation-based explainers. No derivation chain, fitted parameters, equations, or self-citation load-bearing premises exist; the central claims are direct observations rather than reductions of outputs to inputs by construction. The analysis is therefore self-contained against external benchmarks with no circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no mathematical derivations, fitted parameters, or postulated entities are described.

Reference graph

Works this paper leans on

10 extracted references · 2 canonical work pages

[1] Benedum, C.M., Sondhi, A., Fidyk, E., Cohen, A.B., Nemeth, S., Adamson, B., Estévez, M., Bozkurt, S.: Replication of real-world evidence in oncology using electronic health record data extracted by machine learning. Cancers 15(6), 1853 (2023)

[2] Bhattacharya, A.: Applied Machine Learning Explainability Techniques: Make ML models explainable and trustworthy for practical applications using LIME, SHAP, and more. Packt Publishing Ltd (2022)

[3] Caruana, R., Lou, Y., Gehrke, J., Koch, P., Sturm, M., Elhadad, N.: Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 1721–1730. ACM (2015)

[4] Dai, X., Chalkidis, I., Darkner, S., Elliott, D.: Revisiting transformer-based models for long document classification. arXiv preprint arXiv:2204.06683 (2022)

[5] Holzinger, A., Müller, H., Biesinger, B., Pattichis, C., Kell, D.B.: Towards the augmented pathologist: Knowledge-driven explainable ai. Journal of Biomedical Informatics 126, 103980 (2022)

[6] Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. p. 4768–4777 (2017)

[7] Palojoki, S., Saranto, K., Reponen, E., Skants, N., Vakkuri, A., Vuokko, R., et al.: Classification of electronic health record–related patient safety incidents: Development and validation study. JMIR medical informatics 9(8), e30470 (2021)

[8] Ribeiro, M.T., Singh, S., Guestrin, C.: "Why should I trust you?": Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. p. 1135–1144 (2016). https://doi.org/10.1145/2939672.2939778

[9] Tayefi, M., Ngo, P., Chomutare, T., Dalianis, H., Salvi, E., Budrionis, A., Godtliebsen, F.: Challenges and opportunities beyond structured data in analysis of electronic health records. Wiley Interdisciplinary Reviews: Computational Statistics 13(6), e1549 (2021)

[10] Tonekaboni, S., Joshi, S., McCradden, M.D., Goldenberg, A.: What clinicians want: Contextualizing explainable machine learning for clinical end use. In: Proceedings of the 4th Machine Learning for Healthcare Conference. MLHC, vol. 106, pp. 359–
