pith. machine review for the scientific record.

arxiv: 2604.05790 · v1 · submitted 2026-04-07 · 💻 cs.HC

Recognition: no theorem link

Improving Explanations: Applying the Feature Understandability Scale for Cost-Sensitive Feature Selection

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:33 UTC · model grok-4.3

classification 💻 cs.HC
keywords explainable AI · feature selection · understandability · machine learning · tabular data · model interpretability · co-optimization

The pith

Machine learning explanations become more understandable by selecting features according to user comprehension scores while preserving high accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests whether feature selection guided by the Feature Understandability Scale can improve the quality of natural-language explanations for tabular data models. The authors collect understandability scores for features in two datasets and introduce a co-optimization method that balances these scores against classification accuracy. They show that the resulting models keep strong predictive performance while the chosen features produce explanations that read as more accessible. The work treats interpretability as something that can be designed into the model rather than added afterward. This matters because clearer explanations could help people actually use and trust AI-assisted decisions in practice.

Core claim

The paper establishes that accuracy and understandability can be successfully co-optimised while maintaining high classification performance. By treating understandability scores as costs in feature selection, the method produces explanations that are considered more understandable at face value on the tested datasets. This is presented as a proof-of-concept contribution to building model interpretability by design.

What carries the argument

The Feature Understandability Scale applied within a co-optimisation methodology for cost-sensitive feature selection.
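
The page does not reproduce the paper's selection procedure, so what follows is only a minimal sketch of the idea as summarised above: treat each feature's understandability rating as a cost and search for feature subsets that trade cross-validated accuracy against mean cost. The greedy forward search, the trade-off weight lam, the synthetic data, and the logistic-regression base model are all illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of cost-sensitive feature selection with understandability
# scores as costs. All names and data here are assumptions, not the paper's code.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))            # stand-in tabular features
y = (X[:, 0] + X[:, 3] > 0).astype(int)  # stand-in binary target
cost = rng.uniform(1, 5, size=8)         # hypothetical FUS-derived costs
                                         # (higher = harder to understand)
lam = 0.05                               # trade-off weight between accuracy and cost

def objective(features):
    """Score a subset: cross-validated accuracy minus lam times mean feature cost."""
    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          X[:, features], y, cv=5).mean()
    return acc - lam * cost[features].mean()

# Greedy forward selection under the combined objective (one of many possible searches).
selected, remaining = [], list(range(X.shape[1]))
while remaining:
    best = max(remaining, key=lambda f: objective(selected + [f]))
    if selected and objective(selected + [best]) <= objective(selected):
        break  # no candidate improves the combined objective
    selected.append(best)
    remaining.remove(best)

print("selected features:", selected, "objective:", round(objective(selected), 3))
```

Sweeping lam from 0 upward would trace the accuracy-understandability frontier that the co-optimisation claim is about: at lam = 0 the search reduces to ordinary wrapper feature selection, and larger values push it toward cheaper, more understandable features.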

If this is right

  • Explanations for tabular classification tasks can be made more accessible without sacrificing predictive performance.
  • Feature selection can directly incorporate user comprehension as a design criterion.
  • The co-optimization approach works across at least two different datasets while keeping high accuracy.
  • Interpretability becomes an integrated part of model construction rather than a post-hoc fix.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The method could be tested on non-tabular data or other model types to see if the co-optimization still holds.
  • Individual differences in user background might require personalized understandability scores rather than a single scale.
  • Similar cost-based selection could be explored for other explanation qualities such as completeness or causal clarity.

Load-bearing premise

The Feature Understandability Scale provides a reliable measure of how well users will actually comprehend the selected features in explanations.

What would settle it

A user study in which participants rate the understandability of explanations from the co-optimized models versus standard models; if the ratings show no improvement or accuracy falls substantially, the central claim would not hold.
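
For concreteness, here is a hypothetical sketch of how such a study's primary comparison might be analysed; the paired design, the 1–7 rating scale, and all numbers are invented for illustration, not taken from the paper.

```python
# Hypothetical analysis of the settling experiment: compare per-participant
# understandability ratings of explanations from co-optimized vs. standard models.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 40                                              # hypothetical number of participants
standard = rng.normal(4.0, 1.0, size=n).clip(1, 7)  # ratings of standard-model explanations
coopt = rng.normal(4.6, 1.0, size=n).clip(1, 7)     # ratings of co-optimized explanations

# Paired comparison: each participant rates both explanation types on the same scale.
t_stat, p_value = stats.ttest_rel(coopt, standard)
print(f"mean gain = {(coopt - standard).mean():.2f}, t = {t_stat:.2f}, p = {p_value:.4f}")
# The claim would survive only if the gain is positive and reliable while the
# co-optimized models' accuracy stays close to the standard baseline.
```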

Figures

Figures reproduced from arXiv: 2604.05790 by Andrea Visentin, Barry O'Sullivan, Bennett Kleinberg, Luca Longo, Nicola Rossberg.

Figure 1. Workflow of this study. view at source ↗
Figure 2. Workflow of the cost-sensitive feature selection. view at source ↗
Figure 3. Average feature cost and standard deviation for the phone company customer … view at source ↗
Figure 4. Average feature cost and standard deviation for the medical dataset. Note that … view at source ↗
read the original abstract

With the growing pervasiveness of artificial intelligence, the ability to explain the inferences made by machine learning models has become increasingly important. Numerous techniques for model explainability have been proposed, with natural-language textual explanations among the most widely used approaches. When applied to tabular data, these explanations typically draw on input features to justify a given inference. Consequently, a user's ability to interpret the explanation depends on their understanding of the input features. To quantify this feature-level understanding, Rossberg et al. introduced the Feature Understandability Scale. Building on that work, this proof-of-concept study collects understandability scores across two datasets, proposes a co-optimisation methodology of understandability and accuracy and presents the resulting explanations alongside the model accuracies. This work contributes to the body of knowledge on model interpretability by design. It is found that accuracy and understandability can be successfully co-optimised while maintaining high classification performances. The resulting explanations are considered more understandable at face value. Further research will aim to confirm these findings through user evaluation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. This proof-of-concept manuscript applies the Feature Understandability Scale (Rossberg et al.) to cost-sensitive feature selection for natural-language explanations of tabular ML models. It collects scale scores on two datasets, proposes a co-optimization procedure balancing classification accuracy and feature understandability, and presents the resulting explanations together with model accuracies, claiming that accuracy and understandability can be co-optimized while preserving high performance and yielding more understandable explanations at face value. User studies to confirm comprehension gains are deferred to future work.

Significance. If the co-optimization procedure and scale-based selection prove robust, the work could meaningfully advance human-centered XAI by embedding feature-level understandability directly into model design rather than post-hoc explanation. The explicit reuse of an existing, cited scale is a strength that avoids ad-hoc invention and supports cumulative progress in interpretability research.

major comments (2)
  1. [Abstract] The claim that 'accuracy and understandability can be successfully co-optimised while maintaining high classification performances' is stated without any quantitative metrics (accuracy values, understandability scores, baselines, error bars, or dataset identifiers), rendering the central empirical finding impossible to evaluate or reproduce from the provided text.
  2. [Methodology / Results] The co-optimization rests on the Feature Understandability Scale serving as a reliable proxy for user comprehension, yet the manuscript explicitly defers validation of actual comprehension gains to 'further research … through user evaluation.' This assumption is load-bearing for the claim of improved explanations: no check is described showing that higher scale scores predict downstream user performance, or that the selection step preserves explanation fidelity beyond the reported accuracies.
minor comments (2)
  1. [Abstract] The abstract uses the vague qualifier 'at face value'; replace with a concrete statement of what was observed (e.g., mean scale scores or qualitative comparison).
  2. Dataset characteristics, sources, and preprocessing steps are not described; add a dedicated subsection or table to support reproducibility.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our proof-of-concept manuscript. We address each major comment below, proposing targeted revisions where appropriate to improve clarity and precision while preserving the scope of the current work.

read point-by-point responses
  1. Referee: [Abstract] The claim that 'accuracy and understandability can be successfully co-optimised while maintaining high classification performances' is stated without any quantitative metrics (accuracy values, understandability scores, baselines, error bars, or dataset identifiers), rendering the central empirical finding impossible to evaluate or reproduce from the provided text.

    Authors: We agree that the abstract would benefit from greater specificity to allow immediate evaluation of the central claim. In the revised version, we will incorporate key quantitative results, including the achieved classification accuracies on both datasets, the corresponding Feature Understandability Scale scores before and after co-optimization, and explicit dataset identifiers. This addition will make the empirical finding reproducible from the abstract while respecting length constraints. revision: yes

  2. Referee: [Methodology / Results] The co-optimization rests on the Feature Understandability Scale serving as a reliable proxy for user comprehension, yet the manuscript explicitly defers validation of actual comprehension gains to 'further research … through user evaluation.' This assumption is load-bearing for the claim of improved explanations: no check is described showing that higher scale scores predict downstream user performance, or that the selection step preserves explanation fidelity beyond the reported accuracies.

    Authors: We acknowledge that the Feature Understandability Scale functions as a proxy in the current study and that direct validation of comprehension gains via user studies is explicitly deferred to future work, as stated in the manuscript. The proof-of-concept contribution is limited to demonstrating that the scale scores and model accuracy can be co-optimized while retaining high classification performance; the 'more understandable at face value' phrasing refers directly to the scale scores obtained. We will revise the Methodology and Results sections to more explicitly frame the scale as a proxy, restate the scope of the current claims, and clarify that fidelity is assessed via maintained accuracy (as a necessary but not sufficient indicator). No additional empirical checks on predictive validity of the scale are available at this stage. revision: partial

Circularity Check

0 steps flagged

No significant circularity; co-optimization claim uses prior scale as input without self-referential reduction

full rationale

The paper's chain consists of citing the Feature Understandability Scale from prior work, collecting scores on two datasets, proposing a co-optimization procedure for feature selection, and reporting empirical outcomes on accuracy and scale scores. No equations, fitted parameters renamed as predictions, or self-definitional loops are present. The self-citation introduces a measurement tool treated as an external input rather than deriving the central claim from itself. The paper acknowledges the need for separate user evaluation, confirming the reported results are not presented as closed within this manuscript. This is a standard application of prior methodology with new empirical steps and does not meet criteria for any enumerated circularity pattern.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No full manuscript text was accessible; therefore no free parameters, axioms, or invented entities could be audited from the paper itself.

pith-pipeline@v0.9.0 · 5487 in / 1053 out tokens · 29077 ms · 2026-05-10T18:33:36.450552+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

71 extracted references · 7 canonical work pages

  1. [1] Ahmed, T., Biecek, P., Longo, L.: Latent space interpretation and mechanistic clipping of subject-specific variational autoencoders of EEG topographic maps for artefacts reduction. In: World Conference on Explainable Artificial Intelligence. pp. 327–350. Springer (2025)
  2. [2] Ahmed, T., Longo, L.: Examining the size of the latent space of convolutional variational autoencoders trained with spectral topographic maps of EEG frequency bands. IEEE Access 10, 107575–107586 (2022)
  3. [3] Arik, S.Ö., Pfister, T.: TabNet: Attentive interpretable tabular learning. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 35, pp. 6679–6687 (2021)
  4. [4] Assis, A., Dantas, J., Andrade, E.: The performance-interpretability trade-off: A comparative study of machine learning models. Journal of Reliable Intelligent Environments 11(1), 1 (2025)
  5. [5] Assis, A., Véras, D., Andrade, E.: Explainable artificial intelligence: an analysis of the trade-offs between performance and explainability. In: 2023 IEEE Latin American Conference on Computational Intelligence (LA-CCI). pp. 1–6. IEEE (2023)
  6. [6] Bach, M., Werner, A.: Cost-sensitive feature selection for class imbalance problem. In: International Conference on Information Systems Architecture and Technology. pp. 182–194. Springer (2017)
  7. [7] Balog, K., Radlinski, F.: Measuring recommendation explanation quality: The conflicting goals of explanations. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 329–338 (2020)
  8. [8] Bao, Z., Bufton, J., Hickman, R.J., Aspuru-Guzik, A., Bannigan, P., Allen, C.: Revolutionizing drug formulation development: The increasing impact of machine learning. Advanced Drug Delivery Reviews 202, 115108 (2023)
  9. [9] Baron, S.: Explainable AI and causal understanding: Counterfactual approaches considered. Minds and Machines 33(2), 347–377 (2023)
  10. [10] Bassi, P.R., Cavalli, A., Decherchi, S.: Explanation is all you need in distillation: Mitigating bias and shortcut learning. arXiv preprint arXiv:2407.09788 (2024)
  11. [11] Bell, A., Solano-Kamaiko, I., Nov, O., Stoyanovich, J.: It's just not that simple: an empirical study of the accuracy-explainability trade-off in machine learning for public policy. In: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. pp. 248–266 (2022)
  12. [12] Benítez-Peña, S., Blanquero, R., Carrizosa, E., Ramírez-Cobo, P.: Cost-sensitive feature selection for support vector machines. Computers & Operations Research 106, 169–178 (2019)
  13. [13] Bian, J., Peng, X.g., Wang, Y., Zhang, H.: An efficient cost-sensitive feature selection using chaos genetic algorithm for class imbalance problem. Mathematical Problems in Engineering 2016(1), 8752181 (2016)
  14. [14] Brdnik, S., Colakovic, I., Karakatič, S.: Non-experts' trust in XAI is unreasonably high. In: World Conference on Explainable Artificial Intelligence. pp. 184–197. Springer (2025)
  15. [15] Cahlik, V., Alves, R., Kordik, P.: Reasoning-grounded natural language explanations for language models. In: Guidotti, R., Schmid, U., Longo, L. (eds.) Explainable Artificial Intelligence. pp. 3–18. Springer Nature Switzerland, Cham (2026)
  16. [16] Carpenter, S.: Ten steps in scale development and reporting: A guide for researchers. Communication Methods and Measures 12(1), 25–44 (2018)
  17. [17] Carrington, A., Fieguth, P., Chen, H.: Measures of model interpretability for model selection. In: International Cross-Domain Conference for Machine Learning and Knowledge Extraction. pp. 329–349. Springer (2018)
  18. [18] Chen, Z., Subhash, V., Havasi, M., Pan, W., Doshi-Velez, F.: What makes a good explanation? A harmonized view of properties of explanations. arXiv preprint arXiv:2211.05667 (2022)
  19. [19] Dai, J., Upadhyay, S., Aivodji, U., Bach, S.H., Lakkaraju, H.: Fairness via explanation quality: Evaluating disparities in the quality of post hoc explanations. In: Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society. pp. 203–214 (2022)
  20. [20] Danilevsky, M., Dhanorkar, S., Li, Y., Popa, L., Qian, K., Xu, A.: Explainability for natural language processing. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. pp. 4033–4034 (2021)
  21. [21] DeVrio, A., Cheng, M., Egede, L., Olteanu, A., Blodgett, S.L.: A taxonomy of linguistic expressions that contribute to anthropomorphism of language technologies. In: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. pp. 1–18 (2025)
  22. [22] Domnich, M., Veski, R.M., Välja, J., Tulver, K., Vicente, R.: Predicting satisfaction of counterfactual explanations from human ratings of explanatory qualities. In: World Conference on Explainable Artificial Intelligence. pp. 210–229. Springer (2025)
  23. [23] Dziugaite, G.K., Ben-David, S., Roy, D.M.: Enforcing interpretability and its statistical impacts: Trade-offs between accuracy and interpretability. arXiv preprint arXiv:2010.13764 (2020)
  24. [24] Gao, W., Hu, L., Zhang, P.: Class-specific mutual information variation for feature selection. Pattern Recognition 79, 328–339 (2018)
  25. [25] Guyon, I., Weston, J., Barnhill, S., Vapnik, V.: Gene selection for cancer classification using support vector machines. Machine Learning 46(1), 389–422 (2002)
  26. [26] Halpern, J.Y., Pearl, J.: Causes and explanations: A structural-model approach. Part II: Explanations. The British Journal for the Philosophy of Science (2005)
  27. [27] Hilton, D.J.: Conversational processes and causal explanation. Psychological Bulletin 107(1), 65 (1990)
  28. [28] Hoffman, R.R., Mueller, S.T., Klein, G., Litman, J.: Metrics for explainable AI: Challenges and prospects. arXiv preprint arXiv:1812.04608 (2018)
  29. [29] Höllig, J., Markus, A.F., De Slegte, J., Bagave, P.: Semantic meaningfulness: evaluating counterfactual approaches for real-world plausibility and feasibility. In: World Conference on Explainable Artificial Intelligence. pp. 636–659. Springer (2023)
  30. [30] Holzinger, A., Carrington, A., Müller, H.: Measuring the quality of explanations: the System Causability Scale (SCS) comparing human and machine explanations. KI - Künstliche Intelligenz 34(2), 193–198 (2020)
  31. [31] Hunsicker, T., König, C.J., Langer, M.: Investigating choices regarding the accuracy-transparency trade-off of AI-based systems across contexts. Computers in Human Behavior: Artificial Humans, 100216 (2025)
  32. [32] Jiang, L., Kong, G., Li, C.: Wrapper framework for test-cost-sensitive feature selection. IEEE Transactions on Systems, Man, and Cybernetics: Systems 51(3), 1747–1756 (2019)
  33. [33] Kabir, S., Hossain, M.S., Andersson, K.: A review of explainable artificial intelligence from the perspectives of challenges and opportunities. Algorithms 18(9), 556 (2025)
  34. [34] Kästner, L., Crook, B.: Explaining AI through mechanistic interpretability. European Journal for Philosophy of Science 14(4), 52 (2024)
  35. [35] Koenen, N., Wright, M.N.: Toward understanding the disagreement problem in neural network feature attribution. In: World Conference on Explainable Artificial Intelligence. pp. 247–269. Springer (2024)
  36. [36] Kuhl, U., Artelt, A., Hammer, B.: For better or worse: the impact of counterfactual explanations' directionality on user behavior in XAI. In: World Conference on Explainable Artificial Intelligence. pp. 280–300. Springer (2023)
  37. [37] Kulas, J.T., Stachowski, A.A.: Middle category endorsement in odd-numbered Likert response scales: Associated item characteristics, cognitive demands, and preferred meanings. Journal of Research in Personality 43(3), 489–493 (2009)
  38. [38] Liu, M., Xu, C., Luo, Y., Xu, C., Wen, Y., Tao, D.: Cost-sensitive feature selection by optimizing F-measures. IEEE Transactions on Image Processing 27(3), 1323–1335 (2017)
  39. [39] Liu, Y., Liu, C., Zheng, J., Xu, C., Wang, D.: Improving explainability and integrability of medical AI to promote health care professional acceptance and use: mixed systematic review. Journal of Medical Internet Research 27, e73374 (2025)
  40. [40] Longo, L., Brcic, M., Cabitza, F., Choi, J., Confalonieri, R., Del Ser, J., Guidotti, R., Hayashi, Y., Herrera, F., Holzinger, A., et al.: Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions. Information Fusion 106, 102301 (2024)
  41. [41] Lundberg, S.M., Erion, G., Chen, H., DeGrave, A., Prutkin, J.M., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., Lee, S.I.: From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence 2(1), 2522–5839 (2020)
  42. [42] Ma, X.A., Xu, H., Liu, Y., Zhang, J.Z.: Class-specific feature selection using fuzzy information-theoretic metrics. Engineering Applications of Artificial Intelligence 136, 109035 (2024)
  43. [43] Mohseni, S., Block, J.E., Ragan, E.: Quantitative evaluation of machine learning explanations: A human-grounded benchmark. In: Proceedings of the 26th International Conference on Intelligent User Interfaces. pp. 22–31 (2021)
  44. [44] Pallathadka, H., Mustafa, M., Sanchez, D.T., Sajja, G.S., Gour, S., Naved, M.: Impact of machine learning on management, healthcare and agriculture. Materials Today: Proceedings 80, 2803–2806 (2023)
  45. [45] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011)
  46. [46] Phillips-Wren, G.: AI tools in decision making support systems: a review. International Journal on Artificial Intelligence Tools 21(02), 1240005 (2012)
  47. [47] Placani, A.: Anthropomorphism in AI: hype and fallacy. AI and Ethics 4(3), 691–698 (2024)
  48. [48] Ribeiro, M.T., Singh, S., Guestrin, C.: "Why should I trust you?" Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 1135–1144 (2016)
  49. [49] Rossberg, N., Kleinberg, B., O'Sullivan, B., Longo, L., Visentin, A.: The Feature Understandability Scale for human-centred explainable AI: Assessing tabular feature importance. arXiv preprint arXiv:2510.07050 (2025)
  50. [50] Scholbeck, C.A., Funk, H., Casalicchio, G.: Algorithm-agnostic feature attributions for clustering. In: World Conference on Explainable Artificial Intelligence. pp. 217–240. Springer (2023)
  51. [51] Schuff, H., Adel, H., Qi, P., Vu, N.T.: Challenges in explanation quality evaluation. arXiv preprint arXiv:2210.07126 (2022)
  52. [52] Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: Visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 618–626 (2017)
  53. [53] Shah, R., Pawar, A., Kumar, M.: Enhancing machine learning model using explainable AI. In: International Conference on Data & Information Sciences. pp. 287–297. Springer (2023)
  54. [54] Sharifani, K., Amini, M.: Machine learning and deep learning: A review of methods and applications. World Information Technology and Engineering Journal 10(07), 3897–3904 (2023)
  55. [55] Shuvra, M.K., Gony, M.N., Fatema, K.: Explainability vs. performance: Bridging the trade-off in deep learning models. International Journal of Advanced Research in Computer Science & Technology (IJARCST) 7(5), 10931–10941 (2024)
  56. [56] Simuni, G.: Explainable AI in ML: The path to transparency and accountability. International Journal of Recent Advances in (2024)
  57. [57] Sinha, A., Nayem, H., Kibria, B.G.: Sample size requirements for the central limit theorem for skewed distributions: A simulation study. Journal of Statistical Modeling and Analytics (JOSMA) 7(2) (2025)
  58. [58] Verdejo, V.M., Quesada, D.: Levels of explanation vindicated. Review of Philosophy and Psychology 2(1), 77–88 (2011)
  59. [59] Vilone, G., Longo, L.: Classification of explainable artificial intelligence methods through their output formats. Machine Learning and Knowledge Extraction 3(3), 615–661 (2021)
  60. [60] Vilone, G., Longo, L.: Notions of explainability and evaluation approaches for explainable artificial intelligence. Information Fusion 76, 89–106 (2021)
  61. [61] Vilone, G., Longo, L.: Development of a human-centred psychometric test for the evaluation of explanations produced by XAI methods. In: World Conference on Explainable Artificial Intelligence. pp. 205–232. Springer (2023)
  62. [62] Wang, Z., Huang, C., Yao, X.: A roadmap of explainable artificial intelligence: Explain to whom, when, what and how? ACM Transactions on Autonomous and Adaptive Systems 19(4), 1–40 (2024)
  63. [63] Wiratsin, I.o., Ragkhitwetsagul, C.: Effectiveness of explainable artificial intelligence (XAI) techniques for improving human trust in machine learning models: A systematic literature review. IEEE Access (2025)
  64. [64] Wirth, R., Hipp, J.: CRISP-DM: Towards a standard process model for data mining. In: Proceedings of the 4th International Conference on the Practical Applications of Knowledge Discovery and Data Mining. vol. 1, pp. 29–39. Manchester (2000)
  65. [65] Xuan, Y., Small, E., Sokol, K., Hettiachchi, D., Sanderson, M.: Comprehension is a double-edged sword: Over-interpreting unspecified information in intelligible machine learning explanations. International Journal of Human-Computer Studies 193, 103376 (2025)
  66. [66] Yang, F., Du, M., Hu, X.: Evaluating explanation without ground truth in interpretable machine learning. arXiv preprint arXiv:1907.06831 (2019)
  67. [67] Zhao, H., Yu, S.: Cost-sensitive feature selection via the ℓ2,1-norm. International Journal of Approximate Reasoning 104, 25–37 (2019)
  68. [68] Zhou, J., Gandomi, A.H., Chen, F., Holzinger, A.: Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics 10(5), 593 (2021)
  69. [69] Zhou, Q., Zhou, H., Li, T.: Cost-sensitive feature selection using random forest: Selecting low-cost subsets of informative features. Knowledge-Based Systems 95, 1–11 (2016)
  70. [70] Zimmermann, R.S., Klein, T., Brendel, W.: Scale alone does not improve mechanistic interpretability in vision models. Advances in Neural Information Processing Systems 36, 57876–57907 (2023)
  71. [71] Zini, J.E., Awad, M.: On the explainability of natural language processing deep models. ACM Computing Surveys 55(5), 1–31 (2022)