Metrics for Explainable AI: Challenges and Prospects
Abstract
The question addressed in this paper is: If we present to a user an AI system that explains how it works, how do we know whether the explanation works and the user has achieved a pragmatic understanding of the AI? In other words, how do we know that an explainable AI system (XAI) is any good? Our focus is on the key concepts of measurement. We discuss specific methods for evaluating: (1) the goodness of explanations, (2) whether users are satisfied by explanations, (3) how well users understand the AI systems, (4) how curiosity motivates the search for explanations, (5) whether the user's trust and reliance on the AI are appropriate, and finally, (6) how the human-XAI work system performs. The recommendations we present derive from our integration of extensive research literatures and our own psychometric evaluations.
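As a minimal sketch of what the paper's measurement focus might look like in practice, the Python snippet below averages one user's 1-to-5 Likert ratings over a handful of explanation-satisfaction items. The item wording, the five-item count, and the simple mean aggregation are illustrative assumptions, not the authors' published instrument.

```python
from statistics import mean

# Illustrative Likert items for rating an explanation; the wording and
# count are assumptions for this sketch, not the paper's published scale.
ITEMS = [
    "The explanation helps me understand how the AI works.",
    "The explanation is satisfying.",
    "The explanation has sufficient detail.",
    "The explanation tells me how reliable the AI is.",
    "The explanation lets me judge when to trust the AI.",
]

def satisfaction_score(responses: dict[str, int]) -> float:
    """Average one user's 1-5 Likert ratings into a single score."""
    for item in ITEMS:
        rating = responses.get(item)
        if rating is None:
            raise ValueError(f"missing response for item: {item!r}")
        if not 1 <= rating <= 5:
            raise ValueError(f"rating {rating} out of range for: {item!r}")
    return mean(responses[item] for item in ITEMS)

if __name__ == "__main__":
    # One respondent's ratings, paired with the items in order.
    example = dict(zip(ITEMS, [4, 5, 3, 4, 4]))
    print(f"mean satisfaction: {satisfaction_score(example):.2f}")  # 4.00
```

Comparing conditions (say, two rival explanation designs) would then reduce to comparing these per-user scores across groups, which is the kind of psychometric evaluation the abstract describes.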
This paper has not been read by Pith yet.
Forward citations
Cited by 8 Pith papers
- What Should Explanations Contain? A Human-Centered Explanation Content Model for Local, Post-Hoc Explanations
  A 14-code content model for local post-hoc AI explanations, derived from 325 user statements and validated by experts with high reliability scores.
- Interpretability Can Be Actionable
  Interpretability research should be judged by actionability—the degree to which its insights support concrete decisions and interventions—rather than explanatory power alone.
- Knowledge Affordances for Hybrid Human-AI Information Seeking
  The paper introduces knowledge affordances as declarative, relational descriptions of knowledge sources to guide information seeking in hybrid human-AI environments.
- Confidence Without Competence in AI-Assisted Knowledge Work
  Standard LLM chats produce high perceived understanding but low objective learning in students, while future-self explanations best align confidence with actual gains and guided hints maximize learning with moderate workload.
- Improving Explanations: Applying the Feature Understandability Scale for Cost-Sensitive Feature Selection
  Accuracy and understandability can be co-optimised for feature selection in tabular-data explanations while maintaining high classification performance.
- Reheat Nachos for Dinner? Evaluating AI Support for Cross-Cultural Communication of Neologisms
  AI explanations of slang improve non-native speakers' writing competence more than definitions or rewrites according to native raters, but users overestimate their skill and a performance gap with natives remains.
- Beyond Explainable AI (XAI): An Overdue Paradigm Shift and Post-XAI Research Directions
  Current XAI methods for DNNs and LLMs rest on paradoxes and false assumptions that demand a paradigm shift to verification protocols, scientific foundations, context-aware design, and faithful model analysis rather th...
- What if AI systems weren't chatbots?
  Chatbot AI systems often fail complex needs while projecting authority, contributing to deskilling, labor displacement, economic concentration, and high environmental costs, so alternative pluralistic and task-specifi...
Discussion (0)