Metrics for Explainable AI: Challenges and Prospects
Abstract
The question addressed in this paper is: If we present to a user an AI system that explains how it works, how do we know whether the explanation works and the user has achieved a pragmatic understanding of the AI? In other words, how do we know that an explainable AI system (XAI) is any good? Our focus is on the key concepts of measurement. We discuss specific methods for evaluating: (1) the goodness of explanations, (2) whether users are satisfied by explanations, (3) how well users understand the AI systems, (4) how curiosity motivates the search for explanations, (5) whether the user's trust and reliance on the AI are appropriate, and finally, (6) how the human-XAI work system performs. The recommendations we present derive from our integration of extensive research literatures and our own psychometric evaluations.
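As a minimal sketch of what the paper's measurement focus might look like in practice, the Python snippet below averages one user's 1-to-5 Likert ratings over a handful of explanation-satisfaction items. The item wording, the five-item count, and the simple mean aggregation are illustrative assumptions, not the authors' published instrument.

```python
from statistics import mean

# Illustrative Likert items for rating an explanation; the wording and
# count are assumptions for this sketch, not the paper's published scale.
ITEMS = [
    "The explanation helps me understand how the AI works.",
    "The explanation is satisfying.",
    "The explanation has sufficient detail.",
    "The explanation tells me how reliable the AI is.",
    "The explanation lets me judge when to trust the AI.",
]

def satisfaction_score(responses: dict[str, int]) -> float:
    """Average one user's 1-5 Likert ratings into a single score."""
    for item in ITEMS:
        rating = responses.get(item)
        if rating is None:
            raise ValueError(f"missing response for item: {item!r}")
        if not 1 <= rating <= 5:
            raise ValueError(f"rating {rating} out of range for: {item!r}")
    return mean(responses[item] for item in ITEMS)

if __name__ == "__main__":
    # One respondent's ratings, paired with the items in order.
    example = dict(zip(ITEMS, [4, 5, 3, 4, 4]))
    print(f"mean satisfaction: {satisfaction_score(example):.2f}")  # 4.00
```

Comparing conditions (say, two rival explanation designs) would then reduce to comparing these per-user scores across groups, which is the kind of psychometric evaluation the abstract describes.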
This paper has not been read by Pith yet.
Forward citations
Cited by 8 Pith papers
- What Should Explanations Contain? A Human-Centered Explanation Content Model for Local, Post-Hoc Explanations
  A 14-code content model for local post-hoc AI explanations, derived from 325 user statements and validated by experts with high reliability scores.
- Interpretability Can Be Actionable
  Interpretability research should be judged by actionability—the degree to which its insights support concrete decisions and interventions—rather than explanatory power alone.
- Knowledge Affordances for Hybrid Human-AI Information Seeking
  The paper introduces knowledge affordances as declarative, relational descriptions of knowledge sources to guide information seeking in hybrid human-AI environments.
- Confidence Without Competence in AI-Assisted Knowledge Work
  Standard LLM chats produce high perceived understanding but low objective learning in students, while future-self explanations best align confidence with actual gains and guided hints maximize learning with moderate workload.
- Improving Explanations: Applying the Feature Understandability Scale for Cost-Sensitive Feature Selection
  Accuracy and understandability can be co-optimised for feature selection in tabular-data explanations while maintaining high classification performance.
- Reheat Nachos for Dinner? Evaluating AI Support for Cross-Cultural Communication of Neologisms
  AI explanations of slang improve non-native speakers' writing competence more than definitions or rewrites according to native raters, but users overestimate their skill and a performance gap with natives remains.
- Beyond Explainable AI (XAI): An Overdue Paradigm Shift and Post-XAI Research Directions
  Current XAI methods for DNNs and LLMs rest on paradoxes and false assumptions that demand a paradigm shift to verification protocols, scientific foundations, context-aware design, and faithful model analysis rather th...
- What if AI systems weren't chatbots?
  Chatbot AI systems often fail complex needs while projecting authority, contributing to deskilling, labor displacement, economic concentration, and high environmental costs, so alternative pluralistic and task-specifi...
Discussion (0)