Recognition: unknown
A Two-Stage LLM Framework for Accessible and Verified XAI Explanations
Pith reviewed 2026-05-10 15:22 UTC · model grok-4.3
The pith
A second LLM can verify and refine natural-language XAI explanations to filter errors and improve accessibility.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The two-stage LLM meta-verification framework consists of an Explainer LLM that translates raw XAI outputs into natural-language narratives, a Verifier LLM that assesses those narratives for faithfulness, coherence, completeness, and hallucination risk, and an iterative refeed mechanism that uses the verifier's feedback to refine the explainer's output. Experiments across five XAI techniques and datasets using three families of open-weight LLMs demonstrate that verification is crucial for filtering unreliable explanations while improving linguistic accessibility compared with raw XAI outputs. Analysis of the entropy production rate during the refinement process shows that the verifier's feedback progressively guides the explainer toward more stable and coherent reasoning.
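To make the loop concrete, here is a minimal sketch of the Explainer-Verifier refeed cycle as the abstract describes it. The function signatures, the four criterion names used as score keys, the acceptance threshold, and the iteration cap are illustrative assumptions; the paper does not publish its prompts or stopping rules.

```python
# Minimal sketch of the two-stage refeed loop (Explainer -> Verifier -> refeed).
# The callables, score keys, threshold, and iteration cap are illustrative
# assumptions; scores are taken to be normalized so that higher is better for
# every criterion (e.g., "hallucination_risk" reported as 1 - risk).
from typing import Callable, Optional

CRITERIA = ("faithfulness", "coherence", "completeness", "hallucination_risk")

def refeed_loop(
    xai_output: str,
    explainer: Callable[[str, Optional[str]], str],   # (xai_output, feedback) -> narrative
    verifier: Callable[[str, str], dict],             # (xai_output, narrative) -> {"scores": ..., "issues": ...}
    max_iters: int = 3,
    threshold: float = 0.8,
) -> Optional[str]:
    narrative = explainer(xai_output, None)                   # stage 1: draft narrative
    for _ in range(max_iters):
        report = verifier(xai_output, narrative)              # stage 2: meta-verification
        if all(report["scores"].get(c, 0.0) >= threshold for c in CRITERIA):
            return narrative                                  # accepted: passes all criteria
        narrative = explainer(xai_output, report["issues"])   # refeed verifier feedback
    return None                                               # filtered: withheld from end-users
```

Returning None when the iteration budget is exhausted is the "filtering" role the abstract assigns to verification: explanations that never pass all criteria are not shown to end-users.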
What carries the argument
The Verifier LLM, which assesses the Explainer LLM's natural-language output on faithfulness, coherence, completeness, and hallucination risk to enable iterative refinement via refeed.
If this is right
- Verification filters out unreliable explanations before they reach end-users.
- Natural-language XAI outputs become more linguistically accessible than direct translations.
- The iterative refinement process progressively stabilizes explanations, as shown by a decreasing entropy production rate.
- The overall approach supplies an efficient route to more trustworthy and democratized XAI systems.
Where Pith is reading between the lines
- The same verifier-refinement loop could be tested on explanations for other AI systems such as medical diagnosis models or autonomous vehicle decisions.
- Direct comparison of verified versus unverified explanations in user studies would measure whether end-users actually perceive higher trustworthiness.
- If the refinement converges quickly, the framework might lower total compute cost relative to generating many independent LLM explanations.
Load-bearing premise
The Verifier LLM can accurately and consistently detect faithfulness issues, incompleteness, and hallucination risk in the Explainer's output without introducing its own systematic errors or biases.
What would settle it
A controlled test set of XAI outputs with known hallucinations or omissions where the verifier consistently fails to flag them or where the refined explanations contain more errors than the original unverified versions.
read the original abstract
Large Language Models (LLMs) are increasingly used to translate the technical outputs of eXplainable Artificial Intelligence (XAI) methods into accessible natural-language explanations. However, existing approaches often lack guarantees of accuracy, faithfulness, and completeness. At the same time, current efforts to evaluate such narratives remain largely subjective or confined to post-hoc scoring, offering no safeguards to prevent flawed explanations from reaching end-users. To address these limitations, this paper proposes a Two-Stage LLM Meta-Verification Framework that consists of (i) an Explainer LLM that converts raw XAI outputs into natural-language narratives, (ii) a Verifier LLM that assesses them in terms of faithfulness, coherence, completeness, and hallucination risk, and (iii) an iterative refeed mechanism that uses the Verifier's feedback to refine and improve them. Experiments across five XAI techniques and datasets, using three families of open-weight LLMs, show that verification is crucial for filtering unreliable explanations while improving linguistic accessibility compared with raw XAI outputs. In addition, the analysis of the Entropy Production Rate (EPR) during the refinement process indicates that the Verifier's feedback progressively guides the Explainer toward more stable and coherent reasoning. Overall, the proposed framework provides an efficient pathway toward more trustworthy and democratized XAI systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a Two-Stage LLM Meta-Verification Framework for XAI explanations: an Explainer LLM converts raw XAI outputs into natural-language narratives; a Verifier LLM evaluates them for faithfulness, coherence, completeness, and hallucination risk; and an iterative refeed mechanism refines outputs based on verifier feedback. Experiments across five XAI techniques and datasets with three families of open-weight LLMs claim that verification filters unreliable explanations and improves linguistic accessibility over raw XAI outputs, with Entropy Production Rate (EPR) analysis indicating progressive stabilization of reasoning during refinement.
Significance. If the empirical results hold under independent validation, the framework would provide a scalable, LLM-based pathway to make XAI outputs more accessible and trustworthy for non-expert users, directly addressing the gap between technical XAI methods and usable explanations. The EPR analysis introduces a novel dynamical-systems lens on iterative LLM refinement that could generalize beyond this setting.
major comments (3)
- [Experiments] Experiments section (and abstract): the central claim that 'verification is crucial for filtering unreliable explanations' rests on Verifier LLM judgments of faithfulness, completeness, and hallucination risk, yet no external anchor—human ratings, inter-rater agreement, or task-specific accuracy against known XAI ground truth—is reported. Improvements are shown only relative to raw XAI outputs and EPR trends, both of which can be satisfied by consistent but systematically biased LLM behavior.
- [Framework and EPR analysis] Section describing the framework and EPR analysis: the abstract states that EPR 'indicates that the Verifier's feedback progressively guides the Explainer toward more stable and coherent reasoning,' but provides neither the explicit formula for EPR, the precise computation from token probabilities or attention patterns, nor any statistical test linking EPR reduction to explanation quality. Without these, the EPR result cannot be reproduced or falsified.
- [Evaluation] Evaluation protocol: the manuscript reports positive results across five XAI techniques, datasets, and three LLM families but omits exact metrics (e.g., how accessibility or faithfulness is scored), chosen baselines, and any statistical significance tests. This prevents assessment of effect sizes and robustness of the 'improved accessibility' claim.
minor comments (2)
- [Methods] Notation for EPR and the iterative refeed loop should be defined with explicit equations or pseudocode in the methods section to allow replication.
- [Discussion] The paper should include a limitations subsection discussing potential biases introduced by the choice of open-weight LLMs and the risk of explainer-verifier collusion.
Simulated Author's Rebuttal
We appreciate the referee's thorough review and valuable feedback on our manuscript. We address each major comment below, providing clarifications and indicating revisions where necessary to strengthen the paper.
read point-by-point responses
-
Referee: [Experiments] Experiments section (and abstract): the central claim that 'verification is crucial for filtering unreliable explanations' rests on Verifier LLM judgments of faithfulness, completeness, and hallucination risk, yet no external anchor—human ratings, inter-rater agreement, or task-specific accuracy against known XAI ground truth—is reported. Improvements are shown only relative to raw XAI outputs and EPR trends, both of which can be satisfied by consistent but systematically biased LLM behavior.
Authors: We acknowledge the importance of external validation for the verifier's judgments. Our experiments demonstrate consistent improvements in accessibility and faithfulness metrics across diverse XAI techniques and LLM families, with the iterative process showing progressive refinement. However, we agree that LLM self-evaluation can be subject to bias. In the revision, we will add a dedicated limitations section discussing this and outline plans for human evaluation studies in future work. The current results provide initial evidence of the framework's utility. revision: partial
-
Referee: [Framework and EPR analysis] Section describing the framework and EPR analysis: the abstract states that EPR 'indicates that the Verifier's feedback progressively guides the Explainer toward more stable and coherent reasoning,' but provides neither the explicit formula for EPR, the precise computation from token probabilities or attention patterns, nor any statistical test linking EPR reduction to explanation quality. Without these, the EPR result cannot be reproduced or falsified.
Authors: We apologize for the omission of the EPR details in the manuscript. The Entropy Production Rate is computed based on the divergence in token probability distributions between consecutive iterations, specifically using the Kullback-Leibler divergence normalized by sequence length to measure reasoning stabilization. We will include the full mathematical formulation, the computation method from the model's output probabilities, and statistical tests (e.g., paired t-tests on EPR trends) in the revised manuscript to ensure reproducibility (a hedged sketch of this computation is given after these responses). revision: yes
-
Referee: [Evaluation] Evaluation protocol: the manuscript reports positive results across five XAI techniques, datasets, and three LLM families but omits exact metrics (e.g., how accessibility or faithfulness is scored), chosen baselines, and any statistical significance tests. This prevents assessment of effect sizes and robustness of the 'improved accessibility' claim.
Authors: We will revise the Evaluation section to explicitly define all metrics used for accessibility (e.g., readability scores like Flesch-Kincaid) and faithfulness (based on verifier criteria), specify the baselines (raw XAI outputs and non-iterative LLM explanations), and include statistical significance tests such as ANOVA or Wilcoxon tests with p-values and effect sizes to support the claims (a sketch of such an analysis follows these responses). revision: yes
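A minimal sketch of the EPR computation described in the second response above, under the stated reading that it is a KL divergence between consecutive iterations' token probability distributions, normalized by sequence length. The position alignment and epsilon smoothing below are simplifying assumptions, not the manuscript's exact procedure.

```python
# Sketch of the EPR estimate described in the response above: length-normalized KL
# divergence between token probability distributions of consecutive drafts.
# Truncating to a common prefix length and epsilon-smoothing are assumptions.
import numpy as np

def entropy_production_rate(p_prev: np.ndarray, p_curr: np.ndarray, eps: float = 1e-12) -> float:
    """p_prev, p_curr: (seq_len, vocab) token probability arrays from two
    consecutive refinement iterations."""
    n = min(len(p_prev), len(p_curr))                   # align on common prefix length
    p, q = p_prev[:n] + eps, p_curr[:n] + eps
    kl_per_token = np.sum(q * np.log(q / p), axis=-1)   # D_KL(curr || prev) per position
    return float(kl_per_token.sum() / n)                # normalize by sequence length

# A downward trend in this quantity across iterations is what the paper reads as
# the Explainer stabilizing under the Verifier's feedback.
```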
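Likewise, a minimal sketch of the promised evaluation additions: Flesch-Kincaid grade level as the accessibility metric and a Wilcoxon signed-rank test over paired raw versus verified explanations. The textstat and scipy dependencies and the one-to-one pairing scheme are assumptions, not the paper's protocol.

```python
# Sketch of the promised evaluation: readability (Flesch-Kincaid grade) for paired
# raw-XAI vs. verified explanations, with a Wilcoxon signed-rank test.
# textstat/scipy and the pairing are assumptions, not the paper's actual setup.
import textstat                      # pip install textstat
from scipy.stats import wilcoxon     # pip install scipy

def accessibility_comparison(raw_texts: list[str], verified_texts: list[str]) -> dict:
    raw = [textstat.flesch_kincaid_grade(t) for t in raw_texts]
    ver = [textstat.flesch_kincaid_grade(t) for t in verified_texts]
    stat, p_value = wilcoxon(raw, ver)                 # paired, non-parametric test
    return {
        "raw_mean_grade": sum(raw) / len(raw),         # lower grade = more accessible
        "verified_mean_grade": sum(ver) / len(ver),
        "wilcoxon_p": float(p_value),
    }
```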
Circularity Check
Empirical framework evaluated on external XAI methods and datasets; no load-bearing circularity in derivation chain
full rationale
The paper proposes a two-stage LLM framework (Explainer + Verifier with refeed) and supports its claims via experiments on five XAI techniques, multiple datasets, and three LLM families. Improvements are reported relative to raw XAI outputs, with EPR serving only as a post-hoc analysis metric for coherence trends rather than a fitted input or definitional parameter. No self-definitional equations, predictions that reduce to fitted values by construction, or load-bearing self-citations appear in the abstract or described structure. The core contribution remains an empirical demonstration against external benchmarks, consistent with a low circularity finding.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: LLMs can produce coherent natural-language translations of technical XAI outputs
- domain assumption: A separate LLM instance can reliably judge faithfulness, coherence, completeness, and hallucination risk
Reference graph
Works this paper leans on
- [1] M. Mersha, K. Lam, J. Wood, A. K. AlShami, and J. Kalita, "Explainable artificial intelligence: A survey of needs, techniques, applications, and future direction," Neurocomputing, vol. 599, p. 128111, Sep. 2024.
- [2] S. M. Lundberg and S.-I. Lee, "A unified approach to interpreting model predictions," in Proc. 31st Conf. Neural Inf. Process. Syst. (NeurIPS), 2017, pp. 4768–4777.
- [3] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-CAM: Visual explanations from deep networks via gradient-based localization," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 618–626.
- [4] M. Tyrovolas, G. Nápoles, and C. Stylios, "Backpropagation-based counterfactual explanations for quasi-nonlinear fuzzy cognitive maps," IEEE Trans. Syst., Man, Cybern.: Syst., vol. 56, no. 3, pp. 1706–1720, Mar. 2026.
- [5] S. Wachter, B. Mittelstadt, and C. Russell, "Counterfactual explanations without opening the black box: Automated decisions and the GDPR," Harvard J. Law Technol., vol. 31, no. 2, pp. 841–887, 2018.
- [6] A. Zytek, S. Pidò, and K. Veeramachaneni, "LLMs for XAI: Future directions for explaining explanations," 2024, arXiv:2405.06064.
- [7] M. Silvestri, V. Vineis, E. Gabrielli, F. Giorgi, F. Veglianti, F. Silvestri, and G. Tolomei, "A survey on explainable AI narratives based on large language models," TechRxiv, Nov. 2025.
- [8] P. Mavrepis et al., "XAI for all: Can large language models simplify explainable AI?" Comput. Sci. Inf. Technol., vol. 14, no. 22, p. 127, 2024.
- [9] G. Castellano, M. G. Miccoli, R. Scaringi, G. Vessio, and G. Zaza, "Using LLMs to explain AI-generated art classification via Grad-CAM heatmaps," in Proc. 5th Italian Workshop on Explainable Artificial Intelligence (XAI.it), ser. CEUR Workshop Proceedings, vol. 3839, 2024, pp. 65–74. [Online]. Available: https://ceur-ws.org/Vol-3839/paper5.pdf
- [10] X. Zeng, "Enhancing the interpretability of SHAP values using large language models," 2024, arXiv:2409.00079.
- [11] A. T. Kalai, O. Nachum, S. S. Vempala, and E. Zhang, "Why language models hallucinate," 2025, arXiv:2509.04664.
- [12] A. Castelnovo et al., "Augmenting XAI with LLMs: A case study in banking marketing recommendation," in Explainable Artificial Intelligence, ser. Communications in Computer and Information Science. Cham: Springer, 2024, pp. 211–229.
- [13] N. Shirvani-Mahdavi and C. Li, "Rule2Text: A framework for generating and evaluating natural language explanations of knowledge graph rules," 2025, arXiv:2508.10971.
- [14] T. Ichmoukhamedov, J. Hinns, and D. Martens, "How good is my story? Towards quantitative metrics for evaluating LLM-generated XAI narratives," 2024, arXiv:2412.10220.
- [15] T. Kojima, S. S. Gu, M. Reid, Y. Matsuo, and Y. Iwasawa, "Large language models are zero-shot reasoners," in Proc. 36th Conf. Neural Inf. Process. Syst. (NeurIPS), 2022.
- [16] T. B. Brown et al., "Language models are few-shot learners," in Proc. 34th Conf. Neural Inf. Process. Syst. (NeurIPS), 2020.
- [17] Y. Zhang, Y. Yuan, and A. C.-C. Yao, "Meta prompting for AI systems," 2023, arXiv:2311.11482.
- [18] F. Ding, M. Hardt, J. Miller, and L. Schmidt, "Retiring adult: New datasets for fair machine learning," in Proc. 35th Conf. Neural Inf. Process. Syst. (NeurIPS), 2021.
- [19] S. Agarwal et al., "gpt-oss-120b & gpt-oss-20b Model Card," 2025, arXiv:2508.10925.
- [20] D. Guo et al., "DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning," Nature, vol. 645, pp. 633–638, 2025.
- [21] A. Yang et al., "Qwen3 Technical Report," 2025, arXiv:2505.09388.
- [22] J. P. Kincaid, R. P. Fishburne, R. L. Rogers, and B. S. Chissom, "A readability formula that saves time," Naval Air Station Memphis, Tech. Rep. Research Branch Report 8-75, 1975.
- [23] C. Moslonka et al., "Learned hallucination detection in black-box LLMs using token-level entropy production rate," 2025, arXiv:2509.04492.
- [24] A. Panickssery, S. R. Bowman, and S. Feng, "LLM evaluators recognize and favor their own generations," in Proc. 38th Conf. Neural Inf. Process. Syst. (NeurIPS), 2024, p. 2197.