arxiv: 2606.07113 · v1 · pith:ZC4LEEOHnew · submitted 2026-06-05 · 💻 cs.AI

Beyond Post-hoc Explanation: Toward Glassbox AI via Probabilistic Mediation

Pith reviewed 2026-06-27 22:11 UTC · model grok-4.3

classification 💻 cs.AI
keywords: glassbox AI · Bayesian networks · probabilistic mediation · ante-hoc explainability · accountable AI · generative models · causal mediation

The pith

Bayesian networks can serve as ante-hoc mediation layers for generative models to enable auditable and contestable reasoning from the start.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that post-hoc explanations for large language models lack any formal tie to the actual reasoning process and remain unstable in high-stakes settings. It proposes replacing them with a Glassbox Framework that inserts Bayesian networks as mediation layers before inference begins. These networks encode domain knowledge, causal assumptions, and probabilistic dependencies to generate transparent traces and quantified uncertainty. A reader would care because current AI use in public administration, law, and healthcare makes institutional accountability legally necessary. The paper grounds the idea in a benefit eligibility example and lists open challenges in semantic alignment, dynamic construction, probabilistic grounding, and human governance.

Core claim

The central claim is that the absence of structured reasoning, not merely the absence of explanation, is the core problem; Bayesian networks can therefore act as transparent, ante-hoc mediation layers for generative models, encoding domain knowledge, causal assumptions, and probabilistic dependencies before inference occurs and thereby producing auditable reasoning traces, uncertainty quantification, and contestable outputs.

What carries the argument

The Glassbox Framework, in which Bayesian networks function as ante-hoc mediation layers that encode causal and probabilistic structure to mediate generative model outputs.
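As a point of reference for what "encoding causal and probabilistic structure" means formally, the standard Bayesian network factorization is sketched below. This is textbook material, not a formula taken from the paper, which (per the referee report) does not specify the form of its joint distribution. The symbols X_i (network variables) and pa(X_i) (the parents of X_i in the directed acyclic graph) are notation assumed here.

```latex
P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{pa}(X_i)\bigr)
```

Each factor is a conditional probability table that an auditor can inspect in isolation, which is the property the accountability argument appears to rely on.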

If this is right

  • Reasoning traces become directly tied to the inference process rather than reconstructed afterward.
  • Outputs gain contestability through explicit probabilistic dependencies that can be inspected and challenged.
  • Uncertainty quantification is produced as an inherent part of the mediation layer.
  • Applications such as benefit eligibility decisions can incorporate domain-specific causal knowledge upfront.
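A minimal sketch of how the uncertainty and contestability points above could look in code, assuming a single hypothetical condition node feeding an eligibility node. The function names, the conditional probability table, and all numbers are illustrative assumptions, not values or interfaces from the paper.

```python
import math


def eligibility_probability(p_condition, cpt):
    """Marginal P(eligible) with one parent: sum over the condition's two states."""
    return p_condition * cpt[True] + (1.0 - p_condition) * cpt[False]


def binary_entropy(p):
    """Shannon entropy in bits of a Bernoulli(p) outcome; 0 means no uncertainty."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def contest(p_condition, cpt, revised_p_condition):
    """Recompute the output under a challenged assumption and report the shift."""
    before = eligibility_probability(p_condition, cpt)
    after = eligibility_probability(revised_p_condition, cpt)
    return {"before": before, "after": after, "shift": after - before}


# Assumed P(eligible | condition); hypothetical values for illustration only.
CPT = {True: 0.95, False: 0.05}

# A claimant contests the belief that the condition holds with probability 0.5
# and supplies documentation that raises it to 0.9.
result = contest(0.5, CPT, 0.9)
uncertainty_before = binary_entropy(result["before"])
uncertainty_after = binary_entropy(result["after"])
```

The design point is that the contested quantity is an explicit, named input, so a challenge changes one number and the effect on the output can be recomputed and reported rather than re-explained.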

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same mediation approach could be tested in other regulated domains where causal assumptions change over time.
  • Dynamic network construction would need methods that update encoded knowledge without retraining the generative model.
  • Human governance procedures would have to decide which causal assumptions get encoded and how disputes over them are resolved.

Load-bearing premise

Domain knowledge, causal assumptions, and probabilistic dependencies can be encoded into Bayesian networks that preserve generative model capabilities while enabling full auditability and contestability.

What would settle it

An experiment in which a Bayesian network mediation layer is added to a generative model on a benchmark task and either the model's accuracy drops substantially or the resulting reasoning traces do not match the actual factors that drove the model's outputs.

Figures

Figures reproduced from arXiv: 2606.07113 by Manuele Leonelli.

Figure 1: The Glassbox Framework. The governance layer (top) cycles through expert elicitation, DAG specification, audit, and revision. The inference layer (middle) mediates LLM-BN interaction via the semantic translation interface, with virtual soft evidence entering the BN and inconsistency flags routed back for re-query. The accountability layer (bottom) sequences the output trace through audit, contestation, rev…
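The caption mentions virtual soft evidence entering the BN. One common formulation, assumed here and not confirmed by the summary as the paper's own, is Pearl-style virtual evidence: an uncertain observation about a node X is expressed as a likelihood vector λ and combined with the current belief by Bayes' rule. The symbols λ and "obs" are notation introduced for this sketch.

```latex
P(x \mid \text{obs}) \;=\; \frac{\lambda(x)\, P(x)}{\sum_{x'} \lambda(x')\, P(x')},
\qquad \lambda(x) \propto P(\text{obs} \mid X = x)
```

As a worked instance with made-up numbers: if P(X = true) = 0.6 and an LLM-extracted statement is judged four times as likely when X is true (λ(true) = 0.8, λ(false) = 0.2), the updated belief is 0.48 / (0.48 + 0.08) ≈ 0.857. Only the ratio of the λ values matters, so the translation interface would need to supply relative rather than absolute likelihoods.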
Figure 2: The semantic alignment problem. Natural language expressions from a benefit eligibility document must be mapped…
Figure 3: A minimal BN for benefit eligibility reasoning. Four statutorily defined condition nodes feed into the central eligibility…
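The structure described in this caption can be approximated with a short sketch: four condition nodes as parents of one eligibility node, queried by exhaustive enumeration. The node names, priors, and conditional probabilities below are hypothetical placeholders, since the caption is truncated and the summary gives no parameters; the paper's actual network may differ.

```python
from itertools import product

# Hypothetical condition nodes with assumed priors P(condition = True).
PRIORS = {
    "residency": 0.9,
    "income_below_threshold": 0.6,
    "age_requirement": 0.8,
    "no_disqualifying_benefit": 0.7,
}


def p_eligible_given_conditions(values):
    """Assumed CPT for the eligibility node: high only if every condition holds."""
    return 0.98 if all(values.values()) else 0.02


def posterior_eligible(evidence):
    """P(eligible = True | evidence) by enumerating all condition assignments."""
    names = list(PRIORS)
    num = 0.0  # probability mass where eligible = True
    den = 0.0  # probability mass consistent with the evidence
    for combo in product([True, False], repeat=len(names)):
        values = dict(zip(names, combo))
        if any(values[k] != v for k, v in evidence.items()):
            continue  # skip assignments that contradict the evidence
        weight = 1.0
        for name in names:
            weight *= PRIORS[name] if values[name] else 1.0 - PRIORS[name]
        num += weight * p_eligible_given_conditions(values)
        den += weight
    return num / den


no_evidence = posterior_eligible({})  # about 0.31 with these assumed numbers
partial = posterior_eligible({"residency": True, "income_below_threshold": True})
failed = posterior_eligible({"residency": False})
```

Enumeration is exponential in the number of nodes and is used here only because the network is tiny; a real system would presumably rely on an established inference library.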
Original abstract

Large language models are rapidly becoming infrastructural components in high-stakes institutional settings, including public administration, legal reasoning, and healthcare, where opacity is not merely inconvenient but institutionally and legally untenable. Existing approaches to explainability are predominantly post-hoc, offering unstable, non-contestable accounts that have no formal relationship to the reasoning process that produced the output. We argue that the problem is not the absence of explanation but the absence of structured reasoning in the first place. This paper makes the case for a fundamentally different architecture, which we call the Glassbox Framework, in which Bayesian networks serve as transparent, ante-hoc mediation layers for generative models. Bayesian networks encode domain knowledge, causal assumptions, and probabilistic dependencies before inference occurs, enabling auditable reasoning traces, uncertainty quantification, and contestable outputs. We characterise the architecture of this framework and ground it in a benefit eligibility scenario, identifying the foundational challenges spanning semantic alignment, dynamic model construction, probabilistic grounding, and human governance that must be solved to realise it at scale. By shifting from post-hoc explanation to ante-hoc probabilistic mediation, this work outlines a principled path toward AI systems that are not only powerful but fundamentally accountable.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper claims that post-hoc explanations for LLMs are inadequate for high-stakes domains and proposes the Glassbox Framework, in which Bayesian networks act as ante-hoc mediation layers between domain knowledge and generative models to deliver auditable reasoning traces, uncertainty quantification, and contestable outputs. It grounds the idea in a benefit eligibility scenario and explicitly lists four open challenges (semantic alignment, dynamic model construction, probabilistic grounding, and human governance) whose resolution is required for realization at scale.

Significance. If the architecture could be implemented while preserving model performance, it would offer a structural alternative to post-hoc methods and could meet institutional requirements for accountability. The paper's value is in clearly framing the shift from explanation to mediation and in naming the concrete technical and governance obstacles; however, with no formal model, implementation, or empirical test supplied, any significance remains conditional on future work.

major comments (2)
  1. [Abstract] The central assertion that Bayesian networks 'enable auditable reasoning traces, uncertainty quantification, and contestable outputs' is presented without any specification of the mediation interface, the form of the joint distribution, or even a schematic of how the BN layer would constrain or audit a generative model; this absence makes the accountability claim impossible to evaluate.
  2. [Benefit eligibility scenario] The scenario is invoked to illustrate the framework yet supplies no concrete example of an input, a BN structure, a mediation step, or a contestation procedure, leaving the load-bearing claim that the architecture yields contestable outputs without any supporting illustration.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The manuscript is a position paper whose primary contribution is to articulate the shift from post-hoc explanation to ante-hoc probabilistic mediation and to name the concrete technical and governance obstacles that must be overcome. We respond to the major comments below, agreeing that additional illustrative material would improve clarity while preserving the paper's conceptual scope.

  1. Referee: [Abstract] The central assertion that Bayesian networks 'enable auditable reasoning traces, uncertainty quantification, and contestable outputs' is presented without any specification of the mediation interface, the form of the joint distribution, or even a schematic of how the BN layer would constrain or audit a generative model; this absence makes the accountability claim impossible to evaluate.

    Authors: We agree that the abstract states the intended capabilities at a high level without specifying the mediation interface, joint distribution, or a schematic. This is consistent with the paper's nature as a conceptual proposal that characterises the architecture at the level of principles and then enumerates the open challenges (semantic alignment, dynamic model construction, probabilistic grounding, and human governance) whose resolution would be required to define such interfaces. To address the concern, we will revise the abstract to qualify the claims as prospective and add a high-level schematic diagram of the mediation layer in the main text.

    revision: partial

  2. Referee: [Benefit eligibility scenario] The scenario is invoked to illustrate the framework yet supplies no concrete example of an input, a BN structure, a mediation step, or a contestation procedure, leaving the load-bearing claim that the architecture yields contestable outputs without any supporting illustration.

    Authors: The benefit eligibility scenario is used to situate the framework in a concrete institutional setting and to motivate the requirement for contestable outputs. Because the manuscript does not present an implemented system, it contains no specific input values, BN structure, or step-by-step mediation example. We accept that a more detailed illustrative walk-through would make the contestability claim easier to evaluate and will expand the scenario section accordingly in the revision.

    revision: partial

Circularity Check

0 steps flagged

No significant circularity; conceptual proposal without derivations or self-referential claims

The paper is an explicit high-level position paper that proposes the Glassbox Framework as an architecture using Bayesian networks for ante-hoc mediation. It advances no equations, derivations, fitted parameters, or empirical predictions. The central argument is limited to characterizing the framework and immediately enumerating open challenges (semantic alignment, dynamic model construction, probabilistic grounding, human governance) whose resolution is left for future work. No load-bearing step reduces to its own inputs by construction, and no self-citations are invoked to justify uniqueness or forbid alternatives. The document is therefore self-contained as a conceptual outline.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The paper contributes a conceptual architecture and identifies open challenges but does not introduce new fitted parameters, formal axioms beyond domain assumptions, or entities with independent evidence.

axioms (1)
  • domain assumption: Bayesian networks can serve as transparent mediation layers that encode domain knowledge, causal assumptions, and probabilistic dependencies before inference occurs in generative models
    This is the core premise of the Glassbox Framework stated in the abstract.
invented entities (1)
  • Glassbox Framework (no independent evidence)
    purpose: To provide ante-hoc probabilistic mediation for generative AI models using Bayesian networks
    The framework is introduced in this paper as a new architecture.

pith-pipeline@v0.9.1-grok · 5727 in / 1363 out tokens · 35389 ms · 2026-06-27T22:11:51.191201+00:00 · methodology

