AI Integrity: A New Paradigm for Verifiable AI Governance
Pith reviewed 2026-05-10 15:46 UTC · model grok-4.3
The pith
AI Integrity protects an AI system's four-layer Authority Stack from corruption through verifiable process auditing.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AI Integrity is defined as the state in which the Authority Stack—its layered hierarchy of normative values, epistemological standards, source preferences, and data selection criteria—is protected from corruption, contamination, manipulation, and bias and maintained in a verifiable manner. This paradigm differs from AI Ethics, Safety, and Alignment by focusing on the reasoning cascade itself rather than outcomes, with Integrity Hallucination identified as the central measurable threat to value consistency, operationalized through the PRISM framework's six core metrics.
What carries the argument
The Authority Stack, a four-layer cascade model (Normative Authority grounded in Schwartz Basic Human Values, Epistemic Authority via Walton argumentation schemes and GRADE/CEBM hierarchies, Source Authority from Source Credibility Theory, and Data Authority), carries the argument by distinguishing legitimate value cascading from Authority Pollution.
If this is right
- Governance shifts from auditing final outputs to checking consistency across the full reasoning cascade.
- The PRISM framework supplies six metrics that quantify Integrity Hallucination as a detectable and addressable problem.
- AI systems can be audited for procedural integrity without requiring agreement on specific normative values.
- High-stakes applications gain auditable trails from evidence to conclusion that existing paradigms do not provide.
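If the cascade is taken literally, one way to picture such a procedural audit is as a traceability check over a four-layer record of a single decision. The sketch below is a toy illustration under that reading only; `AuthorityTrace`, its fields, and the two pollution checks are hypothetical constructions, not the paper's PRISM metrics.

```python
from dataclasses import dataclass, field

@dataclass
class AuthorityTrace:
    """Hypothetical audit record for one decision, one field per layer."""
    normative: set            # values invoked, e.g. Schwartz "beneficence"
    epistemic: dict           # claim -> evidence grade (e.g. a GRADE level)
    sources: dict             # source id -> {"credibility": float, "supports": set of claims}
    data: dict                # datum id -> source id it was drawn from
    issues: list = field(default_factory=list)

    def audit(self):
        """Flag breaks in the cascade: every datum must trace to a cited
        source, and every source must support at least one graded claim."""
        self.issues = []
        for datum, src in self.data.items():
            if src not in self.sources:
                self.issues.append(f"data-layer pollution: {datum} has no source")
        for src, meta in self.sources.items():
            if not meta["supports"] & self.epistemic.keys():
                self.issues.append(f"source-layer pollution: {src} supports no graded claim")
        return not self.issues

trace = AuthorityTrace(
    normative={"beneficence"},
    epistemic={"drug_X_effective": "GRADE:moderate"},
    sources={"rct_2023": {"credibility": 0.9, "supports": {"drug_X_effective"}}},
    data={"trial_arm_a": "rct_2023", "forum_post_7": "reddit"},  # second datum is untraceable
)
print(trace.audit())   # False: forum_post_7 cites a source absent from the source layer
print(trace.issues)
```

The point of the sketch is procedural: the check never asks whether "beneficence" is the right value, only whether each layer's content is traceable to the layer above it.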
Where Pith is reading between the lines
- This procedural focus could be layered onto existing safety and alignment techniques to add verifiable process checks.
- Regulatory requirements in medicine or defense might eventually mandate logging of the authority cascade for compliance.
- Applying the metrics to current models could expose patterns of authority pollution missed by outcome-based evaluations.
Load-bearing premise
The proposed four-layer Authority Stack accurately models real AI reasoning processes so that Integrity Hallucination can serve as the central measurable threat.
What would settle it
An empirical test on deployed AI systems showing that their reasoning does not follow a detectable cascade from normative values through epistemic standards to sources and data, or that the PRISM metrics fail to identify inconsistencies in value consistency.
read the original abstract
AI systems increasingly shape high-stakes decisions in healthcare, law, defense, and education, yet existing governance paradigms -- AI Ethics, AI Safety, and AI Alignment -- share a common limitation: they evaluate outcomes rather than verifying the reasoning process itself. This paper introduces AI Integrity, a concept defined as a state in which the Authority Stack of an AI system -- its layered hierarchy of values, epistemological standards, source preferences, and data selection criteria -- is protected from corruption, contamination, manipulation, and bias, and maintained in a verifiable manner. We distinguish AI Integrity from the three existing paradigms, define the Authority Stack as a 4-layer cascade model (Normative, Epistemic, Source, and Data Authority) grounded in established academic frameworks -- Schwartz Basic Human Values for normative authority, Walton argumentation schemes with GRADE/CEBM hierarchies for epistemic authority, and Source Credibility Theory for source authority -- characterize the distinction between legitimate cascading and Authority Pollution, and identify Integrity Hallucination as the central measurable threat to value consistency. We further specify the PRISM (Profile-based Reasoning Integrity Stack Measurement) framework as the operational methodology, defining six core metrics and a phased research roadmap. Unlike normative frameworks that prescribe which values are correct, AI Integrity is a procedural concept: it requires that the path from evidence to conclusion be transparent and auditable, regardless of which values a system holds.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to introduce 'AI Integrity' as a new paradigm for AI governance that focuses on protecting the Authority Stack of an AI system from corruption and bias in a verifiable, procedural manner. The Authority Stack is modeled as a four-layer cascade (Normative based on Schwartz values, Epistemic based on Walton schemes and GRADE/CEBM, Source based on Source Credibility Theory, and Data), with distinctions from AI Ethics, Safety, and Alignment. It identifies 'Authority Pollution' and 'Integrity Hallucination' as key issues and proposes the PRISM framework with six core metrics and a research roadmap.
Significance. If the proposed framework holds and can be implemented, it could offer a valuable shift in AI governance towards process-oriented verification rather than outcome assessment, potentially aiding in high-stakes applications. The grounding in established academic frameworks is a strength, providing credibility to the conceptual model. However, the lack of empirical validation or formalization means its significance is prospective and depends on subsequent research as outlined in the roadmap.
major comments (1)
- [Authority Stack model] The assumption that the four-layer Authority Stack accurately models real AI reasoning processes is central to the proposal but remains untested; providing at least one detailed hypothetical example of how the stack would be applied and protected in a concrete AI decision scenario would help substantiate this modeling choice.
minor comments (2)
- The six core metrics of the PRISM framework should be explicitly listed and defined in a table or dedicated subsection to enhance the operational methodology's clarity.
- Ensure all referenced frameworks (e.g., specific Walton argumentation schemes, GRADE/CEBM hierarchies) have precise citations to allow readers to trace the grounding.
Simulated Author's Rebuttal
We thank the referee for their constructive comments and recommendation. We address the major comment point by point below.
read point-by-point responses
- Referee: The assumption that the four-layer Authority Stack accurately models real AI reasoning processes is central to the proposal but remains untested; providing at least one detailed hypothetical example of how the stack would be applied and protected in a concrete AI decision scenario would help substantiate this modeling choice.
Authors: We agree that an illustrative example would strengthen the exposition of the Authority Stack. The manuscript presents the four-layer model as a conceptual synthesis grounded in established frameworks (Schwartz Basic Human Values, Walton argumentation schemes with GRADE/CEBM, and Source Credibility Theory) rather than an empirically validated representation of all AI reasoning. The paper explicitly frames AI Integrity as a procedural paradigm whose full validation is part of the outlined research roadmap. In the revised manuscript we will add a detailed hypothetical example, such as an AI system supporting a clinical treatment recommendation. The example will trace the cascade from normative authority (e.g., patient-autonomy and beneficence values) through epistemic authority (evidence hierarchies), source authority (credibility of medical literature), and data authority, while showing how each layer is protected against Authority Pollution (e.g., via auditable source selection and consistency checks) and how Integrity Hallucination would be detected. This addition will clarify the distinction between legitimate cascading and corruption without changing the paper's conceptual focus.
Revision: yes
Circularity Check
Conceptual proposal grounded in external literature; no circular derivation
full rationale
The paper introduces AI Integrity as a definitional concept and the Authority Stack as a 4-layer model explicitly grounded in external established frameworks (Schwartz Basic Human Values, Walton argumentation schemes with GRADE/CEBM, Source Credibility Theory). PRISM is presented as an operational methodology specifying six metrics and a research roadmap, without any mathematical derivations, fitted parameters, quantitative predictions, or self-referential equations. The distinction from existing paradigms is procedural rather than outcome-derived. No load-bearing self-citations, ansatzes smuggled via prior work, or reductions of claims to internal definitions are present; the contribution is a modeling proposal grounded in external benchmarks rather than in its own definitions.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The 4-layer cascade model (Normative, Epistemic, Source, Data Authority) accurately represents AI reasoning hierarchies.
invented entities (3)
- AI Integrity: no independent evidence
- Integrity Hallucination: no independent evidence
- PRISM framework: no independent evidence
Reference graph
Works this paper leans on
- [1] Amodei, D., et al. (2016). Concrete problems in AI safety. arXiv:1606.06565
- [2] Bai, Y., et al. (2022). Training a helpful and harmless assistant with RLHF. arXiv:2204.05862
- [3] European Parliament. (2024). Regulation (EU) 2024/1689 (AI Act)
- [4] Floridi, L., & Cowls, J. (2019). A unified framework of five principles for AI in society. Harvard Data Science Review, 1(1)
- [5] Gabriel, I. (2020). Artificial intelligence, values, and alignment. Minds and Machines, 30(3), 411–437
- [6] Grant, N. (2024). Google pauses Gemini AI image generator after historical inaccuracies. The New York Times, Feb. 22
- [7] Hovland, C. I., Janis, I. L., & Kelley, H. H. (1953). Communication and Persuasion. Yale University Press
- [8] IEEE. (2019). Ethically Aligned Design, 1st ed.
- [9]
- [10] Jobin, A., Ienca, M., & Vayena, E. (2019). The global landscape of AI ethics guidelines. Nature Machine Intelligence, 1(9), 389–399
- [11] Lee, S. (2026b). Measuring AI value priorities: Empirical analysis of forced-choice responses across AI models. Preprint
- [12] Lee, S. (2026c). PRISM Risk Signal Framework: Hierarchy-based red lines for AI behavioral risk. Preprint
- [13] NIST. (2023). AI Risk Management Framework (AI RMF 1.0). NIST AI 100-1
- [14] Ouyang, L., et al. (2022). Training language models to follow instructions with human feedback. NeurIPS 2022
- [15] Pornpitakpan, C. (2004). The persuasiveness of source credibility: A critical review of five decades' evidence. Journal of Applied Social Psychology, 34(2), 243–281
- [16] Schwartz, S. H. (1992). Universals in the content and structure of values. Advances in Experimental Social Psychology, 25, 1–65
- [17] Schwartz, S. H. (2012). An overview of the Schwartz theory of basic values. Online Readings in Psychology and Culture, 2(1)
- [18] Schwartz, S. H., et al. (2012). Refining the theory of basic individual values. Journal of Personality and Social Psychology, 103(4), 663–688
- [19] Thirunavukarasu, A. J., et al. (2023). Large language models in medicine. Nature Medicine, 29(8), 1930–1940
- [20] Walton, D., Reed, C., & Macagno, F. (2008). Argumentation Schemes. Cambridge University Press
discussion (0)