Recognition: 2 theorem links · Lean Theorem
Doc-PP: Document Policy Preservation Benchmark for Large Vision-Language Models
Pith reviewed 2026-05-16 16:27 UTC · model grok-4.3
The pith
Large vision-language models leak sensitive document information when synthesizing answers across text and images.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Models exhibit a Reasoning-Induced Safety Gap by leaking sensitive information when answers require complex synthesis or aggregation across visual and textual elements, even under explicit non-disclosure policies; the DVA framework reduces this leakage by decoupling reasoning from policy verification.
What carries the argument
Doc-PP benchmark for multimodal policy preservation, together with the DVA structural inference framework that decomposes queries into separate reasoning and verification stages.
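The separation of reasoning from verification can be illustrated with a minimal sketch. It is not the authors' implementation: the function `dva_answer`, the `SubAnswer` record, the `WITHHELD` marker, and all `toy_*` helpers are hypothetical names introduced here, and the toy stand-ins replace what would presumably be LVLM calls in the paper. Only the three-stage shape (decompose, verify each piece, aggregate) is taken from the description above.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class SubAnswer:
    question: str
    answer: str
    allowed: bool


WITHHELD = "[withheld by policy]"  # hypothetical placeholder text


def dva_answer(
    query: str,
    decompose: Callable[[str], List[str]],
    answer_step: Callable[[str], str],
    violates_policy: Callable[[str, str], bool],
    aggregate: Callable[[str, List[SubAnswer]], str],
) -> str:
    """Decompose a query, verify each sub-answer, then aggregate.

    Reasoning (answer_step) never sees the policy; the policy check
    (violates_policy) runs as a separate stage on each draft sub-answer.
    """
    checked: List[SubAnswer] = []
    for sub_q in decompose(query):
        draft = answer_step(sub_q)
        allowed = not violates_policy(sub_q, draft)
        checked.append(SubAnswer(sub_q, draft if allowed else WITHHELD, allowed))
    return aggregate(query, checked)


# Toy stand-ins (assumptions for illustration only).
TOY_FACTS = {"total revenue": "4.2M", "office location": "Berlin"}


def toy_decompose(query: str) -> List[str]:
    # Splits on " and "; a real system would likely use a model here.
    return [part.strip() for part in query.split(" and ")]


def toy_answer(sub_q: str) -> str:
    return TOY_FACTS.get(sub_q, "unknown")


def make_toy_verifier(protected: Set[str]) -> Callable[[str, str], bool]:
    # Keyword matching is a surface-level check used only to keep the
    # example runnable; it would not catch inferred content.
    def violates(sub_q: str, draft: str) -> bool:
        return any(term in sub_q for term in protected)

    return violates


def toy_aggregate(query: str, subs: List[SubAnswer]) -> str:
    return "; ".join(f"{s.question}: {s.answer}" for s in subs)


result = dva_answer(
    "total revenue and office location",
    toy_decompose,
    toy_answer,
    make_toy_verifier({"revenue"}),
    toy_aggregate,
)
print(result)
```

Passing the four stages as callables keeps the policy check swappable without touching the reasoning step, which is the property the framework's description emphasizes.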
If this is right
- Current safety techniques fail to block leakage that arises only after models combine information from multiple modalities or steps.
- Providing OCR-extracted text improves perception yet raises the chance of policy violations.
- DVA delivers a stronger baseline for policy-compliant answers than ordinary prompting or direct generation.
- Document question-answering systems need safeguards that explicitly track and enforce inferred content rather than surface-level tokens.
Where Pith is reading between the lines
- The same leakage pattern is likely to appear in other high-stakes multimodal domains such as medical records or legal filings.
- Benchmarks could be extended to test policies that change with user identity or that span several linked documents.
- Training or prompting that forces explicit decomposition steps may lower leakage without large accuracy losses.
Load-bearing premise
The real-world reports chosen for the benchmark faithfully represent the range of dynamic, user-defined non-disclosure policies that occur in practice.
What would settle it
Testing the same models on a new collection of documents carrying fresh user-defined policies and checking whether the measured leakage rate falls substantially below the rates reported on Doc-PP.
Original abstract
The deployment of Large Vision-Language Models (LVLMs) for real-world document question answering is often constrained by dynamic, user-defined policies that dictate information disclosure based on context. While ensuring adherence to these explicit constraints is critical, existing safety research primarily focuses on implicit social norms or text-only settings, overlooking the complexities of multimodal documents. In this paper, we introduce Doc-PP (Document Policy Preservation Benchmark), a novel benchmark constructed from real-world reports requiring reasoning across heterogeneous visual and textual elements under strict non-disclosure policies. Our evaluation highlights a systemic Reasoning-Induced Safety Gap: models frequently leak sensitive information when answers must be inferred through complex synthesis or aggregated across modalities, effectively circumventing existing safety constraints. Furthermore, we identify that providing extracted text improves perception but inadvertently facilitates leakage. To address these vulnerabilities, we propose DVA (Decompose-Verify-Aggregation), a structural inference framework that decouples reasoning from policy verification. Experimental results demonstrate that DVA significantly outperforms standard prompting defenses, offering a robust baseline for policy-compliant document understanding.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Doc-PP, a benchmark of real-world multimodal documents with dynamic non-disclosure policies, to evaluate LVLMs on document QA. It reports a Reasoning-Induced Safety Gap in which models leak sensitive information during complex cross-modal synthesis, shows that extracted text improves perception but increases leakage, and proposes the DVA (Decompose-Verify-Aggregation) framework that decouples reasoning from policy verification and outperforms standard prompting.
Significance. If the benchmark annotations prove reliable, the work is significant for exposing a concrete failure mode in current LVLM safety that is missed by text-only or implicit-norm evaluations. The DVA framework supplies a reproducible structural baseline for policy-compliant multimodal inference.
major comments (1)
- [Section 3] Section 3: Policy construction is described as manual extraction from real-world reports, yet no inter-annotator agreement statistics (Cohen’s kappa, Fleiss’ kappa) or conflict-resolution protocol are reported. Because the Reasoning-Induced Safety Gap claim rests entirely on the accuracy of these ground-truth policy-violation labels, the absence of agreement metrics leaves open the possibility that observed leakage reflects annotation inconsistency rather than model failure.
Simulated Author's Rebuttal
We thank the referee for the constructive comment on annotation reliability. We agree that inter-annotator agreement metrics are essential to substantiate the ground-truth policy-violation labels and will incorporate them in the revision.
- Referee: [Section 3] Section 3: Policy construction is described as manual extraction from real-world reports, yet no inter-annotator agreement statistics (Cohen’s kappa, Fleiss’ kappa) or conflict-resolution protocol are reported. Because the Reasoning-Induced Safety Gap claim rests entirely on the accuracy of these ground-truth policy-violation labels, the absence of agreement metrics leaves open the possibility that observed leakage reflects annotation inconsistency rather than model failure.
Authors: We acknowledge the validity of this concern. In the revised manuscript we will add a dedicated subsection in Section 3 reporting inter-annotator agreement on a randomly sampled subset of 200 documents. Two additional annotators independently labeled policy-violation spans; we will report Cohen’s kappa (pairwise) and Fleiss’ kappa (multi-annotator) together with the conflict-resolution protocol (majority vote followed by adjudication by a senior annotator for disagreements). These statistics will directly support the reliability of the Reasoning-Induced Safety Gap findings.
Revision: yes
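For readers unfamiliar with the two agreement statistics named in this exchange, the following sketch computes both from their standard definitions. It assumes each item has already been reduced to one categorical label per annotator (for example, "violation" or "no violation"); how span-level annotations would be mapped to such labels is not stated in the source. The function names and the example labels are hypothetical, and no values here come from the paper.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa; counts[i][j] = number of raters placing item i in category j.

    Every row must sum to the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    # Per-item agreement, averaged over items.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Chance agreement from overall category proportions.
    n_cats = len(counts[0])
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        return 1.0  # all ratings fall in one category
    return (p_bar - p_e) / (1.0 - p_e)


# Made-up labels for illustration: 1 = violation, 0 = no violation.
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 0, 0, 0]
pairwise = cohens_kappa(annotator_1, annotator_2)

# Three raters, two categories, four items with unanimous ratings.
unanimous = fleiss_kappa([[3, 0], [0, 3], [3, 0], [0, 3]])
```

Both statistics correct raw agreement for the agreement expected by chance, which is why the referee asks for them rather than a plain percentage.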
Circularity Check
No significant circularity in benchmark or framework
The paper introduces Doc-PP as an empirical benchmark constructed from real-world reports and proposes the DVA framework based on experimental evaluations of LVLMs. No equations, derivations, or mathematical claims are present in the provided text. The central claims about the Reasoning-Induced Safety Gap rest on observed model behaviors rather than any reduction to fitted parameters, self-definitions, or load-bearing self-citations. The work is self-contained as a benchmark evaluation without circular steps.
Axiom & Free-Parameter Ledger
invented entities (2)
- Doc-PP benchmark: no independent evidence
- DVA framework: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: We introduce Doc-PP ... DVA (Decompose–Verify–Aggregation), a structural inference framework that decouples reasoning from policy verification.
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: Our evaluation highlights a systemic Reasoning-Induced Safety Gap
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Policy-Invisible Violations in LLM-Based Agents
  LLM agents commit policy-invisible violations when policy facts are hidden from their context; a graph-simulation enforcer reaches 93% accuracy vs 68.8% for content-only baselines on a new 600-trace benchmark.