Runtime Compliance Verification for AI Agents
Pith reviewed 2026-06-26 20:06 UTC · model grok-4.3
The pith
A runtime monitor enforces GDPR rules on AI agents by intercepting tool calls and rejecting actions that violate formal policy predicates on execution traces.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
C-Trace expresses a subset of GDPR requirements as formal policy predicates over agent execution traces, deploys a runtime monitor that intercepts every tool invocation and model output to reject non-compliant actions, and shows through case-study tests with attack dialogues that the monitor keeps attack success rate at or below 12 percent under 10 percent per-category extractor noise and reaches 0 percent under perfect extraction.
What carries the argument
The runtime monitor in C-Trace that intercepts tool invocations and model outputs and checks them against policy predicates defined on agent execution traces.
If this is right
- The monitor rejects non-compliant tool invocations and model outputs before they occur.
- Attack success stays at or below 12 percent and false positives at or below 16 percent with up to 10 percent noise from drop-out or over-typing.
- Attack success reaches exactly 0 percent when trace extraction is perfect.
- The same monitor works against both DSPy-generated prompts and verbatim prompts taken from red-teaming corpora.
- The approach is demonstrated on four case studies that have been reframed around GDPR obligations.
Where Pith is reading between the lines
- The predicate approach could be reused for other data-protection or sector-specific rules by writing new predicates rather than redesigning the monitor.
- Embedding the monitor directly in agent runtimes would turn compliance checking into a continuous property rather than a pre-release audit.
- The noise-tolerance numbers suggest the method remains useful even when trace extraction must rely on imperfect natural-language parsing of logs.
- Longer or more branched dialogues could be tested by extending the same trace predicates to capture sequential dependencies across many turns.
Load-bearing premise
GDPR requirements including consent, purpose limitation, data minimization, and the right to erasure can be expressed as formal policy predicates over agent execution traces without material loss of regulatory meaning or coverage.
What would settle it
A test case in which at least one GDPR obligation cannot be written as a predicate without losing an essential regulatory element, or a run of the four case studies in which attack success rate exceeds 12 percent under 10 percent extractor noise.
Figures
read the original abstract
AI agents now handle personal data through tool use, function calls, and multi turn dialogue, which can create obligations under the General Data Protection Regulation (GDPR). Current testing practices mainly rely on offline red teaming or static prompt review, but they do not guarantee at runtime that agent behavior follows regulatory rules. We propose C-Trace (Compliance Trace based Runtime Agent Conformance Enforcement), a verification framework that: (i) expresses a subset of GDPR requirements, including consent, purpose limitation, data minimization, and the right to erasure, as formal policy predicates over agent execution traces; (ii) uses a runtime monitor that intercepts every tool invocation and model output and rejects non-compliant actions; and (iii) tests the agent with attack dialogues, including DSPy generated prompts and verbatim prompts from red teaming corpora, that try to induce violations. We evaluate the framework on four case studies reframed to GDPR. Under 10 percent per-category extractor noise, including drop-out and over-typing, the monitor keeps the attack success rate at less than or equal to 12 percent, below the baselines we compare against, and false positives at less than or equal to 16 percent, and reaches 0 percent ASR under perfect extraction.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes C-Trace, a runtime verification framework for AI agents handling personal data. It expresses a subset of GDPR requirements (consent, purpose limitation, data minimization, right to erasure) as formal policy predicates over execution traces, deploys a monitor that intercepts tool calls and outputs to reject violations, and evaluates the system on four reframed case studies using attack dialogues (DSPy-generated and red-teaming prompts). Under a 10% per-category extractor noise model the monitor achieves ASR ≤12% and FP ≤16%, with 0% ASR under perfect extraction, outperforming compared baselines.
Significance. If the policy predicates faithfully encode the cited GDPR provisions without material loss of regulatory scope or intent, and if the reported metrics are reproducible from the described case studies and noise model, the work would demonstrate a concrete runtime enforcement mechanism that addresses limitations of offline red-teaming for regulatory compliance in tool-using agents.
major comments (2)
- [Abstract] Abstract: the central claim that the framework verifies compliance with GDPR rests on the unargued assumption that the chosen predicates preserve the legal scope, edge cases, and intent of consent, purpose limitation, data minimization, and erasure. No mapping, coverage argument, or comparison to the regulatory text is supplied; all quantitative results are therefore evaluated only against the authors' predicates rather than the regulation itself. This assumption is load-bearing for the compliance-verification claim.
- [Abstract] Abstract and evaluation description: the headline metrics (ASR ≤12% under 10% noise, 0% under perfect extraction, FP ≤16%) are stated without any information on the four case studies, the precise baselines, how the predicates are implemented, or how the 10% per-category noise model (including drop-out and over-typing) was constructed. These omissions prevent assessment of whether the reported robustness holds for the claimed regulatory setting.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address each major comment below, clarifying the scope of our predicates and expanding the evaluation details. Where the comments identify gaps in the original submission, we commit to revisions that make the compliance claim more transparent without overstating regulatory coverage.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claim that the framework verifies compliance with GDPR rests on the unargued assumption that the chosen predicates preserve the legal scope, edge cases, and intent of consent, purpose limitation, data minimization, and erasure. No mapping, coverage argument, or comparison to the regulatory text is supplied; all quantitative results are therefore evaluated only against the authors' predicates rather than the regulation itself. This assumption is load-bearing for the compliance-verification claim.
Authors: We agree that an explicit mapping is necessary to support the compliance-verification claim. Our predicates were derived directly from the core obligations in GDPR Articles 5, 6, 7, 17, and 25, but the original manuscript did not include a side-by-side comparison. In the revised version we will add a new subsection (Section 3.3) that (i) quotes the relevant regulatory text for each predicate, (ii) states the precise scope we formalize versus what we intentionally leave out (e.g., dynamic consent withdrawal mechanics and cross-border transfer rules), and (iii) discusses edge cases such as inferred consent from context. This revision makes the coverage argument explicit rather than assumed. revision: yes
-
Referee: [Abstract] Abstract and evaluation description: the headline metrics (ASR ≤12% under 10% noise, 0% under perfect extraction, FP ≤16%) are stated without any information on the four case studies, the precise baselines, how the predicates are implemented, or how the 10% per-category noise model (including drop-out and over-typing) was constructed. These omissions prevent assessment of whether the reported robustness holds for the claimed regulatory setting.
Authors: The abstract is length-constrained, yet the full manuscript already supplies the requested information in Sections 4 and 5: the four case studies are explicitly named and reframed (healthcare booking, financial advice, e-commerce personalization, social-media moderation); baselines are defined in 5.2 (static prompt guardrails and offline red-teaming); predicate implementation is given in 3.2 (trace predicates evaluated on tool-call and output streams); and the noise model is detailed in 4.3 (independent 10 % per-category dropout plus character-level over-typing drawn from observed LLM extraction errors). To address the referee's concern about accessibility, we will (i) insert a one-paragraph summary of these elements immediately after the metrics in the abstract and (ii) add a summary table in Section 4 that lists case-study domains, predicate counts, and noise parameters. This constitutes a partial revision focused on presentation rather than new experiments. revision: partial
Circularity Check
No significant circularity detected; framework is empirically evaluated artifact
full rationale
The paper constructs C-Trace by defining policy predicates over traces for a GDPR subset, implements an intercepting runtime monitor, and reports empirical metrics (ASR ≤12% under 10% noise, 0% under perfect extraction) from direct testing on attack dialogues and reframed case studies. No equations, fitted parameters, or self-citations are present that reduce any claimed result to its own inputs by construction. The performance numbers are obtained from evaluation against the authors' chosen predicates rather than being statistically forced or renamed from prior fits. The derivation chain is self-contained as a new artifact with independent test data.
Axiom & Free-Parameter Ledger
free parameters (1)
- 10 percent per-category extractor noise
axioms (1)
- domain assumption GDPR requirements can be expressed as formal policy predicates over agent execution traces
invented entities (1)
-
C-Trace runtime monitor
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Public law104(191), 1–16 (1996)
Act, A., et al.: Health insurance portability and accountability act of 1996. Public law104(191), 1–16 (1996)
1996
-
[2]
In: European Symposium on Research in Computer Security
Arfelt, E., Basin, D., Debois, S.: Monitoring the gdpr. In: European Symposium on Research in Computer Security. pp. 681–699. Springer (2019)
2019
-
[3]
In: Lectures on Runtime Verification: Introductory and Advanced Topics, pp
Bartocci, E., Deshmukh, J., Donzé, A., Fainekos, G., Maler, O., Ničković, D., Sankaranarayanan, S.: Specification-based monitoring of cyber-physical systems: a survey on theory, tools and applications. In: Lectures on Runtime Verification: Introductory and Advanced Topics, pp. 135–175. Springer (2018)
2018
-
[4]
In: IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (2008)
Basin, D., Klaedtke, F., Müller, S., Pfitzmann, B.: Runtime monitoring of metric first-order temporal properties. In: IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (2008). pp. 49–60. Schloss Dagstuhl–Leibniz-Zentrum für Informatik (2008)
2008
-
[5]
arXiv preprint arXiv:2209.07858 (2022)
Ganguli, D., Lovitt, L., Kernion, J., Askell, A., Bai, Y., Kadavath, S., Mann, B., Perez, E., Schiefer, N., Ndousse, K., et al.: Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned. arXiv preprint arXiv:2209.07858 (2022)
Pith/arXiv arXiv 2022
-
[6]
In: Proceedings of the 16th ACM workshop on artificial intelligence and security
Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., Fritz, M.: Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. In: Proceedings of the 16th ACM workshop on artificial intelligence and security. pp. 79–90 (2023)
2023
-
[7]
GuardrailsAI:Guardrails:Aframeworkforvalidatingllmoutputs.In:OpenSource Project (2023)
2023
-
[8]
In: International Conference on Runtime Verification
Havelund, K., Peled, D.: Runtime verification: from propositional to first-order temporal logic. In: International Conference on Runtime Verification. pp. 90–112. Springer (2018)
2018
-
[9]
Nejm Ai 2(9), AIdbp2500144 (2025)
Jiang, Y., Black, K.C., Geng, G., Park, D., Zou, J., Ng, A.Y., Chen, J.H.: Meda- gentbench: a virtual ehr environment to benchmark medical llm agents. Nejm Ai 2(9), AIdbp2500144 (2025)
2025
-
[10]
The journal of logic and algebraic programming78(5), 293–303 (2009)
Leucker, M., Schallhart, C.: A brief account of runtime verification. The journal of logic and algebraic programming78(5), 293–303 (2009)
2009
-
[11]
In: ICML (2024)
Mazeika, M., Phan, L., Yin, X., Zou, A., Wang, Z., Mu, N., Sakhaee, E., Li, N., Basart, S., Li, B., Forsyth, D., Hendrycks, D.: Harmbench: A standardized evalu- ation framework for automated red teaming and robust refusal. In: ICML (2024)
2024
-
[12]
Microsoft: Microsoft presidio: Context aware, pluggable and customizable pii anonymization service.https://github.com/microsoft/presidio(2024)
2024
-
[13]
OpenAI: Swarm: Airline customer-service reference agent.https://github.com/ openai/swarm/tree/main/examples/airline(2024)
2024
-
[14]
OpenAI: Swarm: Personal shopper reference agent.https://github.com/openai/ swarm/tree/main/examples/personal_shopper(2024)
2024
-
[15]
In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Perez, E., Huang, S., Song, F., Cai, T., Ring, R., Aslanides, J., Glaese, A., McAleese, N., Irving, G.: Red teaming language models with language models. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. pp. 3419–3448 (2022)
2022
-
[16]
Official Journal of the European Union (2016)
Protection, B.P.N.G.D.: Regulation (gdpr). Official Journal of the European Union (2016)
2016
-
[17]
Rebedea, T., Dinu, R., Sreedhar, M., Parisien, C., Cohen, J.: Nemo guardrails: A toolkit for controllable and safe llm applications with programmable rails. arXiv:2310.10501 (2023) 18 N. Kahani et al
arXiv 2023
-
[18]
In: CNCF Technical Documentation (2019)
Sandall, T.: Open policy agent: Policy-based control for cloud native environments. In: CNCF Technical Documentation (2019)
2019
-
[19]
In: NeurIPS (2023)
Schick, T., Dwivedi-Yu, J., Dessi, R., Raileanu, R., Lomeli, M., Zettlemoyer, L., Cancedda, N., Scialom, T.: Toolformer: Language models can teach themselves to use tools. In: NeurIPS (2023)
2023
-
[20]
Do Anything Now
Shen, X., Chen, Z., Backes, M., Shen, Y., Zhang, Y.: “Do Anything Now”: Char- acterizing and evaluating in-the-wild jailbreak prompts on large language models. In: ACM CCS (2024)
2024
-
[21]
In: ICLR (2023)
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: React: Synergizing reasoning and acting in language models. In: ICLR (2023)
2023
-
[22]
Zou, A., Wang, Z., Carlini, N., Nasr, M., Kolter, J.Z., Fredrikson, M.: Universal and transferable adversarial attacks on aligned language models. In: arXiv:2307.15043 (2023)
Pith/arXiv arXiv 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.