
arxiv: 2605.20734 · v1 · pith:2BF32T5Cnew · submitted 2026-05-20 · 💻 cs.CR · cs.AI

An Application-Layer Multi-Modal Covert-Channel Reference Monitor for LLM Agent Egress

Pith reviewed 2026-05-21 04:33 UTC · model grok-4.3

keywords covert channels · LLM agents · egress monitoring · reference monitor · multi-modal security · cryptographic attestation · capacity reduction

The pith

A reference monitor for LLM agent egress drives residual covert-channel capacity to zero on all destroyable carriers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes a system that stops LLM agents from leaking information by hiding bits inside ordinary-looking messages or media. It applies a ten-stage text pipeline with a per-sink leaky-bucket ledger, plus scramblers for audio and images that are bypassed only when a payload carries a signature that verifies against an Ed25519 key published at boot for an authorized {kind, data-class} pair. Residual capacity is measured as the Miller-Madow corrected mutual information between the bits an adversary embeds and the bits that can later be recovered, and the implementation reduces this quantity to zero wherever the carrier can be destroyed. A reader would care because standard allowlists and content scanners miss encodings such as zero-width characters, JSON key order, LSB planes, or inter-image permutations.

Core claim

The reference implementation drives residual capacity to zero on every destroyable channel and to a stated bound on the one channel (per-image mean luminance) that cannot be destroyed without ruining the image. It does so with four components: a text pipeline of ten capacity-reducing stages with a per-sink leaky-bucket ledger; two media scramblers (a Fourier-domain audio band-limiter and an RGB image bit-depth and mean-luminance bucketer) gated by a boot-time attestation of trusted Ed25519 keys and {kind, data-class} pairs; a content-addressed canonicalizer that closes permutation channels; and measurement by an adversarial ensemble of fifteen encoders across modalities.
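
To make the per-sink leaky-bucket ledger concrete, here is a minimal sketch. The abstract gives no capacities, leak rates, or charging rule, so `capacity_bits`, `leak_bits_per_s`, the `admit` helper, and the idea of charging an estimated bit count per message are all assumptions made for illustration, not the paper's parameters.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class LeakyBucketLedger:
    """Covert-capacity budget for one sink (all values are hypothetical)."""
    capacity_bits: float       # largest debt the bucket may hold
    leak_bits_per_s: float     # rate at which accumulated debt drains
    level_bits: float = 0.0    # current debt
    last_t: float = 0.0        # time of the previous update, in seconds

    def charge(self, bits: float, now: float) -> bool:
        """Drain for the elapsed time, then admit the message only if it fits."""
        elapsed = max(0.0, now - self.last_t)
        self.level_bits = max(0.0, self.level_bits - elapsed * self.leak_bits_per_s)
        self.last_t = now
        if self.level_bits + bits > self.capacity_bits:
            return False       # over budget: the caller should block or defer
        self.level_bits += bits
        return True


# One ledger per destination sink, created on first use.
ledgers: Dict[str, LeakyBucketLedger] = {}


def admit(sink: str, bits: float, now: float,
          capacity_bits: float = 64.0, leak_bits_per_s: float = 1.0) -> bool:
    ledger = ledgers.setdefault(
        sink, LeakyBucketLedger(capacity_bits, leak_bits_per_s, last_t=now))
    return ledger.charge(bits, now)
```

Keeping a separate ledger per sink means a burst toward one destination does not consume another destination's budget. A rejected message leaves the level unchanged, so the sender can retry after the bucket has drained.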

What carries the argument

A multi-modal egress reference monitor that combines staged capacity-reducing text stages, attestation-gated media scramblers, and mutual-information capacity measurement.

If this is right

  • LLM agents lose the ability to exfiltrate data through text encodings such as homoglyphs or whitespace once the ten-stage pipeline is applied.
  • Unsigned or mismatched media is blocked or scrambled by default, removing the need for content-based classification of real versus sonified payloads.
  • Inter-image permutation channels are closed by the content-addressed canonicalizer before any scrambling occurs.
  • Capacity is quantified to zero on destroyable channels and to an explicit bound on the luminance channel that must remain intact.
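
The text-side bullets above can be illustrated with a short sketch. The paper's ten stages are not enumerated on this page, so the three steps below (NFKC normalization, removal of Unicode format characters, whitespace collapsing) and the JSON re-serialization are assumed stand-ins, and the function names are hypothetical. Whitespace collapsing is lossy, so it would not belong to the lossless stages the abstract says are enforced first.

```python
import json
import re
import unicodedata


def canonicalize_text(s: str) -> str:
    """Remove a few common text carriers (illustrative, not the paper's stages)."""
    # Fold compatibility variants (fullwidth letters, no-break spaces, ligatures).
    s = unicodedata.normalize("NFKC", s)
    # Drop format characters (category Cf): zero-width space/joiners, BOM, word joiner.
    s = "".join(ch for ch in s if unicodedata.category(ch) != "Cf")
    # Collapse every whitespace run to a single space (lossy).
    return re.sub(r"\s+", " ", s).strip()


def canonicalize_json(payload: str) -> str:
    """Re-serialize JSON so key order and spacing carry no information."""
    obj = json.loads(payload)
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
```

NFKC does not fold cross-script homoglyphs (for example a Cyrillic "а" in place of a Latin "a"), so a homoglyph stage would need a separate confusables mapping that this sketch omits.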

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same attestation-plus-scrambler pattern could be applied to other agent output types such as network packets or API responses.
  • A production deployment would need to measure end-to-end latency and false-positive rates on real LLM traffic.
  • The stated luminance bound implies a tunable security-versus-quality trade-off that future work could optimize with different bucketing parameters.

Load-bearing premise

Boot-time Ed25519 keys and {kind, data-class} pairs can be published and verified so that only legitimate media is exempted while all other payloads are treated as suspect, without creating new covert channels or false blocks.
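
The exemption decision this premise rests on can be sketched as a policy check. The Python standard library has no Ed25519 implementation, so the verifier is passed in as a callable and the demo uses HMAC-SHA256 as a symmetric stand-in; that stand-in is not Ed25519 and is not what the paper uses. The `Attestation` type, the `bind` message layout, and all names are assumptions for illustration.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple

# (key, message, signature) -> True if the signature verifies.
Verifier = Callable[[bytes, bytes, bytes], bool]


@dataclass(frozen=True)
class Attestation:
    """Published once at boot: trusted keys and authorized (kind, data_class) pairs."""
    trusted_keys: Dict[str, bytes]
    authorized: FrozenSet[Tuple[str, str]]


def bind(kind: str, data_class: str, payload: bytes) -> bytes:
    """Message that is signed: the class labels plus a digest of the payload."""
    return (kind.encode() + b"\x00" + data_class.encode() + b"\x00"
            + hashlib.sha256(payload).digest())


def is_exempt(att: Attestation, verify: Verifier, key_id: str, kind: str,
              data_class: str, payload: bytes, signature: bytes) -> bool:
    """Exempt only signed payloads of an authorized class; all else is suspect."""
    if (kind, data_class) not in att.authorized:
        return False
    key = att.trusted_keys.get(key_id)
    if key is None:
        return False
    return verify(key, bind(kind, data_class, payload), signature)


def hmac_stand_in(key: bytes, message: bytes, signature: bytes) -> bool:
    """Demo verifier only: HMAC-SHA256, NOT Ed25519."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)
```

Binding the kind and data class into the signed message keeps a signature issued for one class from being replayed under another. In a real deployment the verifier would be an Ed25519 public-key check and the boot-time table would hold public keys rather than shared secrets.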

What would settle it

The residual-capacity claim would be falsified by an experiment in which an adversarial encoder embeds bits in an unattested image and, after the monitor has scrambled that image, recovers them at a rate above the stated mean-luminance bound.
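
A toy version of the mean-luminance bucketer shows what such an experiment would be testing. The paper's bucket count and luma definition are not given on this page, so the Rec. 601 weights, the bucket count, and the function names are assumptions, and the bound used here (at most log2 of the number of buckets, in bits per image) is what follows from this sketch rather than the paper's stated value. Clipping and integer rounding are ignored.

```python
import math

LUMA = (0.299, 0.587, 0.114)  # Rec. 601 weights (an assumption of this sketch)


def mean_luma(pixels):
    """Mean luma over a list of (r, g, b) tuples."""
    return sum(LUMA[0] * r + LUMA[1] * g + LUMA[2] * b for r, g, b in pixels) / len(pixels)


def snap_mean_luma(pixels, buckets):
    """Shift every pixel equally so the mean luma lands on its bucket centre."""
    width = 256.0 / buckets
    m = mean_luma(pixels)
    idx = min(buckets - 1, int(m // width))
    delta = (idx + 0.5) * width - m
    # The weights sum to 1, so adding delta to each channel moves the luma by delta.
    return [(r + delta, g + delta, b + delta) for r, g, b in pixels]


def residual_bound_bits(buckets):
    """Only the bucket index survives, so at most log2(buckets) bits per image."""
    return math.log2(buckets)
```

With 8 buckets, a cover whose mean luma is 105 and a version nudged to 107 both come out at 112, so the nudge is not recoverable. An encoder that recovered more than `residual_bound_bits(8)`, that is 3 bits, per processed image would contradict the bound in this toy setting.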

Figures

Figures reproduced from arXiv: 2605.20734 by Alfredo Metere.

Figure 1: Text covert-channel examples and the canonicalizer’s response. Four payload-level …
Figure 2: Design-space probe images. The same 128×128 cover under twelve representative defense settings. The cover is a smooth additive RGB gradient: R, G, and B all grow simultaneously along different axes, so the visible result is a continuous diagonal traversal of the gamut rather than separable red, green, and blue bands: from dark blue-grey in the top-left (all three channels low) through magenta-ish (top-rig…
Figure 3: Audio-scrambler spectral defense in action. Magnitude spectra (dB relative to the loudest …

Original abstract

A large language model (LLM) agent that sends messages can leak data inside them. Destination allowlists and content scanners do not police whether an otherwise-benign payload is itself a covert channel: a compromised agent encodes bits in zero-width characters, homoglyphs, whitespace, base64, JavaScript Object Notation (JSON) key ordering, message timing or size -- and, in binary egress, in least-significant-bit (LSB) pixel planes, per-image mean luminance, inter-image sequence permutation, ultrasonic tones, or audible-band sonified data. Our egress reference monitor has three contributions. (i) A text pipeline of ten capacity-reducing stages, a per-sink leaky-bucket capacity ledger, and a staged posture that enforces lossless stages from day one. (ii) Two media scramblers (a Fourier-domain audio band-limiter and a red-green-blue (RGB) image bit-depth and mean-luminance bucketer) gated by a boot-time cryptographic legitimacy attestation: an auditor publishes at boot the trusted Ed25519 keys and {kind, data-class} pairs; only payloads with a verifying signature for an authorized class are exempt. The attestation sidesteps the intractable content-based discrimination between real media and data sonified or rasterized as a carrier; unsigned media is suspect by default; a content-addressed canonicalizer closes the inter-image permutation channel. (iii) Residual capacity is the Miller-Madow corrected mutual information between embedded and recovered bits (zero when destroyed), measured by an adversarial ensemble of fifteen working encoders across text, image and audio. The reference implementation drives residual capacity to zero on every destroyable channel and to a stated bound on the one (per-image mean luminance) that cannot be destroyed without ruining the image.
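
The inter-image permutation channel mentioned in the abstract can be illustrated briefly. Ordering n distinct images freely can encode up to log2(n!) bits, and emitting them in an order fixed by their content removes that choice. The use of SHA-256 as the content address and the function names are assumptions; the paper's canonicalizer may differ.

```python
import hashlib
import math


def canonical_order(blobs):
    """Emit payloads sorted by content digest, discarding the sender's ordering."""
    return sorted(blobs, key=lambda b: hashlib.sha256(b).digest())


def permutation_capacity_bits(n):
    """Bits an unconstrained ordering of n distinct items could carry."""
    return math.log2(math.factorial(n))
```

For example, four distinct images admit 24 orderings, about 4.58 bits per batch, all of which map to the same output order after canonicalization.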

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript describes an application-layer reference monitor for LLM agent egress that aims to eliminate covert channels across text, image, and audio. It contributes (i) a text pipeline of ten capacity-reducing stages with a per-sink leaky-bucket ledger and staged enforcement, (ii) Fourier-domain audio band-limiting and RGB image bit-depth/mean-luminance bucketing gated by boot-time Ed25519 attestation of trusted keys and {kind, data-class} pairs (with unsigned media treated as suspect and a content-addressed canonicalizer for permutation channels), and (iii) residual capacity measured as Miller-Madow corrected mutual information between embedded and recovered bits via an adversarial ensemble of fifteen encoders. The reference implementation is asserted to drive this residual capacity to zero on all destroyable channels and to a stated bound on the non-destroyable per-image mean luminance channel.
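
As a rough illustration of Fourier-domain band-limiting, the sketch below zeroes every frequency bin above a cutoff and transforms back. It uses a naive O(n²) DFT from the standard library to stay self-contained; a real implementation would use an FFT library. The cutoff, the toy sample rate, and the function names are assumptions, and nothing here reflects the paper's actual filter design.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def band_limit(samples, sample_rate, cutoff_hz):
    """Zero all bins whose frequency exceeds cutoff_hz, then resynthesize."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        # Bins k and n - k hold the positive and negative copy of one frequency.
        freq = min(k, n - k) * sample_rate / n
        if freq > cutoff_hz:
            spectrum[k] = 0
    return idft(spectrum)
```

In a scaled-down test with a 64 Hz sample rate, a 4 Hz tone mixed with a 20 Hz carrier comes back as the 4 Hz tone alone when the cutoff is 10 Hz. This addresses out-of-band carriers such as ultrasonic tones; it does nothing against data sonified inside the passband, which is the case the abstract says the attestation is meant to handle.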

Significance. If the empirical claims hold under the stated measurement protocol, the work provides a concrete, deployable defense against a broad class of covert exfiltration vectors that evade destination allowlists and content scanners. The cryptographic attestation mechanism to exempt legitimate media without content-based discrimination, combined with the adversarial ensemble evaluation and leaky-bucket accounting, represents a practical engineering contribution that could serve as a reference for securing LLM agents. Credit is due for the explicit use of Miller-Madow correction and the multi-modal scope.

major comments (3)
  1. [Abstract] The central claim, that the reference implementation drives residual capacity to zero on every destroyable channel, is stated without implementation details, error analysis, or verification that the ensemble of fifteen encoders covers all possible channels (including timing, size, JSON ordering, and sonification variants).
  2. [Abstract] The bound on residual capacity for the per-image mean luminance channel is asserted without an independent derivation, a justification of its value, or an analysis of how the bucketer interacts with image usability constraints.
  3. [Attestation mechanism] The claim that boot-time Ed25519 attestation plus {kind, data-class} pairs reliably exempts only legitimate media without introducing new covert channels (via key selection, publication timing, or canonicalizer metadata) lacks a concrete protocol description or zero-leakage argument.
minor comments (2)
  1. [Abstract] The text pipeline is described as having 'ten capacity-reducing stages', but the stages are not enumerated; a table or explicit list would improve clarity and let readers assess coverage.
  2. [Evaluation] The Miller-Madow estimator is invoked as standard, yet its specific application (binning, sample size, correction term) to the recovered-bit measurements should be stated explicitly for reproducibility.
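
For reference, one common way to apply the Miller-Madow correction to a mutual-information estimate is sketched below: each plug-in entropy is increased by (m - 1) / (2 N ln 2) bits, where m is the number of observed symbols and N the sample count, and the three corrected entropies are combined as H(X) + H(Y) - H(X, Y). Whether the paper bins or combines terms this way is not stated on this page, so this is an assumed formulation; clamping at zero is also a choice of this sketch.

```python
import math
from collections import Counter


def entropy_mm(samples):
    """Plug-in entropy in bits plus the Miller-Madow bias correction."""
    n = len(samples)
    counts = Counter(samples)
    plug_in = -sum(c / n * math.log2(c / n) for c in counts.values())
    return plug_in + (len(counts) - 1) / (2 * n * math.log(2))


def mutual_information_mm(embedded, recovered):
    """Corrected estimate of I(embedded; recovered) in bits, clamped at zero."""
    if len(embedded) != len(recovered) or not embedded:
        raise ValueError("need two non-empty sequences of equal length")
    joint = list(zip(embedded, recovered))
    mi = entropy_mm(embedded) + entropy_mm(recovered) - entropy_mm(joint)
    return max(0.0, mi)
```

When the carrier is destroyed and the decoder always recovers the same symbol, the estimate is exactly zero because the joint and marginal counts coincide. For a perfect one-bit channel the corrected value slightly exceeds 1 bit at finite N, a known artifact of applying the correction term by term.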

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and for acknowledging the significance of our application-layer reference monitor for LLM agent egress. We address each major comment in detail below, indicating the revisions planned for the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The central claim that the reference implementation drives residual capacity to zero on every destroyable channel provides no implementation details, error analysis, or verification that the ensemble of fifteen encoders covers all possible channels (including timing, size, JSON ordering, and sonification variants).

    Authors: The manuscript's full text details the implementation of the ten capacity-reducing stages in the text pipeline, the per-sink leaky-bucket ledger, and the staged enforcement. The adversarial ensemble of fifteen encoders is specified to cover a range of channels, explicitly including timing, size, JSON ordering, and sonification variants as part of the evaluation across text, image, and audio. The Miller-Madow correction provides the statistical error analysis for the mutual information estimates. To address the concern about the abstract, we will revise it to include a concise reference to these elements and the comprehensive nature of the encoder ensemble.

    Revision: yes

  2. Referee: [Abstract] The stated bound on residual capacity for the per-image mean luminance channel is asserted without an independent derivation, justification of the bound value, or analysis of how the bucketer interacts with image usability constraints.

    Authors: The bound is based on the quantization introduced by the mean-luminance bucketer, which is designed to limit covert capacity while preserving image usability. We will add to the revised manuscript an independent derivation of the bound, including justification of its value derived from the bucketing granularity and an analysis of its interaction with usability constraints using standard metrics such as PSNR or perceptual quality assessments.

    Revision: yes

  3. Referee: [Attestation mechanism] The claim that boot-time Ed25519 attestation plus {kind, data-class} pairs reliably exempts only legitimate media without introducing new covert channels (via key selection, publication timing, or canonicalizer metadata) lacks a concrete protocol description or zero-leakage argument.

    Authors: The attestation protocol is outlined as an auditor publishing trusted Ed25519 keys and authorized {kind, data-class} pairs at boot time, with verification required for exemption and unsigned media treated as suspect. The content-addressed canonicalizer addresses permutation channels. We maintain that this setup avoids new covert channels because the keys and classes are fixed at boot and not selectable by the agent at runtime, timing is outside agent control, and metadata is minimized. However, we will provide a more detailed step-by-step protocol description and a formal argument for the absence of leakage in the revised manuscript.

    Revision: partial

Circularity Check

0 steps flagged

No circularity: results are empirical measurements on an implemented monitor

The paper describes concrete pipelines (ten text stages, Fourier audio limiter, RGB bit-depth and luminance bucketer) plus a boot-time Ed25519 attestation mechanism, then reports measured residual capacity via Miller-Madow corrected mutual information on an adversarial encoder ensemble. No equations are presented that reduce a claimed first-principles result to the inputs by construction, no parameters are fitted on a subset and then relabeled as predictions, and no self-citations or uniqueness theorems are invoked to justify core choices. The stated bound on the luminance channel is presented as an empirical limit required to preserve image utility rather than a tautological re-expression of the measurement definition itself. The derivation chain is therefore self-contained and externally falsifiable through the described implementation and measurement procedure.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The abstract provides limited detail on internal parameters and assumptions; it mentions the leaky-bucket ledger and per-sink capacities without explicit values or a derivation.

free parameters (1)
  • per-sink leaky-bucket capacity
    Capacity limits per destination sink are referenced but no numerical values or fitting procedure are given in the abstract.
axioms (1)
  • domain assumption: cryptographic attestation at boot can be trusted to publish valid Ed25519 keys and class pairs
    The monitor relies on this to exempt signed media without content inspection.

pith-pipeline@v0.9.0 · 5847 in / 1271 out tokens · 25682 ms · 2026-05-21T04:33:37.459763+00:00 · methodology


