pith. sign in

arxiv: 2606.09315 · v1 · pith:5QWI37FPnew · submitted 2026-06-08 · 💻 cs.CR · cs.AI

Brain-Prompt Injection: A Route-Safety Audit for BCI-LLM Agents

Pith reviewed 2026-06-27 16:13 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords brain-prompt injectionBCI-LLM agentsroute safetyaudit contractconformal calibrationprovenanceEEG confirmationC3 decomposition
0
0 comments X

The pith

A Route-Safety Audit Contract with provenance and split-conformal confirmation blocks brain-prompt injection routes in BCI-LLM agents at zero false-accept rate under acquisition isolation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that brain-prompt injection attacks can reroute actions in BCI-to-agent systems without detection by EEG or text monitors alone. Route safety is determined by what an audit log can observe rather than decoder agreement or accuracy. The authors introduce a minimal Route-Safety Audit Contract and prove an audit-schema separation theorem plus C3 attacked-dependence decomposition, demonstrating that clean agreement fails to isolate the joint term controlling C3 flips. Split-conformal calibration on a non-oracle EEG confirmation channel produces a frontier where false-accept rate reaches 0.000 at clean utility 0.150 for alpha equal to 0.005. These results are obtained on EEGMMI left-right command data and hold under subject-cluster bootstrap and cross-architecture checks.

Core claim

Provenance blocks C2 routes at probability 0.000; agreement-plus-provenance routes C3 flips at probability 1.000; confirmation-plus-provenance routes them at probability 0.000. The conformal frontier reaches FAR 0.000 at clean utility 0.150 for alpha=.005 and FAR 0.119 at clean utility 0.452 for alpha=.10 under acquisition isolation; an attacker-controllable confirmation channel breaks the bound to approximately 1. Mediation and confirmation reduce risk but are not intent certificates.

What carries the argument

The Route-Safety Audit Contract, a minimal log schema with denominator hierarchy and endpoint specification, together with the C3 attacked-dependence decomposition that separates observable routing terms from decoder internals.

If this is right

  • Provenance alone is sufficient to block all C2 routes.
  • Agreement combined with provenance routes every C3 flip.
  • Confirmation combined with provenance routes every C3 flip at zero probability.
  • Split-conformal calibration on the confirmation channel yields zero FAR at 15 percent clean utility for alpha equal to 0.005 under isolation.
  • Within-regime saturation holds across TinyEEGNet and EEGNetV4 architectures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same audit contract could be applied to other neural signal modalities if equivalent confirmation channels exist.
  • Subject-cluster bootstrap results suggest the reported intervals generalize across users without retraining per individual.
  • If the confirmation channel remains attacker-controllable, the entire conformal bound collapses regardless of the audit schema.
  • The decomposition implies that adding more observable fields to the log schema could further tighten the separation between C2 and C3 terms.

Load-bearing premise

The audit log schema and C3 attacked-dependence decomposition capture exactly the observable information that controls routing decisions, independent of decoder internals.

What would settle it

Running the same conformal calibration after giving the attacker direct control over the confirmation channel and checking whether the false-accept rate reaches approximately 1, violating the reported bound.

Figures

Figures reproduced from arXiv: 2606.09315 by Jianwei Tai.

Figure 1
Figure 1. Figure 1: Cross-architecture conformal frontier. Left: em￾pirical unauthorized confirmed-route rate versus conformal target α, with 95% Wilson CI shading and the conformal target line y=α. Right: clean confirmation acceptance rate (utility) at the same α. EEGNetV4 Pareto-dominates Tiny￾EEGNetB across the frontier under threat model B-strict [PITH_FULL_IMAGE:figures/full_fig_p014_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Capacity-saturation finding for the conformal con [PITH_FULL_IMAGE:figures/full_fig_p015_2.png] view at source ↗
read the original abstract

BCI-to-agent pipelines turn decoded neural activity into an authorization channel for tool-use agents, exposing a new attack surface we call \emph{brain-prompt injection}: signal-side perturbations, context-only injections, and adaptive dual-decoder attacks can all change the routed action while EEG-side or text-side monitors remain blind. Route safety in this stack depends on what the audit log can observe, not on decoder accuracy or agreement alone. We define a Route-Safety Audit Contract: a minimal log schema, denominator hierarchy, and endpoint specification, and prove an audit-schema separation theorem together with a C3 attacked-dependence decomposition; clean agreement and marginal robustness do not identify the joint term that controls C3 routing. As a calibration layer on top of the contract, we apply split-conformal calibration to a non-oracle EEG confirmation channel and report the resulting false-accept frontier under an explicit threat-archetype matrix. We instantiate the contract on EEGMMI native left/right command-control over 5{,}400 events, harmless tool stubs, and seed/case denominators. Provenance blocks C2 routes ($0.000$); agreement-plus-provenance routes C3 flips ($1.000$); confirmation-plus-provenance routes them ($0.000$). The conformal frontier reaches FAR $0.000$ at clean utility $0.150$ for $\alpha=.005$ and FAR $0.119$ at clean utility $0.452$ for $\alpha=.10$ under acquisition isolation; an attacker-controllable confirmation channel breaks the bound to $\approx\!1$. Subject-cluster bootstrap confirms these intervals on $60$ subjects; cross-architecture (TinyEEGNet, EEGNetV4) and capacity-sweep results show within-regime saturation. Mediation and confirmation reduce risk; they are not intent certificates.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims to identify a new attack surface called brain-prompt injection in BCI-LLM agent pipelines. It defines a Route-Safety Audit Contract with a minimal log schema and proves an audit-schema separation theorem along with a C3 attacked-dependence decomposition. Using split-conformal calibration on a non-oracle EEG confirmation channel and experiments on the EEGMMI dataset with 5,400 events, it reports that provenance blocks C2 routes at rate 0.000, agreement-plus-provenance routes C3 flips at 1.000, confirmation-plus-provenance at 0.000, and conformal prediction achieves FAR 0.000 at clean utility 0.150 for alpha=0.005 under acquisition isolation.

Significance. If the central results hold, the work offers a structured approach to auditing route safety in BCI-to-agent systems, providing empirical evidence on the effectiveness of log-based provenance, agreement, and confirmation in blocking attacks, along with conformal bounds on false accept rates. The subject-cluster bootstrap and cross-architecture validation add robustness to the empirical findings.

major comments (2)
  1. The C3 attacked-dependence decomposition and separation theorem are introduced without derivation steps or proof details in the provided abstract, making it difficult to assess whether they are independently grounded or potentially circular with the audit observables.
  2. The reported blocking rates (e.g., provenance blocks C2 routes (0.000)) and conformal frontier depend on the assumption that the audit log schema captures the observable information controlling routing decisions independent of decoder internals. The manuscript states this explicitly, but if unlogged decoder states affect routing, the decomposition would miss the controlling term, undermining the claims.
minor comments (1)
  1. The abstract contains a formatting artifact '5{,}400' which should be clarified as 5,400 events.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point-by-point below, providing clarifications on the derivations and assumptions while indicating planned revisions.

read point-by-point responses
  1. Referee: The C3 attacked-dependence decomposition and separation theorem are introduced without derivation steps or proof details in the provided abstract, making it difficult to assess whether they are independently grounded or potentially circular with the audit observables.

    Authors: The full manuscript derives the audit-schema separation theorem and C3 attacked-dependence decomposition from the minimal log schema and denominator hierarchy in the sections immediately following the Route-Safety Audit Contract definition. These establish that clean agreement and marginal robustness fail to identify the joint term controlling C3 routing, using observable separation properties that do not reference decoder internals. The reasoning is not circular, as it begins from the contract's explicit observables. We will add a concise proof sketch to the revised introduction or appendix to improve accessibility beyond the abstract. revision: yes

  2. Referee: The reported blocking rates (e.g., provenance blocks C2 routes (0.000)) and conformal frontier depend on the assumption that the audit log schema captures the observable information controlling routing decisions independent of decoder internals. The manuscript states this explicitly, but if unlogged decoder states affect routing, the decomposition would miss the controlling term, undermining the claims.

    Authors: The Route-Safety Audit Contract is defined strictly on the observable log schema, and the separation theorem applies conditional on that schema capturing the routing decisions. The manuscript repeatedly emphasizes that route safety depends on audit-log observables rather than decoder accuracy or internals. The threat model and empirical results (including the 0.000 blocking rates and conformal bounds) are conditioned on this logged information; attacks that evade the log fall outside the contract by construction. We will add an explicit discussion of this boundary condition and its implications for the decomposition in the revised manuscript. revision: yes

Circularity Check

1 steps flagged

C3 attacked-dependence decomposition and separation theorem co-defined with audit contract

specific steps
  1. self definitional [Abstract]
    "We define a Route-Safety Audit Contract: a minimal log schema, denominator hierarchy, and endpoint specification, and prove an audit-schema separation theorem together with a C3 attacked-dependence decomposition; clean agreement and marginal robustness do not identify the joint term that controls C3 routing."

    The theorem and decomposition are derived directly from the audit contract defined in the same sentence; the reported route-blocking rates are presented as consequences of this decomposition, which assumes by definition that the log observables fully determine C3 routing independent of decoder internals.

full rationale

The paper introduces the Route-Safety Audit Contract and immediately proves the separation theorem plus C3 decomposition from it in the same work. The headline empirical claims (provenance blocks C2 at 0.000; agreement+provenance routes C3 flips at 1.000) are then interpreted exclusively through that decomposition, which by construction treats the proposed log fields as capturing all routing controls. This reduces the interpretive step to the definitions themselves rather than an independent grounding. No external verification or machine-checked theorem is cited; the assumption that unlogged decoder states do not modulate routing is baked into the schema.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 2 invented entities

Abstract-only review prevents exhaustive extraction; the ledger below captures the minimal elements visible from the provided text.

free parameters (1)
  • alpha
    Significance level used in split-conformal calibration; controls the reported FAR-utility frontier.
axioms (1)
  • domain assumption Route safety is determined by what the audit log can observe rather than decoder accuracy or agreement alone.
    Stated directly in the abstract as the premise for the Route-Safety Audit Contract.
invented entities (2)
  • brain-prompt injection no independent evidence
    purpose: New attack surface encompassing signal-side perturbations, context-only injections, and adaptive dual-decoder attacks.
    Introduced as the central threat model; no independent evidence supplied in abstract.
  • C3 attacked-dependence decomposition no independent evidence
    purpose: Mathematical decomposition separating the joint routing term from clean agreement and marginal robustness.
    Defined within the paper to support the separation theorem; no external grounding visible.

pith-pipeline@v0.9.1-grok · 5858 in / 1481 out tokens · 22058 ms · 2026-06-27T16:13:45.917737+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

24 extracted references · 6 linked inside Pith

  1. [1]

    Towards evaluating the robustness of neural networks

    Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, pages 39--57, 2017

  2. [2]

    Goldberger, Luis A

    Ary L. Goldberger, Luis A. N. Amaral, Leon Glass, Jeffrey M. Hausdorff, Plamen Ch. Ivanov, Roger G. Mark, Joseph E. Mietus, George B. Moody, Chung-Kang Peng, and H. Eugene Stanley. PhysioBank , PhysioToolkit , and PhysioNet : Components of a new research resource for complex physiologic signals. Circulation, 101(23):e215--e220, 2000

  3. [3]

    Not what you've signed up for: Compromising real-world LLM -integrated applications with indirect prompt injection

    Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you've signed up for: Compromising real-world LLM -integrated applications with indirect prompt injection. arXiv preprint arXiv:2302.12173, 2023

  4. [4]

    Lawhern, Amelia J

    Vernon J. Lawhern, Amelia J. Solon, Nicholas R. Waytowich, Stephen M. Gordon, Chou P. Hung, and Brent J. Lance. EEGNet : A compact convolutional neural network for EEG -based brain--computer interfaces. Journal of Neural Engineering, 15(5):056013, 2018

  5. [5]

    Towards deep learning models resistant to adversarial attacks

    Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018

  6. [6]

    Metzger, Kaylo T

    Sean L. Metzger, Kaylo T. Littlejohn, Alexander B. Silva, David A. Moses, Margaret P. Seaton, Ran Wang, Maximilian E. Dougherty, Jessie R. Liu, Peter Wu, Michael A. Berger, Irina Zhuravleva, Adelyn Tu-Chan, Karunesh Ganguly, Gopala K. Anumanchipalli, and Edward F. Chang. A high-performance neuroprosthesis for speech decoding and avatar control. Nature, 62...

  7. [7]

    Berkay Celik, and Ananthram Swami

    Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In IEEE European Symposium on Security and Privacy, pages 372--387, 2016

  8. [8]

    ToolLLM : Facilitating large language models to master 16000+ real-world API s

    Yujia Qin, Shengding Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, Sihan Zhao, Runchu Tian, Ruobing Xie, Jie Zhou, Mark Gerstein, Dahai Li, Zhiyuan Liu, and Maosong Sun. ToolLLM : Facilitating large language models to master 16000+ real-world API s. arXiv preprint arXiv:2307.16789, 2023

  9. [9]

    McFarland, Thilo Hinterberger, Niels Birbaumer, and Jonathan R

    Gerwin Schalk, Dennis J. McFarland, Thilo Hinterberger, Niels Birbaumer, and Jonathan R. Wolpaw. BCI2000 : A general-purpose brain--computer interface system. IEEE Transactions on Biomedical Engineering, 51(6):1034--1043, 2004

  10. [10]

    Toolformer : Language models can teach themselves to use tools

    Timo Schick, Jane Dwivedi-Yu, Roberto Dessi, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer : Language models can teach themselves to use tools. arXiv preprint arXiv:2302.04761, 2023

  11. [11]

    Ensemble adversarial training: Attacks and defenses

    Florian Tram \`e r, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. In International Conference on Learning Representations, 2018

  12. [12]

    Jailbroken: How does LLM safety training fail? arXiv preprint arXiv:2307.02483, 2023

    Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. Jailbroken: How does LLM safety training fail? arXiv preprint arXiv:2307.02483, 2023

  13. [13]

    Willett, Donald T

    Francis R. Willett, Donald T. Avansino, Leigh R. Hochberg, Jaimie M. Henderson, and Krishna V. Shenoy. High-performance brain-to-text communication via handwriting. Nature, 593:249--254, 2021

  14. [14]

    Brandon Westover, and Jimeng Sun

    Chaoqi Yang, M. Brandon Westover, and Jimeng Sun. BIOT : Biosignal transformer for cross-data learning in the wild. Advances in Neural Information Processing Systems, 36, 2023

  15. [15]

    On the vulnerability of CNN classifiers in EEG -based BCI s

    Xiao Zhang and Dongrui Wu. On the vulnerability of CNN classifiers in EEG -based BCI s. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 27(5):814--825, 2019

  16. [16]

    Zico Kolter, and Matt Fredrikson

    Andy Zou, Zifan Wang, J. Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043, 2023

  17. [17]

    Algorithmic Learning in a Random World

    Vladimir Vovk, Alexander Gammerman, and Glenn Shafer. Algorithmic Learning in a Random World. Springer, 2005

  18. [18]

    Cand\` e s

    Yaniv Romano, Evan Patterson, and Emmanuel J. Cand\` e s. Conformalized quantile regression. In Advances in Neural Information Processing Systems (NeurIPS), 2019

  19. [19]

    Large brain model for learning generic representations with tremendous EEG data in BCI

    Wei-Bang Jiang, Li-Ming Zhao, and Bao-Liang Lu. Large brain model for learning generic representations with tremendous EEG data in BCI . In International Conference on Learning Representations (ICLR), Spotlight, 2024

  20. [20]

    InjecAgent : Benchmarking indirect prompt injections in tool-integrated large language model agents

    Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. InjecAgent : Benchmarking indirect prompt injections in tool-integrated large language model agents. In Findings of the Association for Computational Linguistics: ACL 2024, pages 10471--10506, 2024

  21. [21]

    AgentDojo : A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents

    Edoardo Debenedetti, Jie Zhang, Mislav Balunovi \'c , Luca Beurer-Kellner, Marc Fischer, and Florian Tram \`e r. AgentDojo : A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents. In Advances in Neural Information Processing Systems (NeurIPS), 2024

  22. [22]

    Benchmarking and defending against indirect prompt injection attacks on large language models

    Jingwei Yi, Yueqi Xie, Bin Zhu, Keegan Hines, Emre Kiciman, Guangzhong Sun, Xing Xie, and Fangzhao Wu. Benchmarking and defending against indirect prompt injection attacks on large language models. arXiv preprint arXiv:2312.14197, 2023

  23. [23]

    Brain-to-text decoding: A non-invasive approach via typing

    Jarod L \'e vy, Mingfang Zacharias, Omar Chehab, Hubert Banville, Daoud Tabbal, Alexandre Defossez, and Jean-R \'e mi King. Brain-to-text decoding: A non-invasive approach via typing. arXiv preprint arXiv:2502.17480, 2025

  24. [24]

    Agent Security Bench ( ASB ): Formalizing and benchmarking attacks and defenses in LLM -based agents

    Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, and Yongfeng Zhang. Agent Security Bench ( ASB ): Formalizing and benchmarking attacks and defenses in LLM -based agents. arXiv preprint arXiv:2410.02644, 2024