
arxiv: 2605.26159 · v1 · pith:RYXOFAWBnew · submitted 2026-05-24 · 💻 cs.NI · cs.CR · cs.LG

Device Context Protocol: A Compact, Safety-First Architecture for LLM-Driven Control of Constrained Devices

Pith reviewed 2026-06-29 23:54 UTC · model grok-4.3

classification 💻 cs.NI · cs.CR · cs.LG
keywords: device context protocol, LLM tool calling, prompt injection, microcontrollers, IoT safety, constrained devices, manifest schema

The pith

The Device Context Protocol enables safer LLM control of microcontrollers by rejecting invalid calls at a host-side Bridge, with reference firmware that fits in about 28 KB of flash.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that the Device Context Protocol (DCP) can safely extend LLM tool calling to constrained physical devices. It does this through a compact frame format, a manifest schema with built-in checks for capabilities and ranges, and a host-side Bridge that filters calls before they reach the device. In a study of 675 tool calls, DCP rejected 100% of capability-escalation attempts and 78% of prompt-injection attempts, versus 0–1% for Raw MCP and IoT-MCP. This matters because it opens LLM orchestration to the many small hardware devices that control physical systems, without requiring large memory or sacrificing safety.

Core claim

The central contribution is a protocol architecture that combines sub-50-byte frames, a manifest schema enforcing capability scoping, range and type checks, dry-run evaluation, and units-as-types, and a host-side Bridge that keeps bad calls from reaching firmware. The reference firmware on ESP32 uses 27.6 KB flash and 0.6 KB RAM. In an empirical evaluation of tool calls produced by five LLMs under six categories of adversarial prompts, DCP rejects 100% of capability-escalation attempts and 78% of prompt-injection attempts, while matching the expressiveness of a well-formed OpenAPI 3 schema.

What carries the argument

The manifest schema with capability scoping, range and type checks, dry-run evaluation, and the host-side Bridge that rejects calls before they reach the device.
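
As a rough illustration of how such host-side checks could fit together, the sketch below validates a tool call against a small manifest before anything would be forwarded. It is not the paper's Bridge: the manifest layout, the tool names (`set_fan_speed`, `read_temp`), the capability strings, and the call shape are all hypothetical, chosen only to show capability scoping, type and range checks, and units-as-types in one place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamSpec:
    unit: str   # units-as-types: the unit is part of the parameter's type
    lo: float   # inclusive lower bound
    hi: float   # inclusive upper bound


# Hypothetical manifest: tool name -> (required capability, parameter specs).
MANIFEST = {
    "set_fan_speed": ("actuate:fan", {"speed": ParamSpec("percent", 0, 100)}),
    "read_temp": ("sense:temp", {}),
}


def validate(call: dict, granted: set) -> tuple:
    """Return (accepted, reason). A caller forwards only when accepted is True."""
    tool = call.get("tool")
    if not isinstance(tool, str) or tool not in MANIFEST:
        return False, "unknown tool"
    capability, params = MANIFEST[tool]
    if capability not in granted:
        return False, "capability not granted"
    args = call.get("args", {})
    if not isinstance(args, dict) or set(args) != set(params):
        return False, "argument names do not match manifest"
    for name, spec in params.items():
        arg = args[name]
        if not isinstance(arg, dict):
            return False, f"{name}: expected a value/unit pair"
        value, unit = arg.get("value"), arg.get("unit")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False, f"{name}: not a number"
        if unit != spec.unit:
            return False, f"{name}: unit mismatch"
        if not spec.lo <= value <= spec.hi:
            return False, f"{name}: out of range"
    return True, "ok"


call = {"tool": "set_fan_speed", "args": {"speed": {"value": 140, "unit": "percent"}}}
print(validate(call, granted={"actuate:fan"}))
```

In this sketch a hallucinated tool name, a call outside the granted capabilities, a wrong unit, and an out-of-range value all fail on the host, which is the property the paper attributes to the Bridge.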

If this is right

  • Enables LLM control of physical hardware on devices with very limited memory resources.
  • Provides better safety against prompt injection and hallucination than existing MCP variants.
  • Achieves similar expressiveness to OpenAPI schemas at much lower resource cost.
  • Positions DCP as the link between software MCP and physical device control.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This architecture could be adapted for other low-power embedded systems beyond ESP32.
  • Future work might test the Bridge against more sophisticated injection techniques.
  • It suggests that safety can be enforced at the protocol layer rather than in device firmware.
  • Adoption might lead to standardized safe interfaces for AI-controlled IoT devices.

Load-bearing premise

The host-side Bridge and manifest schema are assumed to be sufficient to catch all hallucinated or injected calls before any byte reaches the device firmware.

What would settle it

Demonstrating a prompt that causes the Bridge to accept and forward a command outside the allowed capabilities to the device firmware.

Figures

Figures reproduced from arXiv: 2605.26159 by Dongxu Yang.

Figure 1: DCP architecture. The Bridge is the sole trust boundary; the device remains simple
Figure 2: DCP frame layout and on-wire size compared to representative alternatives. The …
Figure 3: Schema-layer rejection rates over an empirical corpus of 675 tool calls produced by …
Figure 4: Measured end-to-end latency for a typical call–reply round-trip. Bars are medi…
Figure 5: Measured static RAM footprint. The DCP layer (0.6 KB) is the differential measurement …
Original abstract

Large language models are increasingly used as orchestrators of external tools via the Model Context Protocol (MCP), but MCP is built for software services with megabytes of memory and does not descend to the microcontrollers that dominate the long tail of physical devices. Recent work (IoT-MCP) ports MCP to edge gateways at 74 KB peak memory; this still excludes the smallest commodity MCUs and, critically, does not address the safety problem of giving an unreliable caller (an LLM that may hallucinate or be prompt-injected) direct control of physical hardware. We present the Device Context Protocol (DCP): a sub-50-byte typical frame (6-byte header + CBOR payload + optional 16-byte HMAC), a manifest schema in which capability scoping, range and type checks, dry-run evaluation, and units-as-types are protocol-layer primitives, and a host-side Bridge that rejects malformed or hallucinated calls before any byte reaches the device. Reference firmware measures 27.6 KB flash / 0.6 KB RAM on ESP32; the Python Bridge, ESP32 firmware, and a language-neutral conformance suite are MIT-licensed and public. An empirical study of 675 tool calls, produced by five LLMs across four vendors (DeepSeek, Alibaba, Zhipu, MiniMax) against six categories of adversarial prompts, with the injection category instantiating AgentDojo's attack templates, shows that DCP rejects 100% of capability-escalation attempts and 78% of prompt-injection attempts, versus 0–1% for Raw MCP and IoT-MCP, matching the expressiveness of a well-formed OpenAPI 3 schema at three orders of magnitude less firmware footprint. We position DCP as the missing layer between MCP (which is moving toward enterprise SaaS connectivity) and the physical devices it does not reach.
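
To make the frame arithmetic in the abstract concrete, here is a minimal sketch of a frame built as a 6-byte header, an opaque CBOR payload, and an optional 16-byte tag. The abstract gives only the sizes, so the header fields (version, flags, sequence, payload length), the flag bit, and the use of HMAC-SHA256 truncated to 16 bytes are assumptions made for illustration, not the DCP wire format.

```python
import hashlib
import hmac
import struct
from typing import Optional

# Hypothetical header: version (1), flags (1), sequence (2), payload length (2) = 6 bytes.
HEADER = struct.Struct("<BBHH")
FLAG_HMAC = 0x01   # assumed flag bit meaning "a tag follows the payload"
TAG_LEN = 16       # the abstract states an optional 16-byte HMAC


def build_frame(seq: int, payload: bytes, key: Optional[bytes] = None) -> bytes:
    """Assemble header + payload, plus a truncated HMAC tag when a key is given."""
    flags = FLAG_HMAC if key is not None else 0
    body = HEADER.pack(1, flags, seq, len(payload)) + payload
    if key is not None:
        # Assumption: HMAC-SHA256 over header and payload, truncated to 16 bytes.
        body += hmac.new(key, body, hashlib.sha256).digest()[:TAG_LEN]
    return body


def parse_frame(frame: bytes, key: Optional[bytes] = None) -> bytes:
    """Return the payload, raising ValueError on any length or tag mismatch."""
    if len(frame) < HEADER.size:
        raise ValueError("frame shorter than header")
    _version, flags, _seq, length = HEADER.unpack_from(frame)
    end = HEADER.size + length
    has_tag = bool(flags & FLAG_HMAC)
    if len(frame) != end + (TAG_LEN if has_tag else 0):
        raise ValueError("length mismatch")
    if key is not None:
        if not has_tag:
            raise ValueError("authentication required")
        expected = hmac.new(key, frame[:end], hashlib.sha256).digest()[:TAG_LEN]
        if not hmac.compare_digest(expected, frame[end:]):
            raise ValueError("bad tag")
    return frame[HEADER.size:end]


# CBOR encoding of the map {1: 50}: a1 (map of 1 pair), 01 (key 1), 18 32 (value 50).
payload = bytes.fromhex("a1011832")
print(len(build_frame(7, payload)))                 # 6 + 4 = 10 bytes
print(len(build_frame(7, payload, key=b"k" * 16)))  # 6 + 4 + 16 = 26 bytes
```

With a 4-byte payload the authenticated frame is 26 bytes, which is consistent with the "sub-50-byte typical frame" figure as long as payloads stay under about 28 bytes.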

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript introduces the Device Context Protocol (DCP) as a compact protocol (sub-50-byte frames) for LLM orchestration of constrained microcontrollers. It defines a manifest schema incorporating capability scoping, range/type checks, dry-run evaluation, and units-as-types as protocol primitives; a host-side Bridge that performs validation before any call reaches device firmware; reference ESP32 firmware at 27.6 KB flash / 0.6 KB RAM; and an empirical study of 675 tool calls from five LLMs across six adversarial categories (including AgentDojo prompt-injection templates) showing 100% rejection of capability-escalation attempts and 78% rejection of prompt-injection attempts versus 0-1% for Raw MCP and IoT-MCP baselines.

Significance. If the safety claims hold, DCP fills a documented gap between enterprise-oriented MCP variants and the resource-constrained physical devices that dominate IoT deployments, while supplying open-source artifacts and a concrete footprint comparison. The empirical rejection-rate data against external baselines provides a falsifiable performance anchor for the safety architecture.

major comments (1)
  1. [Safety architecture description, and abstract] The central claim that 'no byte reaches the device' because the Bridge plus manifest schema catches all hallucinated or injected calls is load-bearing, yet the 675-call study only instantiates the six tested adversarial categories (AgentDojo templates plus five LLM vendors). No argument, coverage metric, or completeness analysis is supplied showing these checks are exhaustive against arbitrary tool-call strings an LLM might emit; any undetected gap would falsify the premise while remaining invisible to the reported metrics.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment on the safety architecture. We address it directly below.

  1. Referee: [Safety architecture description, and abstract] The central claim that 'no byte reaches the device' because the Bridge plus manifest schema catches all hallucinated or injected calls is load-bearing, yet the 675-call study only instantiates the six tested adversarial categories (AgentDojo templates plus five LLM vendors). No argument, coverage metric, or completeness analysis is supplied showing these checks are exhaustive against arbitrary tool-call strings an LLM might emit; any undetected gap would falsify the premise while remaining invisible to the reported metrics.

    Authors: We agree the evaluation is empirical. The manifest schema supplies an exhaustive, machine-checkable definition of every permitted call for a device; the Bridge performs deterministic validation (capability scoping, range/type checks, units-as-types, dry-run) against that schema before any frame is forwarded. By construction, any string that fails these checks, including arbitrary LLM outputs, is rejected, so the 'no byte reaches the device' property holds for the declared interface. The 675-call study measures how often real LLMs under the six tested adversarial regimes produce conforming versus non-conforming calls. We will revise the abstract and safety-architecture section to state the guarantee explicitly as 'with respect to a well-formed manifest' and add a limitations paragraph noting the empirical scope. This tempers the claim without altering the protocol design or reported data.

    revision: partial
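
The "by construction" argument can be exercised with a small randomized check: feed arbitrary strings to a deterministic validator and confirm that everything forwarded conforms to the manifest. The sketch below is an assumption-laden toy, not the paper's conformance suite. The one-tool manifest (`set_led`), the JSON call encoding, and the `bridge` function are hypothetical, and a sampled check like this illustrates the invariant without proving the completeness the referee asks about.

```python
import json
import random
import string

# Hypothetical manifest: one tool with one integer parameter and an inclusive range.
ALLOWED = {"set_led": {"level": (0, 255)}}


def conforms(obj) -> bool:
    """Deterministic check: exact keys, known tool, integer argument within range."""
    if not isinstance(obj, dict) or set(obj) != {"tool", "args"}:
        return False
    tool, args = obj["tool"], obj["args"]
    if not isinstance(tool, str) or tool not in ALLOWED:
        return False
    spec = ALLOWED[tool]
    if not isinstance(args, dict) or set(args) != set(spec):
        return False
    for name, (lo, hi) in spec.items():
        value = args[name]
        if type(value) is not int or not lo <= value <= hi:
            return False
    return True


def bridge(raw: str, forwarded: list) -> bool:
    """Append the call to `forwarded` only if it parses and conforms."""
    try:
        obj = json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    if not conforms(obj):
        return False
    forwarded.append(obj)
    return True


def fuzz(n: int = 2000, seed: int = 0) -> list:
    """Mix random printable noise with structured but often invalid calls."""
    rng = random.Random(seed)
    forwarded = []
    for _ in range(n):
        if rng.random() < 0.5:
            length = rng.randint(0, 40)
            raw = "".join(rng.choice(string.printable) for _ in range(length))
        else:
            tool = rng.choice(["set_led", "erase_flash"])
            raw = json.dumps({"tool": tool, "args": {"level": rng.randint(-50, 400)}})
        bridge(raw, forwarded)
    return forwarded


forwarded = fuzz()
assert all(c["tool"] == "set_led" and 0 <= c["args"]["level"] <= 255 for c in forwarded)
```

The invariant holds here because rejection is the default and acceptance requires passing every check, which mirrors the authors' point that the guarantee is relative to a well-formed manifest.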

Circularity Check

0 steps flagged

No circularity; empirical evaluation uses external baselines


The paper describes an architecture (DCP frames, manifest schema, host Bridge) and reports empirical rejection rates from 675 LLM-generated calls against Raw MCP and IoT-MCP baselines. No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear. The safety premise (Bridge intercepts all calls) is an architectural claim evaluated against specific adversarial templates rather than derived from or equivalent to its own inputs by construction. The study is externally falsifiable and does not reduce to self-reference.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central safety claims rest on the domain assumption that a manifest-driven Bridge can reliably intercept unsafe LLM outputs before they reach hardware; no free parameters or invented physical entities are introduced.

axioms (1)
  • domain assumption: LLM outputs may contain hallucinated or prompt-injected commands that must be filtered at the protocol layer before reaching constrained devices.
    This assumption underpins the design of the manifest schema and host Bridge as the primary safety mechanism.
invented entities (1)
  • Device Context Protocol (DCP) · no independent evidence
    purpose: Compact, safety-first frame and manifest format for LLM control of MCUs
    New protocol introduced by the paper; independent evidence would require external adoption or formal verification beyond the presented tests.


