Device Context Protocol: A Compact, Safety-First Architecture for LLM-Driven Control of Constrained Devices
Pith reviewed 2026-06-29 23:54 UTC · model grok-4.3
The pith
Device Context Protocol (DCP) lets an LLM control microcontrollers more safely by rejecting invalid calls at a host-side Bridge; the reference ESP32 firmware occupies about 28 KB of flash.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central contribution is a protocol architecture that combines typical frames under 50 bytes, a manifest schema enforcing capability scoping, range and type checks, dry-run evaluation, and units-as-types, and a host-side Bridge that keeps bad calls from reaching firmware. On ESP32 the reference firmware uses 27.6 KB flash and 0.6 KB RAM. In an empirical evaluation of 675 tool calls produced by five LLMs under six categories of adversarial prompts, DCP rejects 100% of capability-escalation attempts and 78% of prompt-injection attempts, while matching the expressiveness of a well-formed OpenAPI 3 schema.
What carries the argument
The manifest schema with capability scoping, range and type checks, dry-run evaluation, and the host-side Bridge that rejects calls before they reach the device.
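A minimal sketch of how such host-side checks might look, assuming a manifest that maps each capability name to typed, range-bounded parameters. The names `ParamSpec`, `MANIFEST`, and `validate_call`, and the two example capabilities, are hypothetical illustrations and are not taken from the paper or its reference Bridge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamSpec:
    """Hypothetical manifest entry for one parameter."""
    type_: type
    minimum: float
    maximum: float


# Hypothetical manifest: capability name -> parameter specs.
MANIFEST = {
    "led.set_brightness": {"level": ParamSpec(int, 0, 255)},
    "fan.set_speed": {"rpm": ParamSpec(int, 0, 3000)},
}


def validate_call(name, args, granted):
    """Return (ok, reason). Only calls with ok=True would be forwarded."""
    if name not in MANIFEST:
        return False, "unknown capability"
    if name not in granted:
        return False, "capability not granted"
    spec = MANIFEST[name]
    if set(args) != set(spec):
        return False, "unexpected or missing parameter"
    for key, p in spec.items():
        value = args[key]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, p.type_):
            return False, f"type mismatch for {key}"
        if not (p.minimum <= value <= p.maximum):
            return False, f"{key} out of range"
    return True, "ok"
```

In this sketch a hallucinated capability name, a capability outside the granted scope, and an out-of-range argument are all rejected on the host, so nothing would need to be serialized for the device.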
If this is right
- Enables LLM control of physical hardware on devices with very limited memory resources.
- Provides stronger rejection of hallucinated and prompt-injected calls than Raw MCP and IoT-MCP, which reject 0-1% in the same study.
- Matches the expressiveness of a well-formed OpenAPI 3 schema at three orders of magnitude less firmware footprint.
- Positions DCP as the link between software MCP and physical device control.
Where Pith is reading between the lines
- This architecture could be adapted for other low-power embedded systems beyond ESP32.
- Future work might test the Bridge against more sophisticated injection techniques.
- It suggests that safety can be enforced at the protocol layer rather than in device firmware.
- Adoption might lead to standardized safe interfaces for AI-controlled IoT devices.
Load-bearing premise
The host-side Bridge and manifest schema are assumed to be sufficient to catch all hallucinated or injected calls before any byte reaches the device firmware.
What would settle it
Demonstrating a prompt that causes the Bridge to accept and forward a command outside the allowed capabilities to the device firmware.
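One way to search for such a counterexample mechanically is differential fuzzing: generate many candidate calls and flag any that the Bridge accepts but an independently written policy forbids. The sketch below is a toy under stated assumptions. `bridge_accepts` is a hypothetical stand-in for a Bridge, deliberately written with a subtle gap (it treats Python's `True` as the integer 1); it is not the paper's implementation.

```python
import random

ALLOWED = {"led.set": range(0, 256)}  # hypothetical declared interface


def bridge_accepts(name, value):
    """Toy stand-in for the Bridge under test (contains a deliberate gap)."""
    return name in ALLOWED and isinstance(value, int) and value in ALLOWED[name]


def oracle_allows(name, value):
    """Independent statement of the policy, written separately on purpose."""
    return name == "led.set" and type(value) is int and 0 <= value <= 255


def find_counterexample(trials=10_000, seed=0):
    """Return a call the bridge accepts but the oracle forbids, or None."""
    rng = random.Random(seed)
    names = ["led.set", "led.Set", "flash.erase", ""]
    for _ in range(trials):
        name = rng.choice(names)
        value = rng.choice(
            [rng.randint(-1000, 1000), rng.random(), None, "255", True]
        )
        if bridge_accepts(name, value) and not oracle_allows(name, value):
            return name, value
    return None
```

Here the search surfaces the boolean case, a gap that a fixed set of adversarial prompt categories would not necessarily exercise.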
Original abstract
Large language models are increasingly used as orchestrators of external tools via the Model Context Protocol (MCP), but MCP is built for software services with megabytes of memory and does not descend to the microcontrollers that dominate the long tail of physical devices. Recent work (IoT-MCP) ports MCP to edge gateways at 74 KB peak memory; this still excludes the smallest commodity MCUs and, critically, does not address the safety problem of giving an unreliable caller (an LLM that may hallucinate or be prompt-injected) direct control of physical hardware. We present the Device Context Protocol (DCP): a sub-50-byte typical frame (6-byte header + CBOR payload + optional 16-byte HMAC), a manifest schema in which capability scoping, range and type checks, dry-run evaluation, and units-as-types are protocol-layer primitives, and a host-side Bridge that rejects malformed or hallucinated calls before any byte reaches the device. Reference firmware measures 27.6 KB flash / 0.6 KB RAM on ESP32; the Python Bridge, ESP32 firmware, and a language-neutral conformance suite are MIT-licensed and public. An empirical study of 675 tool calls produced by five LLMs across four vendors (DeepSeek, Alibaba, Zhipu, MiniMax) against six categories of adversarial prompts, with the injection category instantiating AgentDojo's attack templates, shows DCP rejects 100% of capability-escalation attempts and 78% of prompt-injection attempts, versus 0-1% for Raw MCP and IoT-MCP, matching the expressiveness of a well-formed OpenAPI 3 schema at three orders of magnitude less firmware footprint. We position DCP as the missing layer between MCP (which is moving toward enterprise SaaS connectivity) and the physical devices it does not reach.
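The frame budget in the abstract (6-byte header + CBOR payload + optional 16-byte HMAC) can be illustrated with a short sketch. The header field layout (`version`, `flags`, `sequence`, `length`), the flag value, and the choice of HMAC-SHA-256 truncated to 16 bytes are assumptions made for illustration only; the abstract does not specify them. The payload is treated as opaque, already-encoded CBOR bytes.

```python
import hashlib
import hmac
import struct

HEADER_FMT = ">BBHH"  # hypothetical: version, flags, sequence, payload length
HEADER_LEN = struct.calcsize(HEADER_FMT)  # 6 bytes
TAG_LEN = 16  # truncated HMAC tag length
FLAG_HMAC = 0x01  # hypothetical flag bit: a tag follows the payload


def build_frame(seq, payload, key=None):
    """Build header + payload, plus a truncated HMAC tag when a key is given."""
    flags = FLAG_HMAC if key is not None else 0
    body = struct.pack(HEADER_FMT, 1, flags, seq, len(payload)) + payload
    if key is None:
        return body
    tag = hmac.new(key, body, hashlib.sha256).digest()[:TAG_LEN]
    return body + tag


def parse_frame(frame, key=None):
    """Return (seq, payload) or raise ValueError on any inconsistency."""
    if len(frame) < HEADER_LEN:
        raise ValueError("short frame")
    _version, flags, seq, length = struct.unpack(HEADER_FMT, frame[:HEADER_LEN])
    has_tag = bool(flags & FLAG_HMAC)
    if key is not None and not has_tag:
        raise ValueError("tag required")
    if len(frame) != HEADER_LEN + length + (TAG_LEN if has_tag else 0):
        raise ValueError("length mismatch")
    body = frame[:HEADER_LEN + length]
    if has_tag:
        if key is None:
            raise ValueError("key required")
        want = hmac.new(key, body, hashlib.sha256).digest()[:TAG_LEN]
        # Constant-time comparison of the received tag against the expected one.
        if not hmac.compare_digest(want, frame[-TAG_LEN:]):
            raise ValueError("bad tag")
    return seq, body[HEADER_LEN:]


# CBOR encoding of the one-entry map {"l": 128}: 5 bytes.
EXAMPLE_PAYLOAD = bytes.fromhex("a1616c1880")
```

With this layout an authenticated frame carrying the 5-byte example payload is 6 + 5 + 16 = 27 bytes, and an authenticated frame stays under 50 bytes as long as the payload is at most 27 bytes.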
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Device Context Protocol (DCP) as a compact protocol (sub-50-byte frames) for LLM orchestration of constrained microcontrollers. It defines a manifest schema incorporating capability scoping, range/type checks, dry-run evaluation, and units-as-types as protocol primitives; a host-side Bridge that performs validation before any call reaches device firmware; reference ESP32 firmware at 27.6 KB flash / 0.6 KB RAM; and an empirical study of 675 tool calls from five LLMs across six adversarial categories (including AgentDojo prompt-injection templates) showing 100% rejection of capability-escalation attempts and 78% rejection of prompt-injection attempts versus 0-1% for Raw MCP and IoT-MCP baselines.
Significance. If the safety claims hold, DCP fills a documented gap between enterprise-oriented MCP variants and the resource-constrained physical devices that dominate IoT deployments, while supplying open-source artifacts and a concrete footprint comparison. The empirical rejection-rate data against external baselines provides a falsifiable performance anchor for the safety architecture.
major comments (1)
- [Safety architecture description and abstract] The central claim that 'no byte reaches the device' because the Bridge plus manifest schema catches all hallucinated or injected calls is load-bearing, yet the 675-call study only instantiates the six tested adversarial categories (including AgentDojo templates) with five LLMs from four vendors. No argument, coverage metric, or completeness analysis is supplied showing these checks are exhaustive against arbitrary tool-call strings an LLM might emit; any undetected gap would falsify the premise while remaining invisible to the reported metrics.
Simulated Author's Rebuttal
We thank the referee for the constructive comment on the safety architecture. We address it directly below.
Point-by-point responses
- Referee: [Safety architecture description and abstract] The central claim that 'no byte reaches the device' because the Bridge plus manifest schema catches all hallucinated or injected calls is load-bearing, yet the 675-call study only instantiates the six tested adversarial categories (including AgentDojo templates) with five LLMs from four vendors. No argument, coverage metric, or completeness analysis is supplied showing these checks are exhaustive against arbitrary tool-call strings an LLM might emit; any undetected gap would falsify the premise while remaining invisible to the reported metrics.
  Authors: We agree the evaluation is empirical. The manifest schema supplies an exhaustive, machine-checkable definition of every permitted call for a device; the Bridge performs deterministic validation (capability scoping, range/type checks, units-as-types, dry-run) against that schema before any frame is forwarded. By construction, any string that fails these checks, including arbitrary LLM outputs, is rejected, so the 'no byte reaches the device' property holds for the declared interface. The 675-call study measures how often real LLMs under the six tested adversarial regimes produce conforming versus non-conforming calls. We will revise the abstract and safety-architecture section to state the guarantee explicitly as 'with respect to a well-formed manifest' and add a limitations paragraph noting the empirical scope. This tempers the claim without altering the protocol design or reported data.
  Revision: partial
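The scope of the guarantee described in the rebuttal can be made concrete with a small sketch of units-as-types and dry-run evaluation. Everything here is a hypothetical illustration: the `Celsius` and `Percent` wrapper types, the `Bridge.set_heater` method, the 5 to 30 degree range, and the `sent` list that stands in for the wire are assumptions, not the paper's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Celsius:
    value: float


@dataclass(frozen=True)
class Percent:
    value: float


class Bridge:
    """Hypothetical host-side gate; `sent` stands in for the wire."""

    def __init__(self):
        self.sent = []

    def set_heater(self, target, dry_run=False):
        # Units-as-types: a bare float or a Percent is rejected outright.
        if not isinstance(target, Celsius):
            raise TypeError("target must be Celsius")
        # Range check against the (assumed) declared bounds.
        if not (5.0 <= target.value <= 30.0):
            raise ValueError("target outside 5..30 C")
        frame = ("heater.set", target.value)
        # Dry run: validate and report the frame without forwarding it.
        if not dry_run:
            self.sent.append(frame)
        return frame
```

Calls that fail validation raise before anything is appended to `sent`, which mirrors the 'no byte reaches the device' property for the declared interface. A call such as `Celsius(30.0)` conforms to the manifest and is forwarded regardless of why the model produced it, which is the limit the authors concede with 'with respect to a well-formed manifest'.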
Circularity Check
No circularity; empirical evaluation uses external baselines
The paper describes an architecture (DCP frames, manifest schema, host Bridge) and reports empirical rejection rates from 675 LLM-generated calls against Raw MCP and IoT-MCP baselines. No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear. The safety premise (Bridge intercepts all calls) is an architectural claim evaluated against specific adversarial templates rather than derived from or equivalent to its own inputs by construction. The study is externally falsifiable and does not reduce to self-reference.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: LLM outputs may contain hallucinated or prompt-injected commands that must be filtered at the protocol layer before reaching constrained devices.
invented entities (1)
- Device Context Protocol (DCP): no independent evidence
Reference graph
Works this paper leans on
- [1] Anthropic. Model context protocol specification. https://modelcontextprotocol.io.
- [2] C. Bormann and P. Hoffman. Concise binary object representation (CBOR). RFC 8949, IETF, 2020.
- [3] CBOR-Web Working Group. CBOR-Web: The binary protocol for AI agents. https://cborweb.com/, 2026. Accessed: 2026-05.
- [4] Stuart Cheshire and Mary Baker. Consistent overhead byte stuffing. In IEEE/ACM Transactions on Networking, volume 7, pages 159–172, 1999.
- [5] Connectivity Standards Alliance. Matter specification. https://csa-iot.org/all-solutions/matter/, 2024.
- [6] Edoardo Debenedetti, Jie Zhang, Mislav Balunović, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents. In NeurIPS Datasets and Benchmarks Track, 2024.
- [7] Eclipse Sparkplug Working Group. Sparkplug specification. Eclipse Foundation, 2024.
- [8] Kai Greshake, Sahar Abdelnabi, et al. Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection. arXiv:2302.12173, 2023.
- [9] Andrea Iannoli, Lorenzo Gigli, Luca Sciullo, Angelo Trotta, and Marco Di Felice. Say the mission, execute the swarm: Agent-enhanced LLM reasoning in the web-of-drones. In Proceedings of the 27th IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM), 2026. arXiv:2605.03788.
- [10] Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, et al. Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12), 2023.
- [11] H. Krawczyk, M. Bellare, and R. Canetti. HMAC: Keyed-hashing for message authentication. RFC 2104, IETF, 1997.
- [12] Yiming Li, Yuhan Cheng, Mingchen Ma, Yihang Zou, Ningyuan Yang, Wei Cheng, Hai Li, Yiran Chen, and Tingjun Chen. Skilled AI agents for embedded and IoT systems development. arXiv preprint arXiv:2603.19583, 2026.
- [13] Microsoft. W3C web of things (WoT) support in Azure IoT operations. Microsoft Tech Community Blog, 2025.
- [14] Model Context Protocol Working Groups. The 2026 mcp roadmap. https://blog.modelcontextprotocol.io/posts/2026-mcp-roadmap/, 2026. Updated 2026-03-05.
- [15] Neha Nagaraja, Hayretdin Bahsi, and Carlo R. da Cunha. From prompt to physical actuation: Holistic threat modeling of LLM-enabled robotic systems. arXiv preprint arXiv:2604.27267, 2026.
- [16] NIST. Secure hash standard (SHS). FIPS PUB 180-4, 2015.
- [17] Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, Sihan Zhao, Lauren Hong, Runchu Tian, Ruobing Xie, Jie Zhou, Mark Gerstein, Dahai Li, Zhiyuan Liu, and Maosong Sun. ToolLLM: Facilitating large language models to master 16000+ real-world APIs. In ICLR, 2024.
- [18] Yung-Chen Tang, Pin-Yu Chen, and Tsung-Yi Ho. Defining and evaluating physical safety for large language models. arXiv preprint arXiv:2411.02317, 2024.
- [19] W3C Web of Things Working Group. Web of things (WoT) thing description 2.0. W3C Recommendation, 2025.
- [20] Le Wang, Zonghao Ying, Xiao Yang, Quanchen Zou, Zhenfei Yin, Tianlin Li, Jian Yang, Yaodong Yang, Aishan Liu, and Xianglong Liu. RoboSafe: Safeguarding embodied agents via executable safety logic. arXiv preprint arXiv:2512.21220, 2025.
- [21] Zehao Wang, Mingzhe Han, Wei Cheng, Yue-Kai Huang, Philip Ji, Denton Wu, Mahdi Safari, Flemming Holtorf, Kenaish AlQubaisi, Norbert M. Linke, Danyang Zhuo, Yiran Chen, Ting Wang, Dirk Englund, and Tingjun Chen. Agentic AI for scalable and robust optical systems control. arXiv preprint arXiv:2602.20144, 2026.
- [22] Ningyuan Yang, Guanliang Lyu, Mingchen Ma, Yiyi Lu, Yiming Li, Zhihui Gao, Hancheng Ye, Jianyi Zhang, Tingjun Chen, and Yiran Chen. IoT-MCP: Bridging LLMs and IoT systems through model context protocol. arXiv preprint arXiv:2510.01260, 2025.
- [23] Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. InjecAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents. ACL Findings, 2024.
- [24] Zhonghao Zhan, Amir Al Sadi, Krinos Li, and Hamed Haddadi. AegisMCP: Online graph intrusion detection for tool-augmented LLMs on edge devices. arXiv preprint arXiv:2510.19462, 2025.