pith. machine review for the scientific record.

arxiv: 2605.09055 · v1 · submitted 2026-05-09 · 💻 cs.RO · cs.AI · cs.MA

Recognition: 2 Lean theorem links

Octopus Protocol: One-Shot Hardware Discovery and Control for AI Agents via Infrastructure-as-Prompts

Quilee Simeon, Justin M. Wei, Yile Fan

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 03:01 UTC · model grok-4.3

classification 💻 cs.RO · cs.AI · cs.MA
keywords: hardware discovery · AI agent control · one-shot onboarding · model context protocol · language-model-generated drivers · closed-loop robotics · infrastructure as prompts

The pith

One shell command discovers hardware, generates control interfaces, and lets AI agents operate new devices without any human-written drivers or SDKs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that hardware onboarding for agentic robotics can be reduced to a single shell command through a pipeline that probes the operating system, identifies connected devices, infers their capabilities, and generates a live Model Context Protocol server with typed tools. This matters to a sympathetic reader because the dominant cost in agent-robotics systems has been the manual creation of drivers and primitives; collapsing that cost opens the possibility of rapid deployment across heterogeneous platforms. The approach treats protocols as prompts executed by the coding agent itself, which then serves as both generator and runtime, including a daemon that monitors and repairs the generated code while enabling self-perception through camera tools. Validation across PC, macOS, Raspberry Pi, and a 6-DOF arm shows the pipeline completing in 10-15 minutes and exposing up to 30 tools that support closed-loop visual-motor control.

Core claim

Octopus Protocol collapses hardware setup to one command by running a five-stage pipeline—PROBE, IDENTIFY, INTERFACE, SERVE, DEPLOY—where a coding agent uses raw OS access and a language-model key to discover devices, infer capabilities, generate an MCP server with typed tools, deploy it as an HTTP endpoint, and maintain it through a persistent daemon. Two principles underpin the system: protocols are prompts rather than static code, and the coding agent functions as the runtime. This enables an MCP-compliant client to perform closed-loop visual-motor control using only tools the agent wrote for itself on platforms ranging from WSL to robotic arms with USB cameras.
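The five-stage flow above can be sketched as a sequential orchestration. This is an illustrative sketch, not the authors' code: every stage body below is a hypothetical stub, and the device names, capabilities, and endpoint are invented for the example.

```python
# Hypothetical sketch of the PROBE-IDENTIFY-INTERFACE-SERVE-DEPLOY
# pipeline as a chain of stage functions threading shared state.
from dataclasses import dataclass, field

@dataclass
class PipelineState:
    probe_data: dict = field(default_factory=dict)  # raw OS probe output
    devices: list = field(default_factory=list)     # identified devices
    tools: list = field(default_factory=list)       # generated MCP tool names
    endpoint: str = ""                              # deployed HTTP endpoint

def probe(state):
    # PROBE: collect raw OS information (e.g. lsusb/dmesg output).
    state.probe_data = {"usb": ["ID 1a2b:3c4d example servo bus"]}
    return state

def identify(state):
    # IDENTIFY: the language model maps probe output to device capabilities.
    state.devices = [{"name": "servo-bus", "capabilities": ["move", "read_pos"]}]
    return state

def interface(state):
    # INTERFACE: generate one typed MCP tool per discovered capability.
    state.tools = [f"{d['name']}.{c}" for d in state.devices
                   for c in d["capabilities"]]
    return state

def serve(state):
    # SERVE: expose the generated tools as a live HTTP endpoint.
    state.endpoint = "http://localhost:8000/mcp"
    return state

def deploy(state):
    # DEPLOY: hand off to a persistent monitoring daemon (elided here).
    return state

def onboard():
    """The paper's 'one shell command', reduced to a function call."""
    state = PipelineState()
    for stage in (probe, identify, interface, serve, deploy):
        state = stage(state)
    return state
```

Each stage only reads the state the previous stage produced, which is what lets a single command drive the whole chain.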

What carries the argument

The five-stage pipeline (PROBE, IDENTIFY, INTERFACE, SERVE, DEPLOY) in which the language model interprets OS probe data to generate and maintain an MCP server of typed tools, with the coding agent serving as both code generator and ongoing runtime.
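To make "typed tools" concrete: a minimal sketch of what a typed tool registration could look like, using only function annotations and a registry an MCP client could introspect. The decorator name, tool names, and schemas are illustrative assumptions, not the paper's or the MCP SDK's actual API.

```python
# Hypothetical typed-tool registry: each registered function carries a
# machine-readable parameter schema derived from its type annotations.
import inspect

TOOL_REGISTRY = {}

def mcp_tool(fn):
    """Register a function together with its parameter types."""
    sig = inspect.signature(fn)
    TOOL_REGISTRY[fn.__name__] = {
        "description": (fn.__doc__ or "").strip(),
        "params": {name: p.annotation.__name__
                   for name, p in sig.parameters.items()},
    }
    return fn

@mcp_tool
def move_joint(joint: int, degrees: float) -> str:
    """Move one servo of the arm to an absolute angle (stubbed)."""
    return f"joint {joint} -> {degrees} deg"

@mcp_tool
def capture_frame(camera: int) -> str:
    """Grab one frame from a USB camera for visual feedback (stubbed)."""
    return f"frame from camera {camera}"
```

The typed schema is what lets an MCP-compliant client call tools it has never seen before without human documentation.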

If this is right

  • Hardware integration for AI agents drops from hours or days of engineering to roughly 10-15 minutes per device.
  • Up to 30 typed MCP tools become available for control without any pre-existing drivers or SDKs.
  • Closed-loop visual-motor control becomes possible using only interfaces the agent generated for itself.
  • A background daemon continuously heals broken code and maintains physical-state perception through self-created camera tools.
  • The same pipeline functions across Windows, macOS, Linux, and embedded boards like the Raspberry Pi.
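The healing-daemon behavior in the list above can be sketched as a single monitoring pass. This is a minimal sketch, not the authors' implementation: `regenerate` is a hypothetical stand-in for a language-model call, and the health check is supplied by the caller.

```python
# Hypothetical DEPLOY-stage daemon pass: health-check each generated
# tool's source and regenerate any tool that fails the check.
def regenerate(tool_name):
    """Stand-in for asking the coding agent to rewrite a broken tool."""
    return f"def {tool_name}(): return 'ok'"

def heal_once(tools, health_check):
    """Run one monitoring pass; return names of tools that were repaired."""
    repaired = []
    for name, source in list(tools.items()):
        if not health_check(name, source):
            tools[name] = regenerate(name)  # replace broken source in place
            repaired.append(name)
    return repaired
```

A real daemon would run this pass on a timer and use execution traces rather than a source-text predicate, but the loop structure is the same.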

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could extend naturally to fleets of devices by chaining multiple one-command onboards within a single agent session.
  • If the inference step improves, the same pattern might apply to software-only interfaces such as APIs or simulators that currently require custom wrappers.
  • Rapid hardware cycling becomes feasible for testing new sensors or actuators in research or prototyping loops.

Load-bearing premise

The language model will consistently interpret OS probe information and produce correct, functional interface code for any connected hardware without requiring human debugging or corrections.
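The premise can be made concrete: the IDENTIFY stage must turn raw probe text, such as one `lsusb` output line, into a structured device record before any interface code can be generated. The regex follows the standard `lsusb` line format; the vendor table below is a hypothetical lookup, not real vendor IDs.

```python
# Sketch of the deterministic half of IDENTIFY: parse an lsusb line into
# a record; anything unrecognized is deferred to the language model.
import re

KNOWN_VENDORS = {"2f4d": "servo bus", "046d": "USB camera"}  # hypothetical table

def parse_lsusb_line(line):
    """Extract bus, device number, and vendor:product IDs from one lsusb line."""
    m = re.match(r"Bus (\d+) Device (\d+): ID ([0-9a-f]{4}):([0-9a-f]{4})", line)
    if not m:
        return None
    bus, dev, vendor, product = m.groups()
    return {
        "bus": int(bus),
        "device": int(dev),
        "vendor_id": vendor,
        "product_id": product,
        "guess": KNOWN_VENDORS.get(vendor, "unknown: defer to language model"),
    }
```

The load-bearing question is exactly the `"unknown"` branch: whether the model fills that gap correctly for arbitrary hardware.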

What would settle it

Repeated trials on a fresh device where the generated MCP tools produce incorrect or non-functional control actions, or where the pipeline requires manual fixes before control succeeds, would show the one-shot process does not reliably work as claimed.
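One way to operationalize that test is a repeated-trial harness that reports a success rate instead of a single demonstration. `run_onboarding` below is a hypothetical hook; a real harness would invoke the one-shot shell command and smoke-test every generated tool, whereas here the outcome is simulated.

```python
# Sketch of a repeated-trial harness for the one-shot onboarding claim.
def run_onboarding(trial):
    """Stand-in for one full onboarding attempt on a fresh device.

    Simulated here: one failure in every five trials.
    """
    return trial % 5 != 0

def success_rate(n_trials):
    """Fraction of trials in which onboarding succeeded end to end."""
    successes = sum(1 for t in range(n_trials) if run_onboarding(t))
    return successes / n_trials
```

Reporting this number with per-trial logs is precisely what the referee report below asks for.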

Figures

Figures reproduced from arXiv: 2605.09055 by Justin M. Wei, Quilee Simeon, Yile Fan.

Figure 1. Octopus architecture. Device (left): a bootstrap com… (caption truncated at source; image omitted)
Figure 2. Hardware. (a, left) Design intent: a circular camera-slider rig for multi-view perception (CAD render, aspirational, not yet built). (b, right) Current benchtop prototype: off-the-shelf SO-ARM101 arm (Seeed Studio; 6-DOF Feetech STS3215 servos) and a USB camera on a single-axis vertical post atop a partial circular track, driven by a Raspberry Pi 4. Tool count is capped at 30 per installation and varies by… (caption truncated at source; image omitted)
read the original abstract

Recent agentic-robotics systems, from Code-as-Policies to modern vision-language-action (VLA) foundation models, presuppose that drivers, SDKs, or ROS-style primitives for the target hardware already exist. Writing those primitives is the dominant engineering cost of bringing up new hardware for agent control. We present Octopus Protocol, a system that collapses that cost to a single shell command. Given only raw OS access and a language-model API key, a coding agent executes a five-stage pipeline--PROBE, IDENTIFY, INTERFACE, SERVE, DEPLOY--to discover connected devices, infer their capabilities, generate a Model Context Protocol (MCP) server with typed tools, and deploy it as a live HTTP endpoint. A persistent daemon then monitors the system, heals broken code, and perceives physical state through the camera tools it generated for itself. Two architectural principles make this work: protocols are prompts, not code, and the coding agent is the runtime. We validate the system on three heterogeneous platforms (PC/WSL, Apple Silicon macOS, Raspberry Pi 4) and on a commercial 6-DOF robotic arm with USB camera feedback. One command onboards the hardware in ~10-15 minutes and exposes up to 30 MCP tools; an MCP-compliant client then performs closed-loop visual-motor control through tools no human wrote.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces Octopus Protocol, a system that uses an LLM coding agent to execute a five-stage pipeline (PROBE, IDENTIFY, INTERFACE, SERVE, DEPLOY) from raw OS access alone. This discovers connected devices, infers capabilities, generates a typed MCP server exposing up to 30 tools, deploys it as an HTTP endpoint, and runs a persistent healing daemon that monitors and repairs the generated code while enabling closed-loop visual-motor control on heterogeneous platforms including a 6-DOF robotic arm. The central claim is that one shell command reduces hardware onboarding to 10-15 minutes without any human-written drivers or SDKs.

Significance. If the pipeline reliably produces correct, bug-free MCP tools that support real-time closed-loop control, the work would substantially reduce the dominant engineering cost of hardware integration for agentic robotics and VLA systems. It introduces the architectural idea of treating protocols as prompts and the coding agent as runtime, which could generalize to other infrastructure domains if the reliability claims hold.

major comments (2)
  1. [Abstract] Abstract: The validation statement reports successful onboarding across three platforms and one robotic arm in ~10-15 minutes with up to 30 MCP tools, yet supplies no quantitative metrics (success rates, failure rates, latency distributions, error bars, or logs) and no description of how closed-loop visual-motor control performance was measured. This absence makes it impossible to assess whether the generated tools actually support reliable control or merely syntactic correctness.
  2. [Abstract] Abstract (PROBE-IDENTIFY-INTERFACE-SERVE-DEPLOY pipeline): The claim that the LLM agent produces reliable, device-specific control and feedback code from OS probes (lsusb, dmesg, camera enumeration) alone rests on the untested assumption that such probes contain sufficient information to avoid hallucinations or protocol errors in real-time loops. No evidence is given that the persistent healing daemon succeeds for closed-loop control rather than simple syntax fixes, leaving the central systems claim unsupported.
minor comments (1)
  1. [Abstract] Abstract: The term 'Model Context Protocol (MCP)' is introduced without a brief definition or reference to its specification, which may confuse readers unfamiliar with the protocol.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback highlighting the need for stronger quantitative validation and explicit evidence of pipeline reliability. We address each major comment below and will revise the manuscript to incorporate additional details and metrics where possible.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The validation statement reports successful onboarding across three platforms and one robotic arm in ~10-15 minutes with up to 30 MCP tools, yet supplies no quantitative metrics (success rates, failure rates, latency distributions, error bars, or logs) and no description of how closed-loop visual-motor control performance was measured. This absence makes it impossible to assess whether the generated tools actually support reliable control or merely syntactic correctness.

    Authors: We agree that the abstract as written does not provide the requested quantitative metrics or measurement details, which limits immediate assessment of the claims. The full manuscript's evaluation section reports end-to-end success on the specified platforms and arm but presents results primarily through demonstration rather than aggregated statistics. We will revise the abstract to include success rates across trials, average onboarding times, and a concise description of how closed-loop performance was assessed (e.g., via sustained visual feedback enabling motor commands without human intervention). We will also expand the evaluation section with the corresponding metrics and methodology. revision: yes

  2. Referee: [Abstract] Abstract (PROBE-IDENTIFY-INTERFACE-SERVE-DEPLOY pipeline): The claim that the LLM agent produces reliable, device-specific control and feedback code from OS probes (lsusb, dmesg, camera enumeration) alone rests on the untested assumption that such probes contain sufficient information to avoid hallucinations or protocol errors in real-time loops. No evidence is given that the persistent healing daemon succeeds for closed-loop control rather than simple syntax fixes, leaving the central systems claim unsupported.

    Authors: The manuscript tests the assumption empirically through successful generation and deployment of functional tools on heterogeneous hardware, including the 6-DOF arm, using only the described OS probes. The persistent healing daemon is presented as part of the DEPLOY stage and is shown to maintain system operation. However, we acknowledge that the current text provides limited explicit evidence distinguishing its effectiveness in real-time closed-loop scenarios from basic syntax repairs. We will revise the relevant sections to include concrete examples of healing events observed during control tasks and clarify the daemon's role in supporting continuous operation. revision: partial

standing simulated objections not resolved
  • Detailed per-trial latency distributions, error bars, and raw logs remain unprovided, as the initial validation emphasized end-to-end functionality over exhaustive instrumentation.

Circularity Check

0 steps flagged

No circularity: systems description without derivations or self-referential fits

full rationale

The paper describes an engineering pipeline (PROBE-IDENTIFY-INTERFACE-SERVE-DEPLOY) for LLM-driven hardware onboarding and MCP tool generation. No mathematical derivations, equations, fitted parameters, or predictions appear in the abstract or described content. Claims rest on empirical validation across platforms and a 6-DOF arm rather than any reduction of outputs to inputs by construction. Architectural principles are stated as design choices, not derived results. No self-citations or uniqueness theorems are invoked as load-bearing. The work is self-contained as a systems contribution evaluated against external hardware benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on assumptions about LLM capabilities for code generation and hardware inference, plus access to OS and camera.

axioms (2)
  • domain assumption Language models can reliably generate correct hardware interface code from OS probes and device identification.
    Invoked implicitly in the INTERFACE stage of the pipeline.
  • domain assumption The generated MCP server and tools will function correctly for closed-loop control without human intervention.
    Stated in the validation claim for the robotic arm.
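The second axiom, closed-loop control without human intervention, can be sketched as a proportional feedback loop: estimate a visual error from the camera tool, command a joint correction, repeat. Everything below is an illustrative stand-in for tools the agent would have generated; the gain, tolerance, and perfectly proportional plant are simplifying assumptions.

```python
# Hedged sketch of closed-loop visual-motor control via generated tools.
def visual_error(target, observed):
    """Pixel-space error a generated camera tool might report."""
    return target - observed

def control_loop(target, observed, gain=0.5, tol=1.0, max_steps=50):
    """Drive the observed position toward the target; return steps used.

    The `observed += gain * err` line stands in for issuing a
    'move_joint' command and re-observing through the camera.
    """
    for step in range(max_steps):
        err = visual_error(target, observed)
        if abs(err) <= tol:
            return step
        observed += gain * err
    return max_steps
```

Whether generated tools sustain this loop on real hardware, with sensor noise and actuation latency, is exactly what the axiom assumes.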

pith-pipeline@v0.9.0 · 5553 in / 1265 out tokens · 39741 ms · 2026-05-12T03:01:32.121350+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 2 internal anchors
