Recognition: 2 theorem links
ANX: Protocol-First Design for AI Agent Interaction with a Supporting 3EX Decoupled Architecture
Pith reviewed 2026-05-10 18:38 UTC · model grok-4.3
The pith
ANX protocol unifies AI agent interactions via markup and decoupled architecture to cut token use and add native security.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ANX is an open, extensible, verifiable agent-native protocol integrating CLI, Skill, and MCP through a 3EX decoupled architecture with ANXHub. Its agent-native design uses ANX Config, Markup, and CLI for high information density and adaptability. Skills provide flexible dual rendering as executable instructions and human-readable UI. MCP enables lightweight apps without pre-registration. ANX Markup produces unambiguous machine-executable SOPs for long-horizon tasks and multi-agent collaboration. Security comes from LLM-bypassed UI-to-Core paths and human-only confirmations. Form-filling tests show lower token counts and shorter runtimes than MCP-based or GUI approaches.
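The dual-rendering claim is the most concrete of these: one SOP step must render both as a machine-executable instruction and as human-readable UI. The review does not reproduce ANX Markup's actual syntax, so the following is a hypothetical Python sketch with invented names, not the protocol's real representation:

```python
# Hypothetical illustration of "dual rendering": one SOP step carries both a
# machine-executable form and a human-readable UI form. ANX Markup's real
# syntax is not reproduced in this review; all names here are invented.
from dataclasses import dataclass

@dataclass(frozen=True)
class SOPStep:
    action: str        # machine-executable instruction name (CLI-style)
    args: dict         # structured arguments, no natural-language ambiguity
    ui_label: str      # human-readable rendering of the same step

    def to_agent(self) -> str:
        # render for the agent: compact, unambiguous, directly executable
        flags = " ".join(f"--{k}={v}" for k, v in self.args.items())
        return f"{self.action} {flags}"

    def to_human(self) -> str:
        # render for the human: plain description shown in the UI
        return self.ui_label

step = SOPStep(
    action="form.fill",
    args={"field": "email", "value": "user@example.com"},
    ui_label="Fill the 'email' field with user@example.com",
)
print(step.to_agent())  # form.fill --field=email --value=user@example.com
```

The point of the sketch is only that both renderings derive from one structured step, so the agent and the human cannot drift apart on what the step means.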
What carries the argument
ANX Markup for high-density executable instructions and the 3EX decoupled architecture that integrates CLI, Skill, and MCP while separating execution concerns.
If this is right
- Long-horizon and multi-agent tasks gain reliability from unambiguous machine-executable SOPs.
- Overall token consumption drops, reducing costs for LLM-based agent operations.
- Security improves because sensitive data stays out of the agent context and actions require human approval.
- Human-agent handoff becomes smoother through skills that render as both instructions and UI.
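The security bullet above can be sketched as a minimal pattern (hypothetical names; the paper's actual UI-to-Core API is not shown in this review): sensitive values travel from the UI layer to the core over a channel the LLM never sees, the agent context holds only an opaque handle, and execution requires explicit human approval.

```python
# Minimal sketch of an LLM-bypassed UI-to-Core path with human-only
# confirmation. Names are invented for illustration, not the paper's API.
import secrets

class Core:
    def __init__(self):
        self._vault = {}  # sensitive values live here, outside agent context

    def store_sensitive(self, value: str) -> str:
        # UI -> Core directly; the LLM never receives the raw value
        handle = secrets.token_hex(8)
        self._vault[handle] = value
        return handle     # opaque handle is all the agent ever sees

    def submit(self, form: dict, approved_by_human: bool) -> dict:
        if not approved_by_human:
            raise PermissionError("human-only confirmation required")
        # handles resolve to real values only at the point of execution
        return {k: self._vault.get(v, v) for k, v in form.items()}

core = Core()
handle = core.store_sensitive("4111-1111-1111-1111")  # entered via UI
agent_context = {"card_number": handle}               # LLM sees handle only
result = core.submit(agent_context, approved_by_human=True)
```

Whether ANX's real implementation keeps this property under adversarial prompting is exactly the open question raised under "What would settle it".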
Where Pith is reading between the lines
- Widespread use could standardize how agents connect to tools and interfaces across platforms.
- Application developers might design outputs natively in ANX Markup to gain automatic agent compatibility.
- The approach invites testing on non-form tasks to check if efficiency holds when task complexity increases.
Load-bearing premise
That the token and time savings from limited form-filling tests with two models will generalize to diverse real-world agent tasks, and that the security features hold without introducing new vulnerabilities.
What would settle it
Running the token and execution time comparison on a wider range of tasks such as multi-step web navigation or collaborative planning, or testing whether an agent can access protected data despite the UI-to-Core bypass.
Original abstract
AI agents, as autonomous digital actors, need agent-native protocols. Existing methods, including GUI automation and MCP-based skills, suffer from high token consumption, fragmented interaction, and inadequate security, because they lack a unified top-level framework and key components, and each independent module is flawed. To address these issues, we present ANX, an open, extensible, verifiable agent-native protocol and top-level framework integrating CLI, Skill, and MCP, resolving these pain points via protocol innovation, architectural optimization, and tool supplementation. Its four core innovations are: 1) agent-native design (ANX Config, Markup, CLI) with high information density, flexibility, and strong adaptability to reduce tokens and eliminate inconsistencies; 2) human-agent interaction combining Skill's flexibility for dual rendering as agent-executable instructions and human-readable UI; 3) MCP-supported on-demand lightweight apps without pre-registration; 4) ANX Markup-enabled machine-executable SOPs that eliminate ambiguity for reliable long-horizon tasks and multi-agent collaboration. As the first in a series, we focus on ANX's design, present its 3EX decoupled architecture with ANXHub, and provide a preliminary feasibility analysis and experimental validation. ANX ensures native security: LLM-bypassed UI-to-Core communication keeps sensitive data out of the agent context, and human-only confirmation prevents automated misuse. Form-filling experiments with Qwen3.5-plus and GPT-4o show that ANX reduces tokens by 47.3% (Qwen3.5-plus) and 55.6% (GPT-4o) vs MCP-based skills, by 57.1% (Qwen3.5-plus) and 66.3% (GPT-4o) vs GUI automation, and shortens execution time by 58.1% and 57.7% vs MCP-based skills.
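The percentages in the abstract are ordinary relative savings over a baseline. The raw token counts are not given in this review, so the counts below are hypothetical, chosen only to reproduce one quoted figure:

```python
# Percent reduction as quoted in the abstract: (baseline - anx) / baseline * 100.
# Token counts below are hypothetical; the abstract reports only percentages.
def pct_reduction(baseline: float, anx: float) -> float:
    return (baseline - anx) / baseline * 100.0

baseline_mcp_tokens = 10_000  # hypothetical MCP-based-skill run
anx_tokens = 5_270            # hypothetical ANX run on the same task
print(round(pct_reduction(baseline_mcp_tokens, anx_tokens), 1))  # 47.3
```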
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ANX, an open agent-native protocol and top-level framework that integrates CLI, Skill, and MCP components via a 3EX decoupled architecture (with ANXHub) to address high token consumption, fragmented interactions, and security shortcomings in GUI automation and MCP-based skills. Core innovations include ANX Config/Markup/CLI for high-density agent-native design, dual-rendered human-agent skills, on-demand MCP apps, and Markup-enabled machine-executable SOPs for long-horizon and multi-agent reliability. The manuscript presents the architecture, security features (LLM-bypassed UI-to-Core paths and human confirmation), and preliminary experimental validation limited to form-filling tasks with Qwen3.5-plus and GPT-4o, reporting token reductions of 47.3–66.3% and time reductions of ~58% versus baselines.
Significance. If the protocol design and security properties generalize, ANX could offer a unified, extensible foundation for more efficient and verifiable AI agent interactions, with explicit strengths in its protocol-first approach, open extensibility, and LLM-bypassed security mechanisms that keep sensitive data out of agent context. The preliminary experiments provide concrete quantitative evidence of token and time savings in at least one task class, which is a positive step toward falsifiable claims.
major comments (3)
- [Abstract] Abstract: The headline performance claims (token reductions of 47.3% for Qwen3.5-plus and 55.6% for GPT-4o vs MCP-based skills, up to 66.3% vs GUI automation, and ~58% time reduction) rest exclusively on form-filling experiments; no quantitative results are reported for the long-horizon tasks or multi-agent collaboration that the design section asserts are enabled by ANX Markup's elimination of ambiguity. This gap means the central assertion that ANX resolves the general defects of existing methods does not follow from the presented evidence.
- [Abstract] Abstract (experimental validation paragraph): The reported token and time savings lack any description of test setup, number of trials, statistical measures (e.g., variance or significance tests), full baseline implementations, or task complexity metrics. Without these, the quantitative claims cannot be assessed for robustness or reproducibility, directly undermining the feasibility analysis that is positioned as supporting the protocol's broader applicability.
- [Design section] Design section (four core innovations): The claim that ANX Markup enables 'reliable long-horizon tasks and multi-agent collaboration' by eliminating ambiguity is presented as a key innovation, yet the manuscript provides only descriptive architecture details and no formal verification, simulation results, or even qualitative case studies on such tasks. This leaves the load-bearing assertion about general defect resolution unsupported beyond the narrow form-filling scope.
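The second major comment asks for variance and significance reporting. A minimal sketch of what that would look like over repeated trials, using only the standard library and illustrative numbers (the paper reports no raw per-trial counts), is:

```python
# Mean, standard deviation, and a Welch t statistic for per-trial token
# counts. All trial values are illustrative, not taken from the paper.
import math
import statistics as st

anx_trials = [5210, 5330, 5290, 5250, 5270]     # hypothetical ANX runs
mcp_trials = [9950, 10110, 10040, 9980, 10020]  # hypothetical MCP baseline

def summarize(xs):
    # mean and sample standard deviation for a list of trial counts
    return st.mean(xs), st.stdev(xs)

def welch_t(a, b):
    # Welch's t statistic: mean difference scaled by pooled standard error
    ma, mb = st.mean(a), st.mean(b)
    va, vb = st.variance(a), st.variance(b)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

mean_anx, sd_anx = summarize(anx_trials)
mean_mcp, sd_mcp = summarize(mcp_trials)
t = welch_t(anx_trials, mcp_trials)  # large |t| -> unlikely to be noise
```

Reporting even this much per condition (n, mean, sd, t) would let readers judge whether the quoted 47.3-66.3% reductions are stable across runs.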
minor comments (2)
- [Abstract] The abstract and introduction would benefit from explicit cross-references to the specific sections describing the 3EX architecture and ANXHub implementation to improve readability for readers unfamiliar with the decoupled design.
- [Introduction] Notation for the four core innovations is listed numerically but not tied to later sections or figures; adding section pointers would clarify how each innovation maps to the 3EX components.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We agree that the current manuscript is a preliminary design paper with experimental validation limited to form-filling tasks, and we will revise to better scope the claims, add methodological details to the abstract, and distinguish design assertions from empirical results. Our point-by-point responses follow.
Point-by-point responses
-
Referee: [Abstract] Abstract: The headline performance claims (token reductions of 47.3% for Qwen3.5-plus and 55.6% for GPT-4o vs MCP-based skills, up to 66.3% vs GUI automation, and ~58% time reduction) rest exclusively on form-filling experiments; no quantitative results are reported for the long-horizon tasks or multi-agent collaboration that the design section asserts are enabled by ANX Markup's elimination of ambiguity. This gap means the central assertion that ANX resolves the general defects of existing methods does not follow from the presented evidence.
Authors: We agree that the quantitative results are confined to form-filling tasks. The manuscript is explicitly the first in a series and centers on protocol design and 3EX architecture, using form-filling as an initial feasibility check. Assertions about long-horizon tasks and multi-agent collaboration stem from ANX Markup's production of machine-executable SOPs that remove ambiguity, but these are design properties rather than empirically demonstrated outcomes in this paper. We will revise the abstract to state that the reported savings are preliminary and specific to form-filling, while qualifying the broader benefits as enabled by the architecture and slated for future validation. revision: partial
-
Referee: [Abstract] Abstract (experimental validation paragraph): The reported token and time savings lack any description of test setup, number of trials, statistical measures (e.g., variance or significance tests), full baseline implementations, or task complexity metrics. Without these, the quantitative claims cannot be assessed for robustness or reproducibility, directly undermining the feasibility analysis that is positioned as supporting the protocol's broader applicability.
Authors: The referee correctly notes that the abstract's summary of results is insufficiently detailed. Although the full manuscript contains an experimental section, the abstract paragraph is too brief. We will revise it to include summaries of the test setup, number of trials, any statistical measures (means and variance where available), baseline implementation details, and task complexity metrics drawn from the experimental section. This will improve transparency and allow readers to evaluate reproducibility. revision: yes
-
Referee: [Design section] Design section (four core innovations): The claim that ANX Markup enables 'reliable long-horizon tasks and multi-agent collaboration' by eliminating ambiguity is presented as a key innovation, yet the manuscript provides only descriptive architecture details and no formal verification, simulation results, or even qualitative case studies on such tasks. This leaves the load-bearing assertion about general defect resolution unsupported beyond the narrow form-filling scope.
Authors: We concur that support for long-horizon and multi-agent reliability is currently descriptive only. ANX Markup is intended to generate machine-executable SOPs that eliminate natural-language ambiguity, which logically supports reliable execution, but no simulations, case studies, or verification are provided. We will revise the design section to present this as an architectural property with explicit caveats that empirical validation lies outside the scope of this preliminary paper, and we will add a limitations/future-work subsection to delineate what has been shown versus what is planned. revision: yes
- We cannot supply new quantitative or qualitative results for long-horizon tasks or multi-agent collaboration in this revision, as those experiments have not been performed and are reserved for follow-up papers in the series.
Circularity Check
No circularity; empirical comparisons are independent of design claims
full rationale
The paper describes a protocol design (ANX Config, Markup, CLI, 3EX architecture) and reports direct experimental measurements of token/time reductions versus MCP and GUI baselines in form-filling tasks. No equations, fitted parameters, or derivations are present that reduce to the inputs by construction. No self-citations are invoked to justify uniqueness or load-bearing premises. The performance numbers are measured outputs from external model runs, not renamed fits or self-referential definitions. Lack of long-horizon results is an evidence gap, not a circularity in the derivation chain.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Existing GUI automation and MCP-based skills suffer from high token use, fragmentation, and security gaps due to the lack of a unified framework.
invented entities (3)
- ANX protocol · no independent evidence
- 3EX decoupled architecture · no independent evidence
- ANX Markup · no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem.
ANX Markup... high information density... 3EX decoupled architecture... UI-to-Core communication... SOP ANX Config with sources/targets
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem.
Form-filling experiments... token reduction 47.3–66.3 %
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Anthropic. (2024). Model Context Protocol (MCP). https://github.com/modelcontextprotocol. Accessed: 2026-04-05
2024
-
[2]
Ben Hassouna, A., Chaari, H., & Belhaj, I. (2026). LLM-Agent-UMF: LLM-based Agent Unified Modeling Framework. Information Fusion, 127, 103865
2026
- [3]
-
[4]
Chen, J., Li, Z., Jiang, Y., et al. (2026). The Era of Skill Growing Agents: Shift from Task Execution to Skill Growth. TechRxiv. Preprint. doi:10.36227/techrxiv.177041975.54183859
-
[5]
Dhuliawala, S., Komeili, M., Xu, J., Raileanu, R., Li, X., Celikyilmaz, A., & Weston, J. (2024). Chain-of-Verification Reduces Hallucination in Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2024 (pp. 3563–3578). Bangkok, Thailand: ACL
2024
- [6]
- [7]
- [8]
-
[9]
Grab Engineering. (2025). Introducing the SOP-driven LLM agent frameworks. Grab Engineering Blog. https://engineering.grab.com/introducing-the-sop-drive-llm-agent-framework. Accessed: 2026-04-06
2025
-
[10]
Kasibatla, S. R., et al. (2025). The Command Line GUIde: Graphical Interfaces from Man Pages via AI. In Proceedings of the IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) (pp. 1–5). IEEE
2025
- [11]
- [12]
- [13]
-
[14]
Patil, S. G., Zhang, T., Wang, X., & Gonzalez, J. E. (2024). Gorilla: Large Language Model Connected with Massive APIs. In Advances in Neural Information Processing Systems, Vol. 37 (pp. 126544–126565). NeurIPS
2024
-
[15]
Peng, Z. (2026). “What Did It Actually Do?”: Understanding Risk Awareness and Traceability for Computer-Use Agents. arXiv preprint arXiv:2603.28551
-
[16]
Rosenberg, J., White, P., & Jennings, C. F. (2025). CHEQ: A Protocol for Confirmation AI Agent Decisions with Human in the Loop (HITL). IETF Internet-Draft draft-rosenberg-aiproto-cheq-00. https://datatracker.ietf.org/doc/draft-rosenberg-aiproto-cheq/00/
2025
- [17]
-
[18]
Wang, Z., Wang, Y., Liu, X., et al. (2025). AgentDropout: Dynamic Agent Elimination for Token-Efficient and High-Performance LLM-Based Multi-Agent Collaboration. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025) (pp. 24013–24035). ACL
2025
-
[19]
Wang, Z. Z., Shao, Y., Shaikh, O., et al. (2025). How Do AI Agents Do Human Work? Comparing AI and Human Workflows Across Diverse Occupations. Carnegie Mellon University & Stanford University. Technical report
2025
-
[20]
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Chi, E. H., Le, Q. V., & Zhou, D. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Advances in Neural Information Processing Systems, Vol. 35. NeurIPS
2022
-
[21]
Xie, T., Ran, D.-Z., Cao, Y., et al. (2026). From User Operations to Agentic Automation: Toward Intent-Oriented Software in the LLM Era. Journal of Computer Science and Technology, 41(2), 1–18
2026
-
[22]
Yuan, D., et al. (2026). Beyond Message Passing: Toward Semantically Aligned Agent Communication. arXiv preprint arXiv:2604.02369
2026
- [23]
-
[24]
Zheng, B., Fatemi, M. Y., Jin, X., Wang, Z. Z., Gandhi, A., Song, Y., Gu, Y., Srinivasa, J., Liu, G., Neubig, G., et al. (2025). SkillWeaver: Web Agents Can Self-Improve by Discovering and Honing Skills. arXiv preprint arXiv:2504.07079
2025