pith. machine review for the scientific record.

arxiv: 2604.11623 · v3 · submitted 2026-04-13 · 💻 cs.AI · cs.SE

Recognition: unknown

Context Kubernetes: Declarative Orchestration of Enterprise Knowledge for Agentic AI Systems

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:39 UTC · model grok-4.3

classification 💻 cs.AI cs.SE
keywords agentic AI · knowledge orchestration · declarative governance · RBAC · enterprise security · permission models · reconciliation loop · TLA+ verification

The pith

A three-tier permission model inspired by Kubernetes blocks all tested AI agent knowledge leaks that RBAC cannot detect.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Context Kubernetes as an architecture for managing enterprise knowledge across AI agents by treating orchestration as a declarative, reconciliation-driven problem analogous to container management. It defines six abstractions, YAML manifests for knowledge architecture as code, and a permission system that keeps agent authority strictly below human authority. Experiments on synthetic data compare baselines and show each added layer improves outcomes: ACL filtering stops cross-domain leaks, intent routing cuts noise by 19 points, and only the full three-tier model stops every one of the five attack scenarios, including an agent emailing confidential pricing. Model checking confirms safety across millions of states, while a survey finds major platforms lack isolated approval channels. This approach matters for organizations deploying autonomous agents because it provides structured governance where ad-hoc controls fall short.
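The reconciliation-driven framing above can be sketched as a control loop that diffs declared state against observed state, in the Kubernetes style the paper borrows. The function and state shapes below are illustrative assumptions, not the paper's implementation:

```python
# Sketch of a declarative reconciliation loop (assumed shape, not the
# paper's code): compare desired state from a manifest with observed
# state and emit the corrective actions that close the gap.
def reconcile(desired: dict, observed: dict) -> list:
    """Return actions that drive observed state toward desired state."""
    actions = []
    for key, want in desired.items():
        if observed.get(key) != want:
            actions.append(("update", key, want))   # create or correct
    for key in observed.keys() - desired.keys():
        actions.append(("delete", key))             # prune undeclared state
    return actions

desired = {"sales-kb": {"freshness": "24h"}, "hr-kb": {"freshness": "7d"}}
observed = {"sales-kb": {"freshness": "72h"}, "legacy-kb": {"freshness": "30d"}}
for action in reconcile(desired, observed):
    print(action)
```

In Kubernetes this loop runs continuously inside a controller; the paper applies the same pattern to knowledge freshness and permissions rather than container replicas.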

Core claim

The paper claims that delivering the right knowledge to the right agent, with the right permissions and freshness, at enterprise scale requires six core abstractions, a YAML declarative manifest, a reconciliation loop, and a three-tier permission model where agent authority is always a strict subset of human authority. On synthetic seed data, ungoverned RAG, ACL-filtered retrieval, and RBAC-aware routing each fail to block at least one of five attack scenarios, while the full architecture blocks all of them; the scenario RBAC misses involves an agent sending confidential pricing via email. TLA+ verification finds zero safety violations across 4.6 million reachable states, and the paper identifies four properties that make context orchestration harder than container orchestration.
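The core claim leans on a YAML declarative manifest that this page does not reproduce. As a purely hypothetical sketch of what knowledge-architecture-as-code might declare (every field name here is an invented assumption, not the paper's schema):

```yaml
# Hypothetical manifest sketch; all field names are invented for illustration.
apiVersion: context.example.io/v1alpha1
kind: KnowledgeDomain
metadata:
  name: sales-pricing
spec:
  sources:
    - type: document-store
      uri: s3://corp-kb/pricing/
  freshness: 24h              # reconciliation target for staleness
  sensitivity: confidential
  permissions:
    humans: [sales-lead]
    agents:
      - name: quote-assistant
        inherits: sales-lead  # agent authority derived from this human
        exclude: [send:email] # withheld so the subset stays strict
```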

What carries the argument

The three-tier agent permission model, where agent authority is always a strict subset of human authority, together with YAML-based declarative manifests for knowledge architecture as code and an automated reconciliation loop.
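The strict-subset invariant named above reduces to a set comparison. A minimal sketch under assumed names (`Principal` and the permission strings are invented for illustration, not the paper's API):

```python
# The three-tier invariant as a set comparison: an agent's permissions must
# be a *strict* subset of its owning human's. Names here are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class Principal:
    name: str
    permissions: frozenset  # e.g. {"read:pricing", "send:email"}

def agent_authority_valid(agent: Principal, human: Principal) -> bool:
    # '<' on frozensets tests strict subset: equal authority is rejected too.
    return agent.permissions < human.permissions

human = Principal("alice", frozenset({"read:pricing", "send:email", "approve:deal"}))
scoped = Principal("alice-bot", frozenset({"read:pricing"}))
rogue = Principal("rogue-bot", frozenset({"read:pricing", "send:email", "approve:deal"}))

print(agent_authority_valid(scoped, human))  # True: proper strict subset
print(agent_authority_valid(rogue, human))   # False: equal set, not strict
```

The strictness matters for the attack RBAC misses: an agent holding exactly its owner's permissions could send the confidential-pricing email; a strict subset leaves room to withhold that action.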

If this is right

  • Enterprises gain the ability to run multiple agents with controlled knowledge access without cross-domain leaks or unauthorized actions.
  • Adding intent routing to retrieval reduces irrelevant noise by 19 percentage points compared to ACL filtering alone.
  • Safety properties of the orchestration system hold across 4.6 million reachable states with no violations.
  • Major platforms from Microsoft, Salesforce, AWS, and Google lack architectural isolation of agent approval channels.
  • The four identified differences from container orchestration explain why knowledge flows require dedicated governance mechanisms.
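The 4.6-million-state bullet above describes an exhaustive model check. A toy Python analogue of that style of verification, with invented transition rules standing in for the paper's TLA+ specification:

```python
# Toy exhaustive state exploration in the spirit of the paper's TLA+ check:
# enumerate every reachable agent-permission state and assert the
# strict-subset safety invariant in each one. Transition rules are invented.
HUMAN = frozenset({"read", "write", "approve"})

def grant_transitions(agent_perms):
    """Grant any human-held permission, but only if the result is still a
    strict subset of the human's authority (the safety gate)."""
    for perm in HUMAN - agent_perms:
        nxt = agent_perms | {perm}
        if nxt < HUMAN:
            yield nxt

seen, frontier = set(), [frozenset()]   # start from an agent with no grants
while frontier:
    state = frontier.pop()
    if state in seen:
        continue
    seen.add(state)
    assert state < HUMAN                # invariant holds in every state
    frontier.extend(grant_transitions(state))

print(len(seen))  # all strict subsets of a 3-permission set: 7 states
```

A real TLA+ check would also cover properties such as approval-channel isolation; this sketch only illustrates the exhaustive-enumeration idea behind the "zero violations" claim.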

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The model could be extended with runtime action monitoring to catch leaks that static permissions miss.
  • Similar declarative orchestration principles might improve governance for non-AI enterprise data pipelines.
  • The four harder properties of context orchestration point to new challenges in maintaining knowledge freshness under dynamic agent workloads.
  • Validation on real enterprise datasets would test whether the synthetic attack results generalize beyond the chosen scenarios.

Load-bearing premise

The synthetic seed data and the five chosen attack scenarios are representative of real enterprise knowledge flows, permission boundaries, and adversarial behaviors.

What would settle it

A documented case of an AI agent leaking confidential enterprise data while running under the three-tier permission model and reconciliation loop on production-like data.

read the original abstract

We introduce Context Kubernetes, an architecture for orchestrating enterprise knowledge in agentic AI systems, with a prototype implementation and eight experiments. The core observation is that delivering the right knowledge, to the right agent, with the right permissions, at the right freshness -- across an entire organization -- is structurally analogous to the container orchestration problem Kubernetes solved a decade ago. We formalize six core abstractions, a YAML-based declarative manifest for knowledge-architecture-as-code, a reconciliation loop, and a three-tier agent permission model where agent authority is always a strict subset of human authority. On synthetic seed data, we compare four governance baselines of increasing strength: ungoverned RAG, ACL-filtered retrieval, RBAC-aware routing, and the full architecture. Each layer contributes a different capability: ACL filtering eliminates cross-domain leaks, intent routing reduces noise by 19 percentage points, and only the three-tier model blocks all five tested attack scenarios -- the one attack RBAC misses is an agent sending confidential pricing via email, which RBAC cannot distinguish from ordinary email. TLA+ model-checking verifies safety properties across 4.6 million reachable states with zero violations. A survey of four major platforms (Microsoft, Salesforce, AWS, Google) documents that none architecturally isolates agent approval channels. We identify four properties that make context orchestration harder than container orchestration, and argue these make the solution more valuable.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Context Kubernetes, an architecture for declarative orchestration of enterprise knowledge in agentic AI systems, modeled after Kubernetes container orchestration. It defines six core abstractions, a YAML-based knowledge-architecture-as-code manifest, a reconciliation loop, and a three-tier agent permission model in which agent authority is always a strict subset of human authority. On synthetic seed data, four governance baselines (ungoverned RAG, ACL-filtered retrieval, RBAC-aware routing, and the full architecture) are compared; the full model is reported to block all five tested attack scenarios while RBAC fails on one (an agent sending confidential pricing via email). TLA+ model checking verifies safety properties over 4.6 million reachable states with zero violations, and a survey of four major platforms (Microsoft, Salesforce, AWS, Google) finds none architecturally isolate agent approval channels.

Significance. If the central claims hold under broader validation, the work could provide a practical, declarative framework for managing knowledge flow and permissions in multi-agent enterprise AI systems, addressing a documented gap in existing platforms. The TLA+ formal verification is a clear strength, supplying machine-checked safety guarantees rather than relying solely on empirical testing. The analogy to Kubernetes and the identification of four properties that differentiate context orchestration from container orchestration are conceptually useful. However, the current evaluation's dependence on synthetic data and a small set of hand-selected scenarios substantially limits the assessed significance and generalizability to production environments.

major comments (2)
  1. [Experimental Evaluation] The central empirical claim—that only the three-tier model blocks all five attack scenarios while RBAC-aware routing fails on the confidential-pricing-email case—is demonstrated exclusively on synthetic seed data (abstract and experimental section). No validation is provided that the generated flows reproduce the granularity of real enterprise permission boundaries, content sensitivity labels, or the ways agents combine permitted actions, which directly bears on whether the observed gap is architectural or an artifact of the test construction.
  2. [Formal Verification] TLA+ model checking is cited as verifying safety properties across 4.6 million reachable states with zero violations, yet the manuscript provides no details on the TLA+ specification itself, the exact safety properties checked, or how the three-tier permission model is encoded in the state machine (abstract). This information is load-bearing for the formal safety claims that support the architecture's core contribution.
minor comments (2)
  1. [Abstract] The abstract states 'eight experiments' but the reported results center on four governance baselines and five attack scenarios; the remaining experiments should be explicitly enumerated or summarized for completeness.
  2. [Core Abstractions] The six core abstractions are introduced at a high level; providing concise formal definitions or pseudocode alongside the YAML manifest description would improve precision and reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. We appreciate the positive assessment of the TLA+ verification and the conceptual framing. We address each major comment below and will revise the manuscript accordingly to improve clarity and transparency.

read point-by-point responses
  1. Referee: [Experimental Evaluation] The central empirical claim—that only the three-tier model blocks all five attack scenarios while RBAC-aware routing fails on the confidential-pricing-email case—is demonstrated exclusively on synthetic seed data (abstract and experimental section). No validation is provided that the generated flows reproduce the granularity of real enterprise permission boundaries, content sensitivity labels, or the ways agents combine permitted actions, which directly bears on whether the observed gap is architectural or an artifact of the test construction.

    Authors: We agree that the evaluation is limited by its use of synthetic seed data and that this constrains claims about generalizability to production environments. The synthetic data was intentionally constructed to isolate the specific architectural distinctions between governance layers—particularly the scenarios where RBAC cannot distinguish intent in actions such as sending confidential pricing via email—rather than to simulate full enterprise deployments. The attack scenarios draw from documented real-world agentic AI threat models. To address the concern, we will revise the experimental section to include a detailed description of the data generation methodology, how permission boundaries and sensitivity labels were modeled, and an explicit limitations subsection discussing the scope and assumptions of the evaluation. This will clarify that the results demonstrate the necessity of the three-tier model for certain attack classes without overstating empirical breadth. revision: partial

  2. Referee: [Formal Verification] TLA+ model checking is cited as verifying safety properties across 4.6 million reachable states with zero violations, yet the manuscript provides no details on the TLA+ specification itself, the exact safety properties checked, or how the three-tier permission model is encoded in the state machine (abstract). This information is load-bearing for the formal safety claims that support the architecture's core contribution.

    Authors: We agree that the current manuscript lacks sufficient detail on the TLA+ model, which is required to support the formal safety claims. In the revision we will add a dedicated subsection (or appendix) that fully describes the TLA+ specification. This will include the state-machine encoding of the three-tier permission model, the exact safety properties and invariants verified (such as strict subset authority and approval-channel isolation), and the model-checking configuration and parameters that produced the 4.6 million reachable states with zero violations. We will also make the TLA+ specification available as supplementary material to allow independent reproduction and verification. revision: yes

Circularity Check

0 steps flagged

No circularity: experimental comparisons and model checking are independent of inputs

full rationale

The paper's central results derive from direct experimental comparison of four governance baselines on synthetic seed data plus TLA+ model-checking of an abstract state machine across 4.6 million states. No equations, fitted parameters, or predictions are presented that reduce by construction to the inputs; the three-tier model claim is an observed outcome of the test harness rather than a self-definitional or self-cited necessity. The architecture is introduced via new formal abstractions and a declarative manifest without load-bearing self-citations or ansatzes smuggled from prior author work. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claim rests on the unproven analogy to container orchestration and the assumption that the three-tier permission model suffices for all realistic agent behaviors.

axioms (2)
  • domain assumption Delivering the right knowledge to the right agent with the right permissions and freshness is structurally analogous to the container orchestration problem solved by Kubernetes.
    Stated as the core observation in the abstract.
  • domain assumption Agent authority is always a strict subset of human authority.
    Explicit part of the three-tier permission model.
invented entities (2)
  • Context Kubernetes no independent evidence
    purpose: Declarative orchestration architecture for enterprise knowledge in agentic AI.
    New system name and framework introduced by the paper.
  • Six core abstractions no independent evidence
    purpose: Formal building blocks for the knowledge architecture.
    Invented formalization presented without prior citation.

pith-pipeline@v0.9.0 · 5545 in / 1607 out tokens · 61498 ms · 2026-05-10T15:39:01.779572+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

42 extracted references · 15 canonical work pages · 1 internal anchor

  1. [1]

    Agentic AI: A comprehensive survey of architectures, applications, and future directions

    Mohamad Abou Ali, Fadi Dornaika, and Jinan Charafeddine. Agentic AI: A comprehensive survey of architectures, applications, and future directions. Artificial Intelligence Review, 59(11), 2025. doi:10.1007/s10462-025-11422-4

  2. [2]

    The orchestration of multi-agent systems: Architectures, protocols, and enterprise adoption

    Apoorva Adimulam, Rajesh Gupta, and Sumit Kumar. The orchestration of multi-agent systems: Architectures, protocols, and enterprise adoption. arXiv preprint arXiv:2601.13671, 2026

  3. [3]

    Maryam Alavi and Dorothy E. Leidner. Review: Knowledge management and knowledge management systems: Conceptual foundations and research issues. MIS Quarterly, 25(1): 107--136, 2001. doi:10.2307/3250961

  4. [4]

    AWS Bedrock agents: How agents work

    Amazon Web Services. AWS Bedrock agents: How agents work. AWS Documentation, 2026. https://docs.aws.amazon.com/bedrock/latest/userguide/agents-how.html

  5. [5]

    Conformal prediction: A gentle introduction

    Anastasios N. Angelopoulos and Stephen Bates. Conformal prediction: A gentle introduction. Foundations and Trends in Machine Learning, 16(4): 494--591, 2023. doi:10.1561/2200000101

  6. [6]

    Building effective agents

    Anthropic. Building effective agents. Anthropic Research, 2024. https://www.anthropic.com/research/building-effective-agents

  7. [7]

    Model context protocol: Specification

    Anthropic. Model context protocol: Specification. Model Context Protocol, 2025. Version 2025-11-25. https://modelcontextprotocol.io/

  8. [8]

    Borg, Omega, and Kubernetes

    Brendan Burns, Brian Grant, David Oppenheimer, Eric Brewer, and John Wilkes. Borg, Omega, and Kubernetes. ACM Queue, 14(1): 70--93, 2016. doi:10.1145/2898442.2898444

  9. [9]

    FedE: Embedding knowledge graphs in federated setting

    Mingyang Chen, Wen Zhang, Zonggang Yuan, Yantao Jia, and Huajun Chen. FedE: Embedding knowledge graphs in federated setting. In Proceedings of the 10th International Joint Conference on Knowledge Graphs (IJCKG '21). ACM, 2021. doi:10.1145/3502223.3502233

  10. [10]

    AI tools trigger SaaS software stocks selloff

    CNBC. AI tools trigger SaaS software stocks selloff. CNBC, 2026. February 6, 2026. Nearly $300B in market value erased from application software sector

  11. [11]

    CrewAI: Framework for orchestrating role-playing autonomous AI agents

    CrewAI. CrewAI: Framework for orchestrating role-playing autonomous AI agents. CrewAI Documentation, 2025. https://docs.crewai.com/introduction

  12. [12]

    Data Mesh: Delivering Data-Driven Value at Scale

    Zhamak Dehghani. Data Mesh: Delivering Data-Driven Value at Scale. O'Reilly Media, 2022. ISBN 978-1-492-09239-1

  13. [13]

    Kubernetes Operators: Automating the Container Orchestration Platform

    Jason Dobies and Joshua Wood. Kubernetes Operators: Automating the Container Orchestration Platform. O'Reilly Media, 2020

  14. [14]

    Domain-Driven Design: Tackling Complexity in the Heart of Software

    Eric Evans. Domain-Driven Design: Tackling Complexity in the Heart of Software. Addison-Wesley, 2003. ISBN 978-0-321-12521-7

  15. [15]

    Gartner predicts 40 percent of enterprise apps will feature task-specific AI agents by 2026

    Gartner. Gartner predicts 40 percent of enterprise apps will feature task-specific AI agents by 2026. Gartner Press Release, 2025a. August 26, 2025

  16. [16]

    Gartner predicts over 40 percent of agentic AI projects will be canceled by end of 2027

    Gartner. Gartner predicts over 40 percent of agentic AI projects will be canceled by end of 2027. Gartner Press Release, 2025b. June 25, 2025

  17. [17]

    A2A: A new era of agent interoperability

    Google. A2A: A new era of agent interoperability. Google Developers Blog, 2025. Agent-to-Agent protocol donated to Linux Foundation. 50+ partners

  18. [18]

    Global AI agents market size report, 2025--2030

    Grand View Research. Global AI agents market size report, 2025--2030. Grand View Research, 2025. Market surpassed $9B in 2026, projected from $7.6B (2025) to $80--100B by 2030

  19. [19]

    Mesos: A platform for fine-grained resource sharing in the data center

    Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica. Mesos: A platform for fine-grained resource sharing in the data center. In Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation (NSDI '11), pages 295--308. USENIX Association, 2011

  20. [20]

    ZooKeeper: Wait-free coordination for internet-scale systems

    Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, and Benjamin Reed. ZooKeeper: Wait-free coordination for internet-scale systems. In Proceedings of the USENIX Annual Technical Conference (USENIX ATC '10). USENIX Association, 2010

  21. [21]

    Why context engineering will define the next era of enterprise AI

    InfoWorld. Why context engineering will define the next era of enterprise AI. InfoWorld, 2026. https://www.infoworld.com/article/4084378/why-context-engineering-will-define-the-next-era-of-enterprise-ai.html

  22. [22]

    Time, clocks, and the ordering of events in a distributed system

    Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7): 558--565, 1978. doi:10.1145/359545.359563

  23. [23]

    LangGraph: Build stateful multi-actor applications with LLMs

    LangChain. LangGraph: Build stateful multi-actor applications with LLMs. GitHub, 2025a. https://github.com/langchain-ai/langgraph

  24. [24]

    LangSmith: LLM application observability

    LangChain. LangSmith: LLM application observability. LangChain, 2025b. https://smith.langchain.com/

  25. [25]

    Langfuse: Open-source LLM engineering platform

    Langfuse. Langfuse: Open-source LLM engineering platform. Langfuse, 2025. Acquired by ClickHouse, January 2026

  26. [26]

    Microsoft Copilot Studio: Fundamentals

    Microsoft. Microsoft Copilot Studio: Fundamentals. Microsoft Learn, 2026. https://learn.microsoft.com/en-us/microsoft-copilot-studio/fundamentals-what-is-copilot-studio

  27. [27]

    AutoGen: A framework for building multi-agent conversational systems

    Microsoft Research. AutoGen: A framework for building multi-agent conversational systems. GitHub, 2025. https://github.com/microsoft/autogen

  28. [28]

    From Autonomous Agents to Accountable Systems: The Enterprise Playbook for High-Trust, High-ROI AI

    Charafeddine Mouzouni. From Autonomous Agents to Accountable Systems: The Enterprise Playbook for High-Trust, High-ROI AI. Cohorte AI, 2025. October 2025. https://www.cohorte.co/playbooks/from-autonomous-agents-to-accountable-systems

  29. [29]

    The Enterprise Agentic Platform: Architecture, Patterns, and the AI Operating System

    Charafeddine Mouzouni. The Enterprise Agentic Platform: Architecture, Patterns, and the AI Operating System. Cohorte AI, 2026a. https://www.cohorte.co/playbooks/the-enterprise-agentic-platform

  30. [30]

    Mapping the Exploitation Surface: A 10,000-Trial Taxonomy of What Makes LLM Agents Exploit Vulnerabilities

    Charafeddine Mouzouni. Mapping the exploitation surface: A 10,000-trial taxonomy of what makes LLM agents exploit vulnerabilities. arXiv preprint arXiv:2604.04561, 2026b

  31. [31]

    Black-box reliability certification for AI agents via self-consistency sampling and conformal calibration

    Charafeddine Mouzouni. Black-box reliability certification for AI agents via self-consistency sampling and conformal calibration. arXiv preprint arXiv:2602.21368, 2026c

  32. [32]

    Artificial intelligence risk management framework (AI RMF 1.0)

    National Institute of Standards and Technology. Artificial intelligence risk management framework (AI RMF 1.0). Technical Report NIST AI 100-1, U.S. Department of Commerce, 2023

  33. [33]

    The Knowledge-Creating Company: How Japanese Companies Create the Dynamics of Innovation

    Ikujiro Nonaka and Hirotaka Takeuchi. The Knowledge-Creating Company: How Japanese Companies Create the Dynamics of Innovation. Oxford University Press, 1995

  34. [34]

    NeMo guardrails

    NVIDIA. NeMo guardrails. NVIDIA Developer, 2025. Open-source guardrails framework using Colang DSL

  35. [35]

    Context engineering will decide enterprise AI success

    Neal Ramasamy. Context engineering will decide enterprise AI success. BigDATAwire / HPCwire, 2026. February 19, 2026. Cognizant CIO on the emergence of context engineering

  36. [36]

    TRiSM for agentic AI: A review of trust, risk, and security management in LLM-based agentic multi-agent systems

    Shaina Raza, Ranjan Sapkota, Manoj Karkee, and Christos Emmanouilidis. TRiSM for agentic AI: A review of trust, risk, and security management in LLM-based agentic multi-agent systems. arXiv preprint arXiv:2506.04133, 2025

  37. [37]

    Enterprise agentic architecture

    Salesforce Architects. Enterprise agentic architecture. Salesforce Architects, 2026. https://architect.salesforce.com/fundamentals/enterprise-agentic-architecture

  38. [38]

    Inside the brain of Agentforce: Revealing the Atlas reasoning engine

    Salesforce Engineering. Inside the brain of Agentforce: Revealing the Atlas reasoning engine. Salesforce Engineering Blog, 2026. https://engineering.salesforce.com/inside-the-brain-of-agentforce-revealing-the-atlas-reasoning-engine/

  39. [39]

    Role-based access control models

    Ravi S. Sandhu, Edward J. Coyne, Hal L. Feinstein, and Charles E. Youman. Role-based access control models. IEEE Computer, 29(2): 38--47, 1996. doi:10.1109/2.485845

  40. [40]

    Large-scale cluster management at Google with Borg

    Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune, and John Wilkes. Large-scale cluster management at Google with Borg. In Proceedings of the Tenth European Conference on Computer Systems (EuroSys '15), pages 1--17. ACM, 2015. doi:10.1145/2741948.2741964

  41. [41]

    Context engineering: From prompts to corporate multi-agent architecture

    Vera V. Vishnyakova. Context engineering: From prompts to corporate multi-agent architecture. arXiv preprint arXiv:2603.09619, 2026

  42. [42]

    Algorithmic Learning in a Random World

    Vladimir Vovk, Alex Gammerman, and Glenn Shafer. Algorithmic Learning in a Random World. Springer, 2005. doi:10.1007/b106715