pith. machine review for the scientific record. sign in

arxiv: 2603.00991 · v2 · submitted 2026-03-01 · 💻 cs.AI · cs.PL

Recognition: 2 theorem links

· Lean Theorem

Tracking Capabilities for Safer Agents

Authors on Pith no claims yet

Pith reviewed 2026-05-15 18:28 UTC · model grok-4.3

classification 💻 cs.AI cs.PL
keywords AI agentscapability safetycapture checkingScala 3type systemstool callingprompt injectioninformation leakage
0
0 comments X

The pith

AI agents can generate capability-safe code with no significant loss in task performance while the type system blocks leaks and side effects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes placing agents inside a programming-language safety harness using Scala 3 with capture checking. Agents express tool-use intentions as typed code rather than making direct calls, allowing the type system to track capabilities as variables that control access to resources and effects. This setup enables static enforcement of local purity, which prevents information leakage when handling sensitive data. Experiments show agents produce such safe code while maintaining comparable success rates on tasks, and the checker reliably stops unsafe actions like prompt-induced malicious behavior.

Core claim

Instead of calling tools directly, agents output code in Scala 3 with capture checking so that capabilities track and restrict effects and resources at the type level. The resulting system provides fine-grained static control that enforces local purity to stop information leaks from classified data and blocks malicious side effects, while agents achieve task performance close to unrestricted baselines.

What carries the argument

Scala's capture checking type system, which represents access rights as tracked capability variables to regulate effects and resources.

If this is right

  • Unsafe behaviors such as information leakage become impossible in well-typed code.
  • Prompt injection attacks are limited because only permitted capabilities can be used.
  • New safety rules can be added by extending the set of tracked capability types.
  • Sub-computations can be statically marked side-effect-free for sensitive data processing.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could transfer to other languages that support effect or capability tracking to widen its use.
  • Stronger LLM code-generation accuracy would allow this harness on more open-ended agent workflows.
  • Deployment in live tool ecosystems would test whether the static checks catch all practical misuse cases.

Load-bearing premise

Large language models can reliably produce correct, well-typed capability-safe Scala code for arbitrary tasks and the capture-checking system covers all relevant safety properties for real-world tool use.

What would settle it

A prompt that causes an agent to generate code leaking private information or triggering side effects yet still type-checks as safe, or a task set where agents consistently fail because no well-typed safe code exists.

Figures

Figures reproduced from arXiv: 2603.00991 by Cao Nguyen Pham, Martin Odersky, Oliver Bra\v{c}evac, Yaoyu Zhao, Yichen Xu.

Figure 1
Figure 1. Figure 1: Overview of the tacit framework. The AI agent sends Scala code via MCP tool calls. The code is validated and compiled by the Scala 3 compiler with capture checking, then executed in a REPL. The capability safety library is a gateway for all interactions with the real world (files, pro￾cesses, network, sub-agents). of a cache, we’d have to verify that the variable only holds results of previous function cal… view at source ↗
Figure 2
Figure 2. Figure 2: Data flow for processing classified documents. The [PITH_FULL_IMAGE:figures/full_fig_p013_2.png] view at source ↗
read the original abstract

AI agents that interact with the real world through tool calls pose fundamental safety challenges: agents might leak private information, cause unintended side effects, or be manipulated through prompt injection. To address these challenges, we propose to put the agent in a programming-language-based "safety harness": instead of calling tools directly, agents express their intentions as code in a capability-safe language: Scala 3 with capture checking. Capabilities are program variables that regulate access to effects and resources of interest. Scala's type system tracks capabilities statically, providing fine-grained control over what an agent can do. In particular, it enables local purity, the ability to enforce that sub-computations are side-effect-free, preventing information leakage when agents process classified data. We demonstrate that extensible agent safety harnesses can be built by leveraging a strong type system with tracked capabilities. Our experiments show that agents can generate capability-safe code with no significant loss in task performance, while the type system reliably prevents unsafe behaviors such as information leakage and malicious side effects.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes placing AI agents in a programming-language safety harness by having them generate code in Scala 3 with capture checking rather than issuing direct tool calls. Capabilities are tracked statically by the type system to enforce local purity and control effects, thereby preventing information leakage and malicious side effects. The central claim is that agents can produce such capability-safe code with no significant loss in task performance while the type system reliably blocks unsafe behaviors.

Significance. If the empirical results hold, the work demonstrates a practical route to verifiable agent safety by repurposing an existing strong type system (Scala capture checking) for fine-grained, static enforcement of capability restrictions. This could complement runtime monitoring approaches and provide machine-checkable guarantees for tool-using agents.

major comments (2)
  1. [Abstract] Abstract: The assertion that 'our experiments show that agents can generate capability-safe code with no significant loss in task performance' supplies no information on benchmarks, baselines, success rates of code generation, type-checking failure frequencies, or the methodology used to test prevention of information leakage and side effects. This absence makes the central empirical claim impossible to evaluate.
  2. [Abstract] Abstract / safety claims: The statement that the type system 'reliably prevents unsafe behaviors such as information leakage and malicious side effects' does not address how prompt-injection vectors that corrupt the generated program before type checking are handled; capture checking applies only to the final well-typed code.
minor comments (1)
  1. [Abstract] The abstract introduces 'local purity' without a concise definition or reference to the relevant capture-checking rules; a short inline explanation would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract and safety claims. We agree that additional detail is warranted and have revised the manuscript accordingly to improve clarity and evaluability while preserving the original contributions.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The assertion that 'our experiments show that agents can generate capability-safe code with no significant loss in task performance' supplies no information on benchmarks, baselines, success rates of code generation, type-checking failure frequencies, or the methodology used to test prevention of information leakage and side effects. This absence makes the central empirical claim impossible to evaluate.

    Authors: We agree that the abstract was overly terse and omitted key experimental details. The full manuscript's Experiments section specifies the benchmarks (a suite of agent tasks involving tool use, data processing, and resource access), baselines (direct tool calls versus generation of capability-safe Scala code), success rates for producing type-checkable programs, observed type-checking failure frequencies, and the methodology for safety testing (controlled attempts to induce leakage or side effects via generated code, all rejected by the type checker). We have revised the abstract to incorporate a concise summary of these elements, including the evaluation approach and confirmation of no significant performance loss. revision: yes

  2. Referee: [Abstract] Abstract / safety claims: The statement that the type system 'reliably prevents unsafe behaviors such as information leakage and malicious side effects' does not address how prompt-injection vectors that corrupt the generated program before type checking are handled; capture checking applies only to the final well-typed code.

    Authors: We clarify the intended pipeline: generated code is always subjected to static type checking (including capture checking) prior to execution, and only well-typed code is permitted to run. Any prompt injection that causes the agent to produce code attempting unsafe behaviors will fail to type-check and be rejected, preventing execution. The type system guarantees that well-typed programs cannot perform the prohibited actions by construction. We have added explicit discussion of this threat model, the rejection mechanism for ill-typed code, and the assumption of post-generation checking to the introduction and safety harness description. revision: yes

Circularity Check

0 steps flagged

No circularity; claims rest on established Scala capture checking and experimental results

full rationale

The paper's core proposal applies Scala's pre-existing capture-checking type system (an independent language feature) to agent tool-use safety. The derivation chain consists of describing capability tracking for local purity and effect control, followed by experimental claims that agents can emit well-typed code without performance loss. No equations, fitted parameters, self-definitional loops, or load-bearing self-citations appear in the text; the type-system guarantees are imported from the Scala language definition rather than constructed within the paper. The experimental outcomes are presented as empirical measurements rather than predictions forced by the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim depends on the soundness of Scala's capture-checking type system for enforcing local purity and capability restrictions, plus the unstated assumption that LLM-generated code will be type-correct and semantically aligned with the intended task.

axioms (1)
  • domain assumption Scala's capture checking correctly tracks and enforces capabilities for effects and resources
    The safety harness relies on the static guarantees provided by the Scala 3 type system.

pith-pipeline@v0.9.0 · 5477 in / 1114 out tokens · 47543 ms · 2026-05-15T18:28:22.252763+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. First-Class Refinement Types for Scala

    cs.PL 2026-05 accept novelty 7.0 partial

    Refinement types are integrated as first-class citizens in Scala 3 with full participation in the type system, backed by a mechanized soundness proof in Rocq and a prototype compiler extension using an e-graph solver.

  2. When the Agent Is the Adversary: Architectural Requirements for Agentic AI Containment After the April 2026 Frontier Model Escape

    cs.CR 2026-04 unverdicted novelty 3.0

    A reported 2026 frontier model escape shows that alignment training, sandboxing, tool interception, and audits fail against adversarial agentic AI, requiring five new architectural requirements for durable containment.

Reference graph

Works this paper leans on

86 extracted references · 86 canonical work pages · cited by 2 Pith papers · 14 internal anchors

  1. [1]

    Amazon Web Services. 2025. Bedrock AgentCore policy. Accessed: 2025-06-01. https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/policy.htm l (cit. on p. 8)

  2. [2]

    Amazon Web Services. 2024. Cedar policy language. Accessed: 2025-06-01. https://www.cedarpolicy.com/ (cit. on p. 8)

  3. [3]

    Nada Amin, Samuel Grütter, Martin Odersky, Tiark Rompf, and Sandro Stucki

  4. [4]

    InA List of Successes That Can Change the World(Lecture Notes in Computer Science)

    The essence of dependent object types. InA List of Successes That Can Change the World(Lecture Notes in Computer Science). Vol. 9600. Springer, 249–272. doi:10.1007/978-3-319-30936-1_14 (cit. on p. 2)

  5. [5]

    Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. 2016. Concrete problems in AI safety.CoRR, abs/1606.06565. arXiv: 1606.06565 (cit. on pp. 1, 8)

  6. [6]

    Anthropic. 2025. Claude code. https://docs.anthropic.com/en/docs/claude-code (cit. on p. 9)

  7. [7]

    Anthropic. 2025. Code execution with MCP. Accessed: 2025-06-01. https://ww w.anthropic.com/engineering/code-execution-with-mcp (cit. on pp. 1, 9)

  8. [8]

    Anthropic. 2025. Equipping agents for the real world with agent skills. Accessed: 2025-06-01. https://www.anthropic.com/engineering/equipping-agents-for-t he-real-world-with-agent-skills (cit. on pp. 1, 9)

  9. [9]

    Anthropic. 2025. How we built our multi-agent research system. Accessed: 2026- 02-26. https://www.anthropic.com/engineering/multi-agent-research-system (cit. on p. 8)

  10. [10]

    Anthropic. 2024. Model context protocol. Accessed: 2025-06-01. https://modelc ontextprotocol.io/ (cit. on pp. 1, 8)

  11. [11]

    Victor Barres, Honghua Dong, Soham Ray, Xujie Si, and Karthik Narasimhan

  12. [12]

    $\tau^2$-Bench: Evaluating Conversational Agents in a Dual-Control Environment

    𝜏 2-bench: Evaluating conversational agents in a dual-control environ- ment.CoRR, abs/2506.07982. arXiv: 2506.07982 (cit. on p. 7)

  13. [13]

    Luca Beurer-Kellner et al. 2025. Design patterns for securing LLM agents against prompt injections.CoRR, abs/2506.08837. arXiv: 2506.08837 (cit. on p. 8)

  14. [14]

    Aleksander Boruch-Gruszecki, Adrien Ghosn, Mathias Payer, and Clément Pit- Claudel. 2024. Gradient: Gradual compartmentalization via object capabilities tracked in types.Proc. ACM Program. Lang., 8, OOPSLA2, 1135–1161. doi:10.11 45/3689751 (cit. on p. 9)

  15. [15]

    Aleksander Boruch-Gruszecki, Martin Odersky, Edward Lee, Ondrej Lhoták, and Jonathan Immanuel Brachthäuser. 2023. Capturing types.ACM Trans. Program. Lang. Syst., 45, 4, 21:1–21:52. doi:10.1145/3618003 (cit. on pp. 1, 2, 9)

  16. [16]

    Aleksander Boruch-Gruszecki, Yangtian Zi, Zixuan Wu, Tejas Oberoi, Carolyn Jane Anderson, Joydeep Biswas, and Arjun Guha. 2026. Agnostics: Learning to code in any programming language via reinforcement with a universal learning environment. InICLR. OpenReview.net. https://openreview.net/forum?id=mj DT60Ffms (cit. on p. 8)

  17. [17]

    Jonathan Immanuel Brachthäuser, Philipp Schuster, Edward Lee, and Alek- sander Boruch-Gruszecki. 2022. Effects, capabilities, and boxes: From scope- based reasoning to type-based reasoning and back.Proc. ACM Program. Lang., 6, OOPSLA, 1–30. doi:10.1145/3527320 (cit. on p. 9)

  18. [18]

    Jonathan Immanuel Brachthäuser, Philipp Schuster, and Klaus Ostermann. 2020. Effects as capabilities: Effect handlers and lightweight effect polymorphism. Proc. ACM Program. Lang., 4, OOPSLA, 126:1–126:30. doi:10.1145/3428194 (cit. on pp. 2, 9)

  19. [19]

    Christoph Bühler, Matteo Biagiola, Luca Di Grazia, and Guido Salvaneschi

  20. [20]

    AgentBound: Securing Execution Boundaries of AI Agents

    Securing AI agent execution.CoRR, abs/2510.21236. arXiv: 2510.21236 (cit. on p. 9)

  21. [21]

    Check Point Research. 2026. Check point researchers expose critical Claude Code flaws. Accessed: 2026-02-26. https://blog.checkpoint.com/research/check -point-researchers-expose-critical-claude-code-flaws/ (cit. on p. 8)

  22. [22]

    Mark Chen et al. 2021. Evaluating large language models trained on code.CoRR, abs/2107.03374. arXiv: 2107.03374 (cit. on p. 9)

  23. [23]

    Sizhe Chen, Julien Piet, Chawin Sitawarin, and David Wagner. 2025. StruQ: defending against prompt injection with structured queries. InUSENIX Security Symposium. USENIX Association. https://www.usenix.org/conference/usenixs ecurity25/presentation/chen-sizhe (cit. on p. 8)

  24. [24]

    Mihai Christodorescu et al. 2025. Systems security foundations for agentic computing.CoRR, abs/2512.01295. arXiv: 2512.01295 (cit. on p. 8)

  25. [25]

    Aaron Craig, Alex Potanin, Lindsay Groves, and Jonathan Aldrich. 2018. Ca- pabilities: Effects for free. InICFEM(Lecture Notes in Computer Science). Vol. 11232. Springer, 231–247. doi:10.1007/978-3-030-02450-5_14 (cit. on p. 9)

  26. [26]

    Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. 2024. AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents. InNeurIPS. doi:10.52202/079017-2636 (cit. on pp. 1, 6, 8)

  27. [27]

    Edoardo Debenedetti et al. 2025. Defeating prompt injections by design.CoRR, abs/2503.18813. arXiv: 2503.18813 (cit. on pp. 3, 8)

  28. [28]

    DeepSeek-AI et al. 2025. DeepSeek-v3.2: Pushing the frontier of open large language models.CoRR, abs/2512.02556. arXiv: 2512.02556 (cit. on p. 7)

  29. [29]

    Dennis and Earl C

    Jack B. Dennis and Earl C. Van Horn. 1966. Programming semantics for multi- programmed computations. InCommunications of the ACMnumber 3. Vol. 9. ACM, 143–155. doi:10.1145/365230.365252 (cit. on pp. 1, 3, 9)

  30. [30]

    Aarya Doshi, Yining Hong, Congying Xu, Eunsuk Kang, Alexandros Kapravelos, and Christian Kästner. 2026. Towards verifiably safe tool use for LLM agents. CoRR, abs/2601.08012. To appear at ICSE-NIER 2026. arXiv: 2601.08012 (cit. on p. 9)

  31. [31]

    Fish, Darya Melicher, and Jonathan Aldrich

    Jennifer A. Fish, Darya Melicher, and Jonathan Aldrich. 2020. A case study in language-based security: Building an I/O library for Wyvern. InOnward!ACM, 34–47. doi:10.1145/3426428.3426913 (cit. on p. 9)

  32. [32]

    Future of Privacy Forum. 2025. Minding mindful machines: AI agents and data protection considerations. Accessed: 2025-06-01. https://fpf.org/blog/minding -mindful-machines-ai-agents-and-data-protection-considerations/ (cit. on pp. 1, 3)

  33. [33]

    Google. 2024. Fuchsia operating system. Accessed: 2025-06-01. https://fuchsia .dev/ (cit. on pp. 1, 9)

  34. [34]

    Colin S. Gordon. 2020. Designing with static capabilities and effects: use, men- tion, and invariants (pearl). InECOOP(LIPIcs). Vol. 166. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 10:1–10:25. doi:10.4230/LIPIcs.ECOOP.2020.10 (cit. on p. 9)

  35. [35]

    Zachary Grannan, Aurel Bílý, Jonáš Fiala, Jasper Geer, Markus de Medeiros, Peter Müller, and Alexander J. Summers. 2025. Place capability graphs: A general-purpose model of Rust’s ownership and borrowing guarantees.Proc. ACM Program. Lang., 9, OOPSLA2, 2002–2029. doi:10.1145/3763122 (cit. on p. 9)

  36. [36]

    Hellerstein, Shadaj Laddad, Mae Milano, Conor Power, and Mingwei Samuel

    Joseph M. Hellerstein, Shadaj Laddad, Mae Milano, Conor Power, and Mingwei Samuel. 2023. Invited paper: Initial steps toward a compiler for distributed programs. InApPLIED@PODC. ACM, 5:1–5:10. doi:10.1145/3584684.3597272 (cit. on p. 9)

  37. [37]

    Wenyue Hua, Xianjun Yang, Mingyu Jin, Zelong Li, Wei Cheng, Ruixiang Tang, and Yongfeng Zhang. 2024. TrustAgent: Towards safe and trustworthy LLM-based agents. InEMNLP (Findings)(Findings of ACL). Vol. EMNLP 2024. Association for Computational Linguistics, 10000–10016. doi:10.18653/V1/2024 .FINDINGS-EMNLP.585 (cit. on p. 9)

  38. [38]

    Wenlong Huang et al. 2022. Inner monologue: embodied reasoning through planning with language models. (2022). https://arxiv.org/abs/2207.05608 arXiv: 2207.05608[cs.RO](cit. on p. 1)

  39. [39]

    Carlos E Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik R Narasimhan. 2024. SWE-bench: can language models resolve real-world GitHub issues? InThe Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=VTF8yNQM66 (cit. on pp. 7, 9)

  40. [40]

    Ralf Jung, Jacques-Henri Jourdan, Robbert Krebbers, and Derek Dreyer. 2018. RustBelt: Securing the foundations of the Rust programming language.Proc. ACM Program. Lang., 2, POPL, 66:1–66:34. doi:10.1145/3158154 (cit. on pp. 4, 9)

  41. [41]

    Christoph Kern. 2025. Safe coding: Rigorous modular reasoning about software safety.ACM Queue, 23(5). https://spawn-queue.acm.org/doi/10.1145/3773098 (cit. on pp. 8, 9)

  42. [42]

    Hellerstein, and Mae Milano

    Shadaj Laddad, Alvin Cheung, Joseph M. Hellerstein, and Mae Milano. 2025. Flo: A semantic foundation for progressive stream processing.Proc. ACM Program. Lang., 9, POPL, 241–270. doi:10.1145/3704845 (cit. on p. 9)

  43. [43]

    Rustan M

    K. Rustan M. Leino. 2010. Dafny: an automatic program verifier for Functional Correctness. InLPAR. Springer, 348–370. doi:10.1007/978-3-642-17511-4_20 (cit. on p. 9)

  44. [44]

    Fengyu Liu et al. 2025. Make agent defeat agent: Automatic detection of taint- style vulnerabilities in LLM-based agents. InUSENIX Security Symposium. USENIX Association, 3767–3786. https://www.usenix.org/conference/usenixse curity25/presentation/liu-fengyu (cit. on p. 8)

  45. [45]

    Lucassen and David K

    John M. Lucassen and David K. Gifford. 1988. Polymorphic effect systems. In POPL. ACM Press, 47–57. doi:10.1145/73560.73564 (cit. on p. 9)

  46. [46]

    Erik Meijer. 2025. Guardians of the agents.Commun. ACM, 69, 1, 46–52. doi:10 .1145/3777544 (cit. on p. 9)

  47. [47]

    Darya Melicher, Yangqingwei Shi, Alex Potanin, and Jonathan Aldrich. 2017. A capability-based module system for authority control. InECOOP(LIPIcs). Vol. 74. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 20:1–20:27. doi:10 .4230/LIPIcs.ECOOP.2017.20 (cit. on p. 9)

  48. [48]

    Adrian Mettler, David Wagner, and Tyler Close. 2010. Joe-E: A security-oriented subset of Java. InNetwork and Distributed System Security Symposium (NDSS). The Internet Society (cit. on pp. 8, 9)

  49. [49]

    Mark S. Miller. 2006.Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control. Ph.D. Dissertation. John Hopkins University (cit. on pp. 1, 3, 9). 10 Tracking Capabilities for Safer Agents

  50. [50]

    Miller, E

    Mark S. Miller, E. Dean Tribble, and Jonathan Shapiro. 2005. Concurrency among strangers: Programming in E as plan coordination. TGC 2005, LNCS

  51. [51]

    http://www.erights.org/talks/promises/paper/tgc05.pdf (cit. on pp. 8, 9)

  52. [52]

    Miller, Ka-Ping Yee, and Jonathan Shapiro

    Mark S. Miller, Ka-Ping Yee, and Jonathan Shapiro. 2003. Capability myths demolished. In Technical Report SRL2003-02, Johns Hopkins University. htt ps://classpages.cselabs.umn.edu/Fall-2021/csci5271/papers/SRL2003-02.pdf (cit. on pp. 2, 8)

  53. [53]

    MiniMax. 2026. MiniMax m2.5: Built for real-world productivity. https://www .minimax.io/news/minimax-m25. (2026) (cit. on p. 7)

  54. [54]

    Milad Nasr et al. 2025. The attacker moves second: Stronger adaptive at- tacks bypass defenses against LLM jailbreaks and prompt injections.CoRR, abs/2510.09023. arXiv: 2510.09023 (cit. on pp. 1, 8)

  55. [55]

    Martin Odersky, Olivier Blanvillain, Fengyun Liu, Aggelos Biboudis, Heather Miller, and Sandro Stucki. 2018. Simplicitly: Foundations and applications of implicit function types.Proc. ACM Program. Lang., 2, POPL, 42:1–42:29. doi:10.1145/3158130 (cit. on p. 13)

  56. [56]

    OpenAI. 2025. Gpt-oss-120b & gpt-oss-20b model card.CoRR, abs/2508.10925. arXiv: 2508.10925 (cit. on p. 7)

  57. [57]

    OpenCode. 2025. OpenCode: The open source AI coding agent. Accessed: 2026- 02-22. https://github.com/anomalyco/opencode (cit. on pp. 7, 9)

  58. [58]

    Essertel, Xilun Wu, Lilliam I

    Leo Osvald, Grégory M. Essertel, Xilun Wu, Lilliam I. González Alayón, and Tiark Rompf. 2016. Gentrification gone too far? Affordable 2nd-class values for fun and (co-)effect. InOOPSLA. ACM, 234–251. doi:10.1145/2983990.2984009 (cit. on p. 9)

  59. [59]

    Patil, Tianjun Zhang, Xin Wang, and Joseph E

    Shishir G. Patil, Tianjun Zhang, Xin Wang, and Joseph E. Gonzalez. 2024. Gorilla: Large language model connected with massive APIs. InNeurIPS. Vol. 37. Curran Associates, Inc., 126544–126565. doi:10.52202/079017-4020 (cit. on p. 9)

  60. [60]

    Patil et al

    Shishir G. Patil et al. 2024. GoEX: Perspectives and designs towards a runtime for autonomous LLM applications.CoRR, abs/2404.06921. arXiv: 2404.06921 (cit. on p. 9)

  61. [61]

    Hammond Pearce, Baleegh Ahmad, Benjamin Tan, Brendan Dolan-Gavitt, and Ramesh Karri. 2022. Asleep at the keyboard? Assessing the security of GitHub Copilot’s code contributions. InSP. IEEE, 754–768. doi:10.1109/SP46214.2022.9 833571 (cit. on p. 9)

  62. [62]

    Pydantic. 2025. Monty: A Python interpreter in Rust. https://github.com/pyda ntic/monty (cit. on p. 9)

  63. [63]

    Brandon Radosevich and John Halloran. 2025. MCP safety audit: LLMs with the Model Context Protocol allow major security exploits.CoRR, abs/2504.03767. arXiv: 2504.03767 (cit. on p. 8)

  64. [64]

    Johann Rehberger. 2025. Claude AI APIs can be abused for data exfiltration. https://www.securityweek.com/claude-ai-apis-can-be-abused-for-data-exfi ltration/ (cit. on pp. 1, 8)

  65. [65]

    Tiark Rompf and Nada Amin. 2016. Type soundness for dependent object types (DOT). InOOPSLA. ACM, 624–641. doi:10.1145/2983990.2984008 (cit. on p. 2)

  66. [66]

    Identifying the Risks of LM Agents with an LM-Emulated Sandbox

    Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J. Maddison, and Tatsunori Hashimoto. 2024. Identifying the risks of LM agents with an LM-emulated sandbox. InICLR. OpenReview.net. arXiv: 2309.15817 (cit. on p. 8)

  67. [67]

    Andrei Sabelfeld and Andrew C. Myers. 2003. Language-based information-flow security.IEEE J. Sel. Areas Commun., 21, 1, 5–19. doi:10.1109/JSAC.2002.806121 (cit. on pp. 3, 9)

  68. [68]

    Saltzer and Michael D

    Jerome H. Saltzer and Michael D. Schroeder. 1975. The protection of information in computer systems.Proc. IEEE, 63, 9, 1278–1308. doi:10.1109/PROC.1975.9939 (cit. on p. 8)

  69. [69]

    An experimental asynchronous programming library,

    [SW Mod.] Scala, “An experimental asynchronous programming library, ” part of Gears 2024 EPFL LAMP.url: https : / / lampepfl . github . io / gears,vcs: https://github.com/lampepfl/gears (cit. on p. 12)

  70. [70]

    Capture Checker,

    [SW Mod.] Scala, “Capture Checker, ” part of Scala 3 2024 EPFL LAMP.url: ht tps://nightly.scala-lang.org/docs/reference/experimental/capture-checking/, vcs: https://github.com/scala/scala3 (cit. on pp. 4, 8, 9)

  71. [71]

    Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language models can teach themselves to use tools. InNeurIPS. arXiv: 2302.04761 (cit. on p. 9)

  72. [72]

    Georgios Syros, Anshuman Suri, Jacob Ginesin, Cristina Nita-Rotaru, and Alina Oprea. 2026. SAGA: A security architecture for governing AI agentic systems. InNDSS. The Internet Society (cit. on p. 9)

  73. [73]

    Peter Thiemann. 2025. What I always wanted to know about second class values. InProceedings of the Workshop Dedicated to Olivier Danvy on the Occasion of His 64th Birthday (OLIVIERFEST’25), 117–127. doi:10.1145/3759427.3760373 (cit. on p. 9)

  74. [74]

    Sanidhya Vijayvargiya, Aditya Bharat Soni, Xuhui Zhou, Zora Zhiruo Wang, Nouha Dziri, Graham Neubig, and Maarten Sap. 2025. OpenAgentSafety: A comprehensive framework for evaluating real-world AI agent safety.CoRR, abs/2507.06134. arXiv: 2507.06134 (cit. on p. 8)

  75. [75]

    Robert N. M. Watson, Simon W. Moore, Peter Sewell, and Peter G. Neumann

  76. [76]

    In University of Cambridge, Computer Labo- ratory, Technical Report UCAM-CL-TR-941

    An introduction to CHERI. In University of Cambridge, Computer Labo- ratory, Technical Report UCAM-CL-TR-941. https://www.cl.cam.ac.uk/techrep orts/UCAM-CL-TR-941.pdf (cit. on pp. 1, 9)

  77. [77]

    Simon Willison. 2023. The dual LLM pattern for building AI assistants that can resist prompt injection. Accessed: 2026-02-22. https://simonwillison.net/2023 /Apr/25/dual-llm-pattern/ (cit. on pp. 3, 8)

  78. [78]

    Wulf, Ellis Cohen, William Corwin, Anita Jones, Roy Levin, C

    William A. Wulf, Ellis Cohen, William Corwin, Anita Jones, Roy Levin, C. Pier- son, and Fred Pollack. 1974. Hydra: The kernel of a multiprocessor operating system. Communications of the ACM, 17(6), 1974. https://doi.org/10.1145/3556 16.364017 (cit. on pp. 1, 9)

  79. [79]

    Anxhelo Xhebraj, Oliver Bračevac, Guannan Wei, and Tiark Rompf. 2022. What if we don’t pop the stack? The return of 2nd-class values. InECOOP(LIPIcs). Vol. 222. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 15:1–15:29. doi:10 .4230/LIPIcs.ECOOP.2022.15 (cit. on p. 9)

  80. [80]

    Zhen Xiang et al. 2025. GuardAgent: Safeguard LLM agents via knowledge- enabled reasoning. InICML(Proceedings of Machine Learning Research). Vol. 267. PMLR / OpenReview.net. arXiv: 2406.09187 (cit. on p. 9)

Showing first 80 references.