Recognition: 2 theorem links
Tracking Capabilities for Safer Agents
Pith reviewed 2026-05-15 18:28 UTC · model grok-4.3
The pith
AI agents can generate capability-safe code with no significant loss in task performance while the type system blocks leaks and side effects.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Instead of calling tools directly, agents output code in Scala 3 with capture checking so that capabilities track and restrict effects and resources at the type level. The resulting system provides fine-grained static control that enforces local purity to stop information leaks from classified data and blocks malicious side effects, while agents achieve task performance close to unrestricted baselines.
What carries the argument
Scala's capture checking type system, which represents access rights as tracked capability variables to regulate effects and resources.
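To ground this, here is a minimal sketch of what tracked capabilities look like under Scala 3's experimental capture checking; the syntax follows the experimental capture-checking documentation, and the `Network` capability and `sender` helper are illustrative names, not taken from the paper:

```scala
import language.experimental.captureChecking
import caps.Capability

// A capability: holding a Network value is the only way to perform
// network effects, and the checker tracks everywhere it flows.
class Network extends Capability:
  def send(msg: String): Unit = println(s"send: $msg")

// The returned closure retains `net`, so its type records the
// capture set {net}.
def sender(net: Network): String ->{net} Unit =
  msg => net.send(msg)

// A pure arrow `->` has an empty capture set: the checker rejects
// any body that captures a capability such as `net`.
val shout: String -> String = s => s.toUpperCase
```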
If this is right
- Unsafe behaviors such as information leakage become impossible in well-typed code.
- Prompt injection attacks are limited because only permitted capabilities can be used.
- New safety rules can be added by extending the set of tracked capability types.
- Sub-computations can be statically marked side-effect-free for sensitive data processing (see the sketch after this list).
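To make the local-purity point concrete, a hedged sketch of the pattern: a sensitive value may only be handed to a function whose type has an empty capture set, so it cannot close over any I/O capability. The `Secret` and `Summary` types are hypothetical, not from the paper.

```scala
import language.experimental.captureChecking

case class Secret(text: String)
case class Summary(text: String)

// `Secret -> Summary` is a pure function type (empty capture set), so
// `f` cannot close over any I/O capability and therefore cannot
// exfiltrate the classified input through a side effect.
def processClassified(s: Secret)(f: Secret -> Summary): Summary =
  f(s)

// Accepted: a pure summarizer.
val ok = processClassified(Secret("classified"))(s => Summary(s.text.take(5)))

// A closure capturing a Network capability would instead have type
// Secret ->{net} Summary, which does not conform to Secret -> Summary,
// so the call would be rejected at compile time.
```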
Where Pith is reading between the lines
- The approach could transfer to other languages with effect or capability tracking, widening its applicability.
- Stronger LLM code-generation accuracy would allow applying this harness to more open-ended agent workflows.
- Deployment in live tool ecosystems would test whether the static checks catch all practical misuse cases.
Load-bearing premise
Large language models can reliably produce correct, well-typed, capability-safe Scala code for arbitrary tasks, and the capture-checking system covers all relevant safety properties for real-world tool use.
What would settle it
A prompt that causes an agent to generate code leaking private information or triggering side effects yet still type-checks as safe, or a task set where agents consistently fail because no well-typed safe code exists.
Original abstract
AI agents that interact with the real world through tool calls pose fundamental safety challenges: agents might leak private information, cause unintended side effects, or be manipulated through prompt injection. To address these challenges, we propose to put the agent in a programming-language-based "safety harness": instead of calling tools directly, agents express their intentions as code in a capability-safe language: Scala 3 with capture checking. Capabilities are program variables that regulate access to effects and resources of interest. Scala's type system tracks capabilities statically, providing fine-grained control over what an agent can do. In particular, it enables local purity, the ability to enforce that sub-computations are side-effect-free, preventing information leakage when agents process classified data. We demonstrate that extensible agent safety harnesses can be built by leveraging a strong type system with tracked capabilities. Our experiments show that agents can generate capability-safe code with no significant loss in task performance, while the type system reliably prevents unsafe behaviors such as information leakage and malicious side effects.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes placing AI agents in a programming-language safety harness by having them generate code in Scala 3 with capture checking rather than issuing direct tool calls. Capabilities are tracked statically by the type system to enforce local purity and control effects, thereby preventing information leakage and malicious side effects. The central claim is that agents can produce such capability-safe code with no significant loss in task performance while the type system reliably blocks unsafe behaviors.
Significance. If the empirical results hold, the work demonstrates a practical route to verifiable agent safety by repurposing an existing strong type system (Scala capture checking) for fine-grained, static enforcement of capability restrictions. This could complement runtime monitoring approaches and provide machine-checkable guarantees for tool-using agents.
major comments (2)
- [Abstract] The assertion that 'our experiments show that agents can generate capability-safe code with no significant loss in task performance' supplies no information on benchmarks, baselines, success rates of code generation, type-checking failure frequencies, or the methodology used to test prevention of information leakage and side effects. This absence makes the central empirical claim impossible to evaluate.
- [Abstract] Safety claims: The statement that the type system 'reliably prevents unsafe behaviors such as information leakage and malicious side effects' does not address how prompt-injection vectors that corrupt the generated program before type checking are handled; capture checking applies only to the final well-typed code.
minor comments (1)
- [Abstract] The abstract introduces 'local purity' without a concise definition or reference to the relevant capture-checking rules; a short inline explanation would improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on the abstract and safety claims. We agree that additional detail is warranted and have revised the manuscript accordingly to improve clarity and evaluability while preserving the original contributions.
Point-by-point responses
Referee: [Abstract] The assertion that 'our experiments show that agents can generate capability-safe code with no significant loss in task performance' supplies no information on benchmarks, baselines, success rates of code generation, type-checking failure frequencies, or the methodology used to test prevention of information leakage and side effects. This absence makes the central empirical claim impossible to evaluate.
Authors: We agree that the abstract was overly terse and omitted key experimental details. The full manuscript's Experiments section specifies the benchmarks (a suite of agent tasks involving tool use, data processing, and resource access), baselines (direct tool calls versus generation of capability-safe Scala code), success rates for producing type-checkable programs, observed type-checking failure frequencies, and the methodology for safety testing (controlled attempts to induce leakage or side effects via generated code, all rejected by the type checker). We have revised the abstract to incorporate a concise summary of these elements, including the evaluation approach and confirmation of no significant performance loss. revision: yes
Referee: [Abstract] Safety claims: The statement that the type system 'reliably prevents unsafe behaviors such as information leakage and malicious side effects' does not address how prompt-injection vectors that corrupt the generated program before type checking are handled; capture checking applies only to the final well-typed code.
Authors: We clarify the intended pipeline: generated code is always subjected to static type checking (including capture checking) prior to execution, and only well-typed code is permitted to run. Any prompt injection that causes the agent to produce code attempting unsafe behaviors will fail to type-check and be rejected, preventing execution. The type system guarantees that well-typed programs cannot perform the prohibited actions by construction. We have added explicit discussion of this threat model, the rejection mechanism for ill-typed code, and the assumption of post-generation checking to the introduction and safety harness description. revision: yes
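To make the clarified threat model concrete, a minimal sketch (in Scala, with hypothetical names) of the generate-then-check-then-run pipeline the response describes; nothing here is an API from the paper:

```scala
// Hypothetical harness skeleton; `typeCheck` stands in for invoking the
// Scala 3 compiler with -language:experimental.captureChecking.
enum CheckResult:
  case WellTyped(program: String)
  case Rejected(reason: String)

def typeCheck(source: String): CheckResult =
  // Stand-in logic only: a real harness would compile `source` and
  // surface the compiler's capture-checking diagnostics here.
  if source.contains("asInstanceOf") then CheckResult.Rejected("unchecked cast")
  else CheckResult.WellTyped(source)

def harness(task: String)(generate: String => String)(run: String => Unit): Unit =
  typeCheck(generate(task)) match
    case CheckResult.WellTyped(p) => run(p) // only well-typed code executes
    case CheckResult.Rejected(r)  => println(s"rejected before execution: $r")
```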
Circularity Check
No circularity; claims rest on established Scala capture checking and experimental results
Full rationale
The paper's core proposal applies Scala's pre-existing capture-checking type system (an independent language feature) to agent tool-use safety. The derivation chain consists of describing capability tracking for local purity and effect control, followed by experimental claims that agents can emit well-typed code without performance loss. No equations, fitted parameters, self-definitional loops, or load-bearing self-citations appear in the text; the type-system guarantees are imported from the Scala language definition rather than constructed within the paper. The experimental outcomes are presented as empirical measurements rather than predictions forced by the inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Scala's capture checking correctly tracks and enforces capabilities for effects and resources.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (tag: unclear)
  Unclear relation between the paper passage and the cited Recognition theorem.
  "Our key insight is that we can construct harnesses that combine safety with expressivity through the systematic use of tracked capabilities... local purity, the ability to enforce that sub-computations are side-effect-free"
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction (tag: unclear)
  Unclear relation between the paper passage and the cited Recognition theorem.
  "Scala 3's capture checking... enforces that code cannot forge access rights, cannot perform effects beyond its budget, and cannot leak information from pure sub-computations"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
- First-Class Refinement Types for Scala
  Refinement types are integrated as first-class citizens in Scala 3 with full participation in the type system, backed by a mechanized soundness proof in Rocq and a prototype compiler extension using an e-graph solver.
- When the Agent Is the Adversary: Architectural Requirements for Agentic AI Containment After the April 2026 Frontier Model Escape
  A reported 2026 frontier model escape shows that alignment training, sandboxing, tool interception, and audits fail against adversarial agentic AI, requiring five new architectural requirements for durable containment.