pith. machine review for the scientific record. sign in

arxiv: 2503.18666 · v3 · submitted 2025-03-24 · 💻 cs.AI · cs.CL

Recognition: 2 theorem links

· Lean Theorem

AgentSpec: Customizable Runtime Enforcement for Safe and Reliable LLM Agents

Christopher M. Poskitt, Haoyu Wang, Jun Sun

Authors on Pith no claims yet

Pith reviewed 2026-05-14 21:20 UTC · model grok-4.3

classification 💻 cs.AI cs.CL
keywords LLM agentsruntime safetydomain-specific languageenforcementautonomous agentssafety rules
0
0 comments X

The pith

AgentSpec lets users write runtime rules that stop LLM agents from unsafe actions in code, robots, and cars.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces AgentSpec, a domain-specific language for defining safety constraints on LLM-based agents. Users specify rules using triggers, predicates, and enforcement actions to intercept and block dangerous behaviors at runtime. Evaluations across code execution, embodied agents, and autonomous driving show it blocks over 90% of unsafe code actions, all hazardous embodied moves, and ensures full compliance in vehicle scenarios. The approach is lightweight, adding only millisecond overheads, and supports automatic rule generation with LLMs achieving high precision.

Core claim

AgentSpec is a lightweight domain-specific language that allows users to specify structured rules incorporating triggers, predicates, and enforcement mechanisms to ensure LLM agents operate within predefined safety boundaries at runtime.

What carries the argument

AgentSpec, the domain-specific language for runtime enforcement of safety rules on LLM agents using triggers, predicates, and enforcement mechanisms.

Load-bearing premise

That all relevant unsafe scenarios can be anticipated and expressed as practical, predefined rules.

What would settle it

Observing an LLM agent performing a hazardous action in a tested domain despite an AgentSpec rule being in place that should have caught it.

read the original abstract

Agents built on LLMs are increasingly deployed across diverse domains, automating complex decision-making and task execution. However, their autonomy introduces safety risks, including security vulnerabilities, legal violations, and unintended harmful actions. Existing mitigation methods, such as model-based safeguards and early enforcement strategies, fall short in robustness, interpretability, and adaptability. To address these challenges, we propose AgentSpec, a lightweight domain-specific language for specifying and enforcing runtime constraints on LLM agents. With AgentSpec, users define structured rules that incorporate triggers, predicates, and enforcement mechanisms, ensuring agents operate within predefined safety boundaries. We implement AgentSpec across multiple domains, including code execution, embodied agents, and autonomous driving, demonstrating its adaptability and effectiveness. Our evaluation shows that AgentSpec successfully prevents unsafe executions in over 90% of code agent cases, eliminates all hazardous actions in embodied agent tasks, and enforces 100% compliance by autonomous vehicles (AVs). Despite its strong safety guarantees, AgentSpec remains computationally lightweight, with overheads in milliseconds. By combining interpretability, modularity, and efficiency, AgentSpec provides a practical and scalable solution for enforcing LLM agent safety across diverse applications. We also automate the generation of rules using LLMs and assess their effectiveness. Our evaluation shows that the rules generated by OpenAI o1 achieve a precision of 95.56% and recall of 70.96% for embodied agents, successfully identify 87.26% of the risky code, and prevent AVs from breaking laws in 5 out of 8 scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper proposes AgentSpec, a lightweight DSL for specifying runtime safety constraints on LLM agents via structured rules consisting of triggers, predicates, and enforcement mechanisms. It implements the system across code execution, embodied agents, and autonomous driving domains, reporting that AgentSpec prevents unsafe executions in over 90% of code agent cases, eliminates all hazardous actions in embodied tasks, and achieves 100% compliance in AV scenarios. The work also evaluates LLM-generated rules (e.g., via OpenAI o1), which achieve 95.56% precision and 70.96% recall on embodied agents, identify 87.26% of risky code, and succeed in 5/8 AV scenarios, while claiming low runtime overhead.

Significance. If the empirical results are robust, AgentSpec provides a practical, interpretable, and modular alternative to model-based safeguards for LLM agent safety. Its cross-domain applicability and support for both manual and automated rule generation could address key gaps in robustness and adaptability, with the lightweight enforcement making it suitable for real-time use.

major comments (3)
  1. [Abstract] Abstract: The headline claims of eliminating all hazardous actions in embodied agents and 100% AV compliance are based on finite test suites, but no coverage argument, mutation analysis, or adversarial test set is provided to demonstrate that the predicate/trigger combinations exhaustively intercept all unsafe trajectories in the respective action spaces. If an agent produces an action outside the enumerated triggers, enforcement is bypassed.
  2. [Abstract] Abstract: The reported success rates lack supporting experimental details such as number of trials, baselines, error bars, statistical significance, or discussion of potential confounds and post-hoc selection, making it difficult to assess whether the data fully supports the claims of over 90% prevention in code agents and perfect enforcement in the other domains.
  3. [Abstract] Abstract: LLM-generated rules achieve only 70.96% recall on embodied agents and succeed in 5/8 AV scenarios, which undercuts the practicality of the automated generation approach relative to the manual-rule results presented as perfect on the evaluated cases; the paper does not address how users would ensure comprehensive rule coverage in practice.
minor comments (1)
  1. [Abstract] The abstract would benefit from explicitly distinguishing performance metrics between manually authored rules and LLM-generated rules in the main claims rather than separating them.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment point-by-point below, with revisions made to clarify limitations and strengthen the presentation of results.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The headline claims of eliminating all hazardous actions in embodied agents and 100% AV compliance are based on finite test suites, but no coverage argument, mutation analysis, or adversarial test set is provided to demonstrate that the predicate/trigger combinations exhaustively intercept all unsafe trajectories in the respective action spaces. If an agent produces an action outside the enumerated triggers, enforcement is bypassed.

    Authors: We agree that the reported results are based on finite test suites and that AgentSpec only enforces rules for explicitly defined triggers and predicates; actions falling outside these are not intercepted. We do not claim exhaustive coverage of all possible unsafe trajectories. In the revised manuscript, we have updated the abstract to qualify the claims as applying 'on the evaluated test suites' and added a new paragraph in the Discussion section explaining that comprehensive safety depends on users defining rules that cover their target action spaces, along with suggestions for future automated coverage verification techniques. revision: yes

  2. Referee: [Abstract] Abstract: The reported success rates lack supporting experimental details such as number of trials, baselines, error bars, statistical significance, or discussion of potential confounds and post-hoc selection, making it difficult to assess whether the data fully supports the claims of over 90% prevention in code agents and perfect enforcement in the other domains.

    Authors: The full Evaluation section reports the number of trials (100 for code agents, 50 for embodied agents, and 8 scenarios for AV), baselines (unconstrained agents), and runtime overhead measurements. To address the concern, we have revised the abstract to briefly note the evaluation scale and added error bars, statistical significance tests (t-tests with p-values), and explicit discussion of potential confounds and methodology to the results section and figures. revision: yes

  3. Referee: [Abstract] Abstract: LLM-generated rules achieve only 70.96% recall on embodied agents and succeed in 5/8 AV scenarios, which undercuts the practicality of the automated generation approach relative to the manual-rule results presented as perfect on the evaluated cases; the paper does not address how users would ensure comprehensive rule coverage in practice.

    Authors: We acknowledge that LLM-generated rules show lower recall (70.96%) and succeed in only 5/8 AV scenarios compared to manual rules. This underscores the value of hybrid approaches. In the revised manuscript, we have expanded the automated rule generation section with a new subsection on practical usage, recommending iterative LLM prompting, validation on test cases, and manual review/augmentation to achieve comprehensive coverage. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper introduces a domain-specific language for runtime enforcement and reports empirical results from evaluations across code, embodied, and AV domains. No equations, fitted parameters, or analytical derivations are described that reduce to self-defined quantities or self-citations. Claims rest on experimental measurements of rule effectiveness rather than any load-bearing self-referential construction. Self-citations, if present, are not used to justify uniqueness theorems or ansatzes that force the central results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim depends on the feasibility of runtime monitoring and the ability to express all necessary safety constraints via the proposed DSL syntax.

axioms (1)
  • domain assumption Runtime interception and enforcement of LLM agent actions is feasible across domains without prohibitive overhead.
    The reported low overhead and high compliance rates presuppose that agent executions can be observed and controlled in real time.
invented entities (1)
  • AgentSpec DSL no independent evidence
    purpose: To allow users to specify structured safety rules with triggers, predicates, and enforcement mechanisms.
    Newly defined language and rule format introduced by the paper.

pith-pipeline@v0.9.0 · 5579 in / 1193 out tokens · 46923 ms · 2026-05-14T21:20:33.486922+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Forward citations

Cited by 22 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. APIOT: Autonomous Vulnerability Management Across Bare-Metal Industrial OT Networks

    cs.CR 2026-05 unverdicted novelty 8.0

    APIOT is the first LLM framework to complete the full autonomous discovery-to-remediation cycle on bare-metal OT devices, reaching 90% success across 290 runs on Zephyr RTOS.

  2. No Attack Required: Semantic Fuzzing for Specification Violations in Agent Skills

    cs.CR 2026-05 unverdicted novelty 7.0

    Sefz discovers specification violations in 29.9% of 402 real-world agent skills by translating guardrails into reachability goals and guiding LLM mutations with a multi-armed bandit.

  3. Evaluating Tool-Using Language Agents: Judge Reliability, Propagation Cascades, and Runtime Mitigation in AgentProp-Bench

    cs.AI 2026-04 conditional novelty 7.0

    AgentProp-Bench shows substring judging agrees with humans at kappa=0.049, LLM ensemble at 0.432, bad-parameter injection propagates with ~0.62 probability, rejection and recovery are independent, and a runtime fix cu...

  4. Governed Capability Evolution: Lifecycle-Time Compatibility Checking and Rollback for AI-Component-Based Systems, with Embodied Agents as Case Study

    cs.RO 2026-04 conditional novelty 7.0

    A governed capability evolution framework with interface, policy, behavioral, and recovery checks reduces unsafe activations to zero in embodied agent upgrades while preserving task success rates.

  5. Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study

    cs.CR 2026-04 accept novelty 7.0

    Analysis of 17k LLM agent skills reveals 520 vulnerable ones with 1,708 leakage issues, primarily from debug output exposure, with a 10-pattern taxonomy and released dataset for future detection.

  6. SOCpilot: Verifying Policy Compliance for LLM-Assisted Incident Response

    cs.CR 2026-05 unverdicted novelty 6.0

    SOCpilot supplies a fixed verifier and public artifact that removes 466 non-compliant approval-gated actions from LLM plans on 200 real incidents while preserving task recall.

  7. ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection

    cs.CR 2026-05 unverdicted novelty 6.0

    ARGUS defends LLM agents from context-aware prompt injections by tracking information provenance and verifying decisions against trustworthy evidence, reducing attack success to 3.8% while retaining 87.5% task utility.

  8. Stable Agentic Control: Tool-Mediated LLM Architecture for Autonomous Cyber Defense

    cs.AI 2026-05 unverdicted novelty 6.0 partial

    Tool-mediated LLM agents with deterministic tools and a machine-checked Lyapunov certificate achieve stable control in cyber defense, reducing attacker game value by 59% on real attack graphs.

  9. Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis

    cs.CR 2026-05 unverdicted novelty 6.0

    Semia synthesizes Datalog representations of agent skills via constraint-guided loops to enable reachability queries for semantic risks, finding critical issues in over half of 13,728 real skills with 97.7% recall on ...

  10. Alignment Contracts for Agentic Security Systems

    cs.CR 2026-04 conditional novelty 6.0 full

    Alignment contracts define scope, allowed effects, budgets and disclosure rules as safety properties over finite effect traces, with decidable admissibility, refinement rules, and Lean-verified soundness under an obse...

  11. An AI Agent Execution Environment to Safeguard User Data

    cs.CR 2026-04 unverdicted novelty 6.0

    GAAP guarantees confidentiality of private user data for AI agents by enforcing user-specified permissions deterministically through persistent information flow tracking, without trusting the agent or requiring attack...

  12. Owner-Harm: A Missing Threat Model for AI Agent Safety

    cs.CR 2026-04 unverdicted novelty 6.0

    Owner-Harm is a new threat model with eight categories of agent behavior that harms the deployer, and existing defenses achieve only 14.8% true positive rate on injection-based owner-harm tasks versus 100% on generic ...

  13. PlanGuard: Defending Agents against Indirect Prompt Injection via Planning-based Consistency Verification

    cs.CR 2026-04 unverdicted novelty 6.0

    PlanGuard cuts indirect prompt injection attack success rate to 0% on the InjecAgent benchmark by verifying agent actions against a user-instruction-only plan while keeping false positives at 1.49%.

  14. Governed Capability Evolution: Lifecycle-Time Compatibility Checking and Rollback for AI-Component-Based Systems, with Embodied Agents as Case Study

    cs.RO 2026-04 unverdicted novelty 6.0

    A governed capability evolution framework for embodied agents uses four compatibility checks and a staged pipeline to achieve zero unsafe activations during upgrades while retaining comparable task success rates.

  15. Auditable Agents

    cs.AI 2026-04 unverdicted novelty 6.0

    No agent system can be accountable without auditability, which requires five dimensions (action recoverability, lifecycle coverage, policy checkability, responsibility attribution, evidence integrity) and mechanisms f...

  16. Measuring the Permission Gate: A Stress-Test Evaluation of Claude Code's Auto Mode

    cs.SE 2026-04 unverdicted novelty 6.0

    Independent evaluation of Claude Code auto mode finds 81% false negative rate on ambiguous authorization tasks due to unmonitored file edits.

  17. ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis

    cs.AI 2026-04 unverdicted novelty 6.0

    ATBench is a new trajectory-level benchmark with 1,000 diverse and realistic scenarios for assessing safety in LLM agents.

  18. ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis

    cs.AI 2026-04 unverdicted novelty 6.0

    ATBench supplies 1,000 trajectories (503 safe, 497 unsafe) organized by risk source, failure mode, and harm to evaluate long-horizon safety in LLM-based agents.

  19. Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation

    cs.CR 2026-05 unverdicted novelty 5.0

    A TEE-backed architecture isolates security-critical decisions in self-hosted AI agents to prevent host-level abuse from malicious inputs while maintaining allowed functionality.

  20. Sovereign Agentic Loops: Decoupling AI Reasoning from Execution in Real-World Systems

    cs.CR 2026-04 unverdicted novelty 5.0

    Sovereign Agentic Loops decouple LLM reasoning from execution by emitting validated intents through a control plane with obfuscation and evidence chains, blocking 93% of unsafe actions in a cloud prototype while addin...

  21. Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility

    cs.SE 2026-04 unverdicted novelty 5.0

    Symbolic guardrails enforce 74% of specified safety policies in agent benchmarks and boost safety without hurting utility.

  22. Spec Kit Agents: Context-Grounded Agentic Workflows

    cs.SE 2026-04 unverdicted novelty 5.0

    A multi-agent SDD framework with phase-level context-grounding hooks improves LLM-judged quality by 0.15 points and SWE-bench Lite Pass@1 by 1.7 percent while preserving near-perfect test compatibility.

Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages · cited by 20 Pith papers · 1 internal anchor

  1. [1]

    https://github.com/haoyuwang99/AgentSpec, 2025

    AgentSpec. https://github.com/haoyuwang99/AgentSpec, 2025

  2. [2]

    Runtime verification for trustworthy computing

    Abela, R., Colombo, C., Curmi, A., Fenech, M., Vella, M., and Ferrando, A. Runtime verification for trustworthy computing. In AREA@ECAI (2023), vol. 391 of EPTCS, pp. 49–62

  3. [3]

    Apollo Self-Driving

    Baidu Apollo. Apollo Self-Driving. https://www.apollo.auto/apollo-self-driving,

  4. [4]

    Accessed: 2025-02-11

  5. [5]

    Principles of model checking

    Baier, C., and Katoen, J. Principles of model checking . MIT Press, 2008

  6. [6]

    When AI thinks it will lose, it sometimes cheats, study finds

    Booth, H. When AI thinks it will lose, it sometimes cheats, study finds. Time (2025). https://time.com/7259395/ai-chess-cheating-palisade-research/

  7. [7]

    In ACL (1) (2024), Association for Computational Linguistics, pp

    Chen, J., Hu, X., Liu, S., Huang, S., Tu, W., He, Z., and Wen, L.LLMArena: Assess- ing capabilities of large language models in dynamic multi-agent environments. In ACL (1) (2024), Association for Computational Linguistics, pp. 13055–13077

  8. [8]

    AgentPoison: Red-teaming LLM agents via poisoning memory or knowledge bases

    Chen, Z., Xiang, Z., Xiao, C., Song, D., and Li, B. AgentPoison: Red-teaming LLM agents via poisoning memory or knowledge bases. In NeurIPS (2024)

  9. [9]

    AI agents under threat: A survey of key security challenges and future pathways

    Deng, Z., Guo, Y., Han, C., Ma, W., Xiong, J., Wen, S., and Xiang, Y. AI agents under threat: A survey of key security challenges and future pathways. ACM Comput. Surv. 57, 7 (2025), 182:1–182:36

  10. [10]

    A survey on in-context learning

    Dong, Q., Li, L., Dai, D., Zheng, C., Ma, J., Li, R., Xia, H., Xu, J., Wu, Z., Chang, B., Sun, X., Li, L., and Sui, Z. A survey on in-context learning. In EMNLP (2024), Association for Computational Linguistics, pp. 1107–1128

  11. [11]

    Safeguarding large language models: A survey

    Dong, Y., Mu, R., Zhang, Y., Sun, S., Zhang, T., Wu, C., Jin, G., Qi, Y., Hu, J., Meng, J., Bensalem, S., and Huang, X. Safeguarding large language models: A survey. CoRR abs/2406.02622 (2024)

  12. [12]

    What can you verify and enforce at runtime? Int

    Falcone, Y., Fernandez, J., and Mounier, L. What can you verify and enforce at runtime? Int. J. Softw. Tools Technol. Transf. 14, 3 (2012), 349–382

  13. [13]

    llama.cpp: LLM inference in C/C++

    Gerganov, G., and ggml-org Community. llama.cpp: LLM inference in C/C++. https://github.com/ggml-org/llama.cpp, 2025

  14. [14]

    In NeurIPS (2024)

    Guo, C., Liu, X., Xie, C., Zhou, A., Zeng, Y., Lin, Z., Song, D., and Li, B.RedCode: Risky code execution and generation benchmark for code agents. In NeurIPS (2024)

  15. [15]

    V., Wiest, O., and Zhang, X

    Guo, T., Chen, X., W ang, Y., Chang, R., Pei, S., Chawla, N. V., Wiest, O., and Zhang, X. Large language model based multi-agents: A survey of progress and challenges. In IJCAI (2024), ijcai.org, pp. 8048–8057

  16. [16]

    CoRR abs/2402.03578 (2024)

    Han, S., Zhang, Q., Y ao, Y., Jin, W., Xu, Z., and He, C.LLM multi-agent systems: Challenges and open problems. CoRR abs/2402.03578 (2024)

  17. [17]

    LangChain

    LangChain Contributors. LangChain. https://www.langchain.com/langchain,

  18. [18]

    Accessed: 2025-01-14

  19. [19]

    LangChain Expression Language (LCEL)

    LangChain Contributors. LangChain Expression Language (LCEL). https: //python.langchain.com/docs/concepts/lcel/, 2025

  20. [20]

    Detecting Standard Violation Errors in Smart Contracts

    Li, A., and Long, F. Detecting standard violation errors in smart contracts. CoRR abs/1812.07702 (2018)

  21. [21]

    CAMEL: communicative agents for "mind" exploration of large language model society

    Li, G., Hammoud, H., Itani, H., Khizbullin, D., and Ghanem, B. CAMEL: communicative agents for "mind" exploration of large language model society. In NeurIPS (2023)

  22. [22]

    Eia: Environmental injection attack on generalist web agents for privacy leakage

    Liao, Z., Mo, L., Xu, C., Kang, M., Zhang, J., Xiao, C., Tian, Y., Li, B., and Sun, H. Eia: Environmental injection attack on generalist web agents for privacy leakage. In ICLR (2025), OpenReview.net

  23. [23]

    Efficient detection of toxic prompts in large language models

    Liu, Y., Yu, J., Sun, H., Shi, L., Deng, G., Chen, Y., and Liu, Y. Efficient detection of toxic prompts in large language models. In ASE (2024), ACM, pp. 455–467

  24. [24]

    A language agent for au- tonomous driving

    Mao, J., Ye, J., Qian, Y., Pavone, M., and Wang, Y. A language agent for au- tonomous driving. CoRR abs/2311.10813 (2023)

  25. [25]

    What are AI guardrails? https://www.mckinsey.com/ featured-insights/mckinsey-explainers/what-are-ai-guardrails, 2024

    McKinsey & Company. What are AI guardrails? https://www.mckinsey.com/ featured-insights/mckinsey-explainers/what-are-ai-guardrails, 2024. Accessed: 2025-02-21

  26. [26]

    Real estate listing gaffe exposes widespread use of AI in Australian industry – and potential risks

    McLeod, C. Real estate listing gaffe exposes widespread use of AI in Australian industry – and potential risks. The Guardian (2024). Accessed: 2025-07-25

  27. [27]

    AutoGen: A framework for building AI agents and applications

    Microsoft. AutoGen: A framework for building AI agents and applications. https://microsoft.github.io/autogen/stable//index.html, 2025. Accessed: 2025-01- 14

  28. [28]

    M., Pham, L

    Min, N. M., Pham, L. H., Li, Y., and Sun, J. CROW: eliminating backdoors from large language models via internal consistency regularization. In ICML (2025), OpenReview.net

  29. [29]

    NeMo: A scalable generative AI framework

    NVIDIA. NeMo: A scalable generative AI framework. https://github.com/ NVIDIA/NeMo, 2025

  30. [30]

    S., O’Brien, J

    Park, J. S., O’Brien, J. C., Cai, C. J., Morris, M. R., Liang, P., and Bernstein, M. S. Generative agents: Interactive simulacra of human behavior. InUIST (2023), ACM, pp. 2:1–2:22

  31. [31]

    The Definitive ANTLR 4 Reference

    Parr, T. The Definitive ANTLR 4 Reference . Pragmatic Bookshelf, 2013

  32. [32]

    From prompt injections to SQL injection attacks: How protected is your llm-integrated web application? CoRR abs/2308.01990 (2023)

    Pedro, R., Castro, D., Carreira, P., and Santos, N. From prompt injections to SQL injection attacks: How protected is your llm-integrated web application? CoRR abs/2308.01990 (2023)

  33. [33]

    Richards, T. B. AutoGPT. https://github.com/Significant-Gravitas/AutoGPT, 2025

  34. [34]

    J., and Hashimoto, T.Identifying the risks of LM agents with an LM-emulated sandbox

    Ruan, Y., Dong, H., W ang, A., Pitis, S., Zhou, Y., Ba, J., Dubois, Y., Maddison, C. J., and Hashimoto, T.Identifying the risks of LM agents with an LM-emulated sandbox. In ICLR (2024), OpenReview.net

  35. [35]

    M., Nick- ovic, D., Pace, G

    Sánchez, C., Schneider, G., Ahrendt, W., Bartocci, E., Bianculli, D., Colombo, C., Falcone, Y., Francalanza, A., Krstic, S., Lourenço, J. M., Nick- ovic, D., Pace, G. J., Rufino, J., Signoles, J., Traytel, D., and Weiss, A. A survey of challenges for runtime verification from advanced application domains (beyond software). Formal Methods Syst. Des. 54 , 3...

  36. [36]

    C., Yang, C., and Wang, M

    Shi, W., Xu, R., Zhuang, Y., Yu, Y., Zhang, J., Wu, H., Zhu, Y., Ho, J. C., Yang, C., and Wang, M. D. EHRAgent: Code empowers large language models for few-shot complex tabular reasoning on electronic health records. In EMNLP (2024), Association for Computational Linguistics, pp. 22315–22339

  37. [37]

    In NeurIPS (2023)

    Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K., and Y ao, S.Reflexion: language agents with verbal reinforcement learning. In NeurIPS (2023)

  38. [38]

    M., Sun, J., Chen, Y., and Y ang, Z.LawBreaker: An approach for specifying traffic laws and fuzzing autonomous vehicles

    Sun, Y., Poskitt, C. M., Sun, J., Chen, Y., and Y ang, Z.LawBreaker: An approach for specifying traffic laws and fuzzing autonomous vehicles. In ASE (2022), ACM, pp. 62:1–62:12

  39. [39]

    M., W ang, K., and Sun, J.FixDrive: Automatically repairing autonomous vehicle driving behaviour for $0.08 per violation

    Sun, Y., Poskitt, C. M., W ang, K., and Sun, J.FixDrive: Automatically repairing autonomous vehicle driving behaviour for $0.08 per violation. In ICSE (2025), IEEE, pp. 1921–1933

  40. [40]

    Prioritizing safeguarding over autonomy: Risks of LLM agents for science

    Tang, X., Jin, Q., Zhu, K., Yuan, T., Zhang, Y., Zhou, W., Qu, M., Zhao, Y., Tang, J., Zhang, Z., Cohan, A., Lu, Z., and Gerstein, M. Prioritizing safeguarding over autonomy: Risks of LLM agents for science. CoRR abs/2402.04247 (2024)

  41. [41]

    Voyager: An open-ended embodied agent with large language models

    Wang, G., Xie, Y., Jiang, Y., Mandlekar, A., Xiao, C., Zhu, Y., Fan, L., and Anandkumar, A. Voyager: An open-ended embodied agent with large language models. Trans. Mach. Learn. Res. 2024 (2024)

  42. [42]

    M., Sun, Y., Sun, J., Wang, J., Cheng, P., and Chen, J

    Wang, K., Poskitt, C. M., Sun, Y., Sun, J., Wang, J., Cheng, P., and Chen, J. 𝜇Drive: User-controlled autonomous driving. CoRR abs/2407.13201 (2024)

  43. [43]

    X., Wei, Z., and Wen, J

    W ang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J., Chen, Z., Tang, J., Chen, X., Lin, Y., Zhao, W. X., Wei, Z., and Wen, J. A survey on large language model based autonomous agents. Frontiers Comput. Sci. 18 , 6 (2024), 186345

  44. [44]

    In ICML (2024), OpenReview.net

    W ang, X., Chen, Y., Yuan, L., Zhang, Y., Li, Y., Peng, H., and Ji, H.Executable code actions elicit better LLM agents. In ICML (2024), OpenReview.net

  45. [45]

    The rise and potential of large language model based agents: A survey

    Xi, Z., Chen, W., Guo, X., He, W., Ding, Y., Hong, B., Zhang, M., W ang, J., Jin, S., Zhou, E., Zheng, R., Fan, X., W ang, X., Xiong, L., Zhou, Y., W ang, W., Jiang, C., Zou, Y., Liu, X., Yin, Z., Dou, S., Weng, R., Qin, W., Zheng, Y., Qiu, X., Huang, X., Zhang, Q., and Gui, T. The rise and potential of large language model based agents: A survey. Sci. Ch...

  46. [46]

    CoRR abs/2406.09187 (2024)

    Xiang, Z., Zheng, L., Li, Y., Hong, J., Li, Q., Xie, H., Zhang, J., Xiong, Z., Xie, C., Y ang, C., Song, D., and Li, B.GuardAgent: Safeguard LLM agents by a guard agent via knowledge-enabled reasoning. CoRR abs/2406.09187 (2024)

  47. [47]

    In KDD (2024), ACM, pp

    Xing, M., Zhang, R., Xue, H., Chen, Q., Y ang, F., and Xiao, Z.Understanding the weakness of large language model agents within a complex android environment. In KDD (2024), ACM, pp. 6061–6072

  48. [48]

    E., Wettig, A., Lieret, K., Yao, S., Narasimhan, K., and Press, O

    Yang, J., Jimenez, C. E., Wettig, A., Lieret, K., Yao, S., Narasimhan, K., and Press, O. SWE-agent: Agent-computer interfaces enable automated software engineering. In NeurIPS (2024)

  49. [49]

    Watch out for your agents! Investigating backdoor threats to LLM-based agents

    Yang, W., Bi, X., Lin, Y., Chen, S., Zhou, J., and Sun, X. Watch out for your agents! Investigating backdoor threats to LLM-based agents. In NeurIPS (2024)

  50. [50]

    R., and Cao, Y

    Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K. R., and Cao, Y. ReAct: Synergizing reasoning and acting in language models. In ICLR (2023), OpenReview.net

  51. [51]

    SafeAgentBench: A benchmark for safe task planning of embodied LLM agents

    Yin, S., Pang, X., Ding, Y., Chen, M., Bi, Y., Xiong, Y., Huang, W., Xiang, Z., Shao, J., and Chen, S. SafeAgentBench: A benchmark for safe task planning of embodied LLM agents. CoRR abs/2412.13178 (2024)

  52. [52]

    Breaking agents: Compromising autonomous LLM agents through malfunction amplification

    Zhang, B., Tan, Y., Shen, Y., Salem, A., Backes, M., Zannettou, S., and Zhang, Y. Breaking agents: Compromising autonomous LLM agents through malfunction amplification. CoRR abs/2407.20859 (2024)

  53. [53]

    K., Zhang, P., and Sun, J

    Zhang, M., Goh, K. K., Zhang, P., and Sun, J. LLMScan: Causal scan for LLM misbehavior detection. In ICML (2025), OpenReview.net

  54. [54]

    In ASE (2024), ACM, pp

    Zhang, Q., Zhou, C., Go, G., Zeng, B., Shi, H., Xu, Z., and Jiang, Y.Imperceptible content poisoning in LLM-powered applications. In ASE (2024), ACM, pp. 242– 254

  55. [55]

    Zhang, Y., Cai, Y., Zuo, X., Luan, X., Wang, K., Hou, Z., Zhang, Y., Wei, Z., Sun, M., Sun, J., Sun, J., and Dong, J. S. Position: Trustworthy AI agents require the integration of large language models and formal methods. In ICML (2025), OpenReview.net

  56. [56]

    Towards general conceptual model editing via adversarial representation engineering

    Zhang, Y., Wei, Z., Sun, J., and Sun, M. Towards general conceptual model editing via adversarial representation engineering. CoRR abs/2404.13752 (2024)

  57. [57]

    Defending large language models against jailbreak attacks via layer-specific editing

    Zhao, W., Li, Z., Li, Y., Zhang, Y., and Sun, J. Defending large language models against jailbreak attacks via layer-specific editing. In EMNLP (Findings) (2024), Association for Computational Linguistics, pp. 5094–5109

  58. [58]

    GPT-4V(ision) is a generalist web agent, if grounded

    Zheng, B., Gou, B., Kil, J., Sun, H., and Su, Y. GPT-4V(ision) is a generalist web agent, if grounded. In ICML (2024), OpenReview.net

  59. [59]

    D., Sun, J., and Chua, T.ALI-Agent: Assessing LLMs’ alignment with human values via agent-based evaluation

    Zheng, J., W ang, H., Zhang, A., Nguyen, T. D., Sun, J., and Chua, T.ALI-Agent: Assessing LLMs’ alignment with human values via agent-based evaluation. In NeurIPS (2024)