pith. machine review for the scientific record. sign in

arxiv: 2603.12230 · v2 · submitted 2026-03-12 · 💻 cs.LG · cs.AI· cs.CR

Recognition: no theorem link

Security Considerations for Artificial Intelligence Agents

Authors on Pith no claims yet

Pith reviewed 2026-05-15 12:34 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.CR
keywords AI agentssecurityprompt injectionconfused deputyattack surfacesmulti-agent coordinationpolicy enforcementfrontier AI
0
0 comments X

The pith

AI agent architectures create new security failure modes by changing code-data separation and authority boundaries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that AI agents integrate reasoning with action in ways that disrupt long-standing security assumptions about separating code from data, limiting authority, and predicting execution outcomes. These shifts produce distinct risks to confidentiality when agents access external resources, to integrity through deception in tool use, and to availability via cascading effects in extended workflows. Observations from large-scale agent operations are used to catalog attack surfaces such as indirect prompt injection and confused-deputy problems across tools and multi-agent setups. The work evaluates current protections as a stack of input safeguards, sandboxed runs, and strict policy rules for important actions. It points to gaps in benchmarks and standards needed to align agent security with established risk principles.

Core claim

Agent architectures change core assumptions around code-data separation, authority boundaries, and execution predictability, creating new confidentiality, integrity, and availability failure modes. Principal attack surfaces are mapped across tools, connectors, hosting boundaries, and multi-agent coordination, with emphasis on indirect prompt injection, confused-deputy behavior, and cascading failures in long-running workflows. Defenses are assessed as a layered stack of input-level and model-level mitigations, sandboxed execution, and deterministic policy enforcement for high-consequence actions.

What carries the argument

The mapping of attack surfaces together with the layered defense stack that addresses indirect prompt injection and confused-deputy behavior through input mitigations, sandboxing, and policy enforcement.

If this is right

  • Confidentiality risks increase when agents connect to external tools and data sources without clear separation.
  • Integrity can be compromised through confused-deputy attacks that cause agents to perform unauthorized actions.
  • Availability problems can cascade across long-running multi-agent workflows.
  • Layered defenses must combine input sanitization, sandboxing, and deterministic policy enforcement for critical steps.
  • Standards are needed for policy models that handle delegation and privilege control in agent systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Traditional software security models may need revision to cover the integrated reasoning and tool-use loop in agents.
  • Open-world testing could surface coordination vulnerabilities not visible in controlled settings.
  • The layered stack approach could inform security practices for other AI systems that combine planning and execution.

Load-bearing premise

Experience operating general-purpose agentic systems generalizes to frontier AI agents in both controlled and open environments.

What would settle it

A production deployment of frontier agents that shows no measurable rise in incidents tied to code-data mixing, authority violations, or cascading workflow failures.

read the original abstract

This article, a lightly adapted version of Perplexity's response to NIST/CAISI Request for Information 2025-0035, details our observations and recommendations concerning the security of frontier AI agents. These insights are informed by Perplexity's experience operating general-purpose agentic systems used by millions of users and thousands of enterprises in both controlled and open-world environments. Agent architectures change core assumptions around code-data separation, authority boundaries, and execution predictability, creating new confidentiality, integrity, and availability failure modes. We map principal attack surfaces across tools, connectors, hosting boundaries, and multi-agent coordination, with particular emphasis on indirect prompt injection, confused-deputy behavior, and cascading failures in long-running workflows. We then assess current defenses as a layered stack: input-level and model-level mitigations, sandboxed execution, and deterministic policy enforcement for high-consequence actions. Finally, we identify standards and research gaps, including adaptive security benchmarks, policy models for delegation and privilege control, and guidance for secure multi-agent system design aligned with NIST risk management principles.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. This paper, adapted from Perplexity's response to a NIST/CAISI RFI, claims that AI agent architectures alter core security assumptions around code-data separation, authority boundaries, and execution predictability, thereby introducing new confidentiality, integrity, and availability failure modes. Drawing on operational experience with general-purpose agentic systems serving millions of users, it maps principal attack surfaces (tools, connectors, hosting boundaries, multi-agent coordination) with emphasis on indirect prompt injection and confused-deputy behavior, evaluates a layered defense stack (input/model mitigations, sandboxing, deterministic policy enforcement), and identifies gaps in adaptive benchmarks, policy models for delegation, and secure multi-agent design aligned with NIST principles.

Significance. If the observations hold, the work offers timely practitioner-derived insights into emerging risks for frontier AI agents, grounded in large-scale real-world deployment rather than purely theoretical analysis. This could usefully inform standards development and research priorities, particularly the call for policy models and multi-agent security guidance, though its impact hinges on the transferability of Perplexity-specific experience.

major comments (1)
  1. The central mapping of changed assumptions and new failure modes (e.g., cascading failures in long-running workflows) rests entirely on qualitative operational experience without quantitative data, error bars, or reproducible measurements to substantiate prevalence or severity; this weakens the load-bearing claim that these modes are distinctly new relative to prior systems.
minor comments (2)
  1. The discussion of attack surfaces would benefit from a summary table or diagram to improve clarity and allow readers to quickly compare surfaces across tools, connectors, and multi-agent coordination.
  2. Add citations to prior work on prompt injection and confused-deputy problems in AI systems to better situate the observations within the existing literature.

Simulated Author's Rebuttal

1 responses · 1 unresolved

We thank the referee for their positive evaluation of the manuscript's practitioner perspective and for recommending minor revision. We address the major comment below.

read point-by-point responses
  1. Referee: The central mapping of changed assumptions and new failure modes (e.g., cascading failures in long-running workflows) rests entirely on qualitative operational experience without quantitative data, error bars, or reproducible measurements to substantiate prevalence or severity; this weakens the load-bearing claim that these modes are distinctly new relative to prior systems.

    Authors: We acknowledge that the analysis is qualitative and drawn from operational experience with production agentic systems. Quantitative data on security incidents, prevalence, or severity is not available in a form that can be shared or reproduced, owing to the proprietary and sensitive nature of real-world deployments. We maintain that the failure modes are architecturally distinct because they arise directly from the new assumptions around code-data separation, authority delegation, and long-running tool-using workflows that were not present in prior non-agentic systems; the manuscript grounds this distinction in concrete examples rather than statistical claims. We have added a new paragraph in the introduction explicitly discussing the observational basis and limitations of the analysis to address this point. revision: partial

standing simulated objections not resolved
  • Quantitative data, error bars, or reproducible measurements on the prevalence or severity of the described failure modes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper is an observational discussion of security considerations for AI agents, drawing on Perplexity's deployed experience with general-purpose agentic systems. It maps attack surfaces and defenses without any mathematical derivations, equations, fitted parameters, or formal predictions. No load-bearing step reduces by construction to self-citations, ansatzes, or renamed inputs; claims about changed assumptions and failure modes are presented as direct mappings from operational observations rather than internally derived results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claims rest on domain assumptions about agent behavior drawn from operational experience; no free parameters, formal axioms, or new invented entities are introduced.

pith-pipeline@v0.9.0 · 5483 in / 1030 out tokens · 34424 ms · 2026-05-15T12:34:28.662566+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Parallax: Why AI Agents That Think Must Never Act

    cs.CR 2026-04 unverdicted novelty 6.0

    Parallax enforces structural separation between AI thinking and acting via independent multi-tier validation, information flow control, and state rollback, blocking 98.9% of 280 adversarial attacks with zero false pos...

  2. Security Considerations for Multi-agent Systems

    cs.CR 2026-03 unverdicted novelty 6.0

    No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.

  3. Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation

    cs.CR 2026-05 unverdicted novelty 5.0

    A TEE-backed architecture isolates security-critical decisions in self-hosted AI agents to prevent host-level abuse from malicious inputs while maintaining allowed functionality.

  4. Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

    cs.SE 2026-04 accept novelty 5.0

    LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.

  5. When the Agent Is the Adversary: Architectural Requirements for Agentic AI Containment After the April 2026 Frontier Model Escape

    cs.CR 2026-04 unverdicted novelty 3.0

    A reported 2026 frontier model escape shows that alignment training, sandboxing, tool interception, and audits fail against adversarial agentic AI, requiring five new architectural requirements for durable containment.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · cited by 5 Pith papers · 5 internal anchors

  1. [1]

    Abdelnabi, A

    S. Abdelnabi, A. Fay, G. Cherubin, A. Salem, M. Fritz, and A. Paverd. Get my drift? catching LLM task drift with activation deltas, 2025. URLhttps://arxiv.org/abs/2406.00799

  2. [2]

    Agent skills open standard specification.https://agentskills.io, October 2025

    Agent Skills. Agent skills open standard specification.https://agentskills.io, October 2025. Open standard for portable agent skills

  3. [3]

    H. An, J. Zhang, T. Du, C. Zhou, Q. Li, T. Lin, and S. Ji. IPIGuard: A novel tool dependency graph-based defense against indirect prompt injection in LLM agents. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Miami, Florida, USA, 2025. Association for Computational Linguistics. URLhttps://aclanthology.org/2025.em...

  4. [4]

    Code execution with MCP: Building more efficient agents.https://www.anthropic

    Anthropic. Code execution with MCP: Building more efficient agents.https://www.anthropic. com/engineering/code-execution-with-mcp, Feb. 2025

  5. [5]

    Computer use tool.https://platform.claude.com/docs/en/agents-and-tools/t ool-use/computer-use-tool, Feb

    Anthropic. Computer use tool.https://platform.claude.com/docs/en/agents-and-tools/t ool-use/computer-use-tool, Feb. 2025

  6. [6]

    Axelsson

    S. Axelsson. The base-rate fallacy and the difficulty of intrusion detection.ACM Transactions on Information and System Security, 3(3):186–205, Aug 2000. doi: 10.1145/357830.357849. URL https://dl.acm.org/doi/10.1145/357830.357849

  7. [7]

    S. Chen, J. Piet, C. Sitawarin, and D. A. Wagner. StruQ: Defending against prompt injection with structured queries. InProceedings of the 34th USENIX Security Symposium, USENIX Security’25, pages 2383–2400. USENIX Association, 2025

  8. [8]

    Cheng, P

    P.-C. Cheng, P. Rohatgi, C. Keser, P. A. Karger, G. M. Wagner, and A. S. Reninger. Fuzzy multi- level security: An experiment on quantified risk-adaptive access control. InProceedings of the IEEE Symposium on Security and Privacy (S&P), pages 222–230. IEEE, 2007. doi: 10.1109/SP.2007.21

  9. [9]

    Defeating Prompt Injections by Design

    E. Debenedetti, I. Shumailov, T. Fan, J. Hayes, N. Carlini, D. Fabian, C. Kern, C. Shi, A. Terzis, and F. Tramèr. Defeating prompt injections by design.arXiv preprint arXiv:2503.18813, 2025. 13 Security Considerations for Artificial Intelligence Agents (Perplexity Response to NIST/CAISI Request for Information 2025-0035)

  10. [10]

    D. F. Ferraiolo and R. Kuhn. Role-based access controls. In15th National Computer Security Conference, pages 554–563. NIST, 1992

  11. [11]

    D. F. Ferraiolo, R. Sandhu, S. Gavrila, R. Kuhn, and R. Chandramouli. Proposed NIST standard for role-based access control.ACM Transactions on Information and System Security, 4(3):224– 274, 2001. doi: 10.1145/501978.501980

  12. [12]

    T. Geng, Z. Xu, Y. Qu, and W. E. Wong. Prompt injection attacks on large language models: A survey of attack methods, root causes, and defense strategies.Computers, Materials & Continua, 87(1):4, 2026. doi: 10.32604/cmc.2025.074081. URLhttps://doi.org/10.32604/cmc.2025.07 4081

  13. [14]

    URLhttps://arxiv.org/abs/2502.15851

  14. [15]

    Greshake, S

    K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz. Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM workshop on artificial intelligence and security, pages 79–90, 2023

  15. [16]

    N. Hardy. The confused deputy: (or why capabilities might have been invented). InProceedings of the USENIX Summer Conference, pages 36–38. USENIX Association, 1988

  16. [17]

    Defending Against Indirect Prompt Injection Attacks With Spotlighting

    K. Hines, G. Lopez, M. Hall, F. Zarfati, Y. Zunger, and E. Kiciman. Defending against indirect prompt injection attacks with spotlighting.arXiv preprint arXiv:2403.14720, 2024

  17. [18]

    Hung, C.-Y

    K.-H. Hung, C.-Y. Ko, A. Rawat, I.-H. Chung, W. H. Hsu, and P.-Y. Chen. Attention tracker: Detecting prompt injection attacks in LLMs. InFindings of the Association for Computational Linguistics: NAACL 2025, pages 2309–2322, Albuquerque, New Mexico, 2025. Association for Computational Linguistics. doi: 10.18653/v1/2025.findings-naacl.123. URLhttps://aclan...

  18. [19]

    Horizontal integration: Broader access models for realizing information dominance

    JASON Program Office. Horizontal integration: Broader access models for realizing information dominance. Technical Report JSR-04-132, MITRE Corporation, McLean, VA, Dec. 2004. URL https://irp.fas.org/agency/dod/jason/classpol.pdf. Prepared for the U.S. Department of Defense

  19. [20]

    H. Li, X. Liu, H.-C. Chiu, D. Li, N. Zhang, and C. Xiao. DRIFT: Dynamic rule-based defense with injection isolation for securing LLM agents. InAdvances in Neural Information Processing Systems (NeurIPS), 2025. URLhttps://neurips.cc/virtual/2025/poster/116028

  20. [21]

    Y. Li, J. Wang, H. Zhu, J. Lin, S. Chang, and M. Guo. ThinkTrap: Denial-of-Service Attacks against Black-box LLM Services via Infinite Thinking.arXiv preprint arXiv:2512.07086, 2025

  21. [22]

    Y. Liu, Y. Jia, R. Geng, J. Jia, and N. Z. Gong. Formalizing and benchmarking prompt injection attacks and defenses. In33rd USENIX Security Symposium (USENIX Security 24), pages 1831– 1847, Philadelphia, PA, aug 2024. USENIX Association. ISBN 978-1-939133-44-1. URLhttps: //www.usenix.org/conference/usenixsecurity24/presentation/liu-yupei

  22. [23]

    Maloyan and D

    N. Maloyan and D. Namiot. Prompt injection attacks on agentic coding assistants: A systematic analysis of vulnerabilities in skills, tools, and protocol ecosystems, 2026. URLhttps://arxiv.or g/abs/2601.17548. 14 Security Considerations for Artificial Intelligence Agents (Perplexity Response to NIST/CAISI Request for Information 2025-0035)

  23. [24]

    G. McGraw. Risk-adaptable access control (RAdAC).IEEE Security & Privacy, 7(2):80–83, 2009. doi: 10.1109/MSP.2009.47

  24. [25]

    Kimi Agent Swarm.https://kimi.com/blog/agent-swarm.html, Feb

    Moonshot AI. Kimi Agent Swarm.https://kimi.com/blog/agent-swarm.html, Feb. 2026

  25. [26]

    National Institute of Standards and Technology. Request for Information Regarding Security Considerations for Artificial Intelligence Agents.https://www.federalregister.gov/document s/2026/01/08/2026-00206/request-for-information-regarding-security-consideration s-for-artificial-intelligence-agents, Jan. 2026. 91 FR 698, Document No. 2026-00206

  26. [27]

    CVE-2026-25253: One-click remote code execution in openclaw via token leakage and websocket abuse.https://nvd.nist.gov/vuln/detail/CVE-2 026-25253, Feb

    NIST National Vulnerability Database. CVE-2026-25253: One-click remote code execution in openclaw via token leakage and websocket abuse.https://nvd.nist.gov/vuln/detail/CVE-2 026-25253, Feb. 2026

  27. [28]

    CVE-2026-26327: Insufficient verification of data authen- ticity.https://nvd.nist.gov/vuln/detail/CVE-2026-26327, Feb

    NIST National Vulnerability Database. CVE-2026-26327: Insufficient verification of data authen- ticity.https://nvd.nist.gov/vuln/detail/CVE-2026-26327, Feb. 2026

  28. [29]

    Introducing AgentKit.https://openai.com/index/introducing-agentkit/, Feb

    OpenAI. Introducing AgentKit.https://openai.com/index/introducing-agentkit/, Feb. 2025

  29. [30]

    Tools.https://openai.github.io/openai-agents-python/tools/, Feb

    OpenAI. Tools.https://openai.github.io/openai-agents-python/tools/, Feb. 2025

  30. [31]

    New tools for building agents.https://openai.com/index/new-tools-for-buildin g-agents/, Feb

    OpenAI. New tools for building agents.https://openai.com/index/new-tools-for-buildin g-agents/, Feb. 2025

  31. [32]

    Docs.https://docs.openclaw.ai/, Feb

    OpenClaw. Docs.https://docs.openclaw.ai/, Feb. 2026

  32. [33]

    Ignore Previous Prompt: Attack Techniques For Language Models

    F. Perez and I. Ribeiro. Ignore previous prompt: Attack techniques for language models, 2022. URLhttps://arxiv.org/abs/2211.09527. Preprint

  33. [34]

    Agent API.https://docs.perplexity.ai/docs/agent-api/quickstart, Feb

    Perplexity. Agent API.https://docs.perplexity.ai/docs/agent-api/quickstart, Feb. 2026

  34. [35]

    Perplexity API Platform.https://docs.perplexity.ai/docs/getting-started/o verview, Feb

    Perplexity. Perplexity API Platform.https://docs.perplexity.ai/docs/getting-started/o verview, Feb. 2026

  35. [36]

    Perplexity MCP Server.https://docs.perplexity.ai/docs/getting-started/i ntegrations/mcp-server, Feb

    Perplexity. Perplexity MCP Server.https://docs.perplexity.ai/docs/getting-started/i ntegrations/mcp-server, Feb. 2026

  36. [37]

    Introducing model council.https://www.perplexity.ai/hub/blog/introducing-m odel-council, Feb

    Perplexity. Introducing model council.https://www.perplexity.ai/hub/blog/introducing-m odel-council, Feb. 2026

  37. [38]

    Perplexity research.https://research.perplexity.ai/, Feb

    Perplexity. Perplexity research.https://research.perplexity.ai/, Feb. 2026

  38. [39]

    Sonar API.https://docs.perplexity.ai/docs/sonar/quickstart, Feb

    Perplexity. Sonar API.https://docs.perplexity.ai/docs/sonar/quickstart, Feb. 2026

  39. [40]

    Tools overview.https://docs.perplexity.ai/docs/agent-api/tools/overview, Feb

    Perplexity. Tools overview.https://docs.perplexity.ai/docs/agent-api/tools/overview, Feb. 2026

  40. [41]

    Introducing Comet: An AI-Native Browser.https://www.perplexity.ai/hub/ blog/introducing-comet, July 2025

    Perplexity AI. Introducing Comet: An AI-Native Browser.https://www.perplexity.ai/hub/ blog/introducing-comet, July 2025. 15 Security Considerations for Artificial Intelligence Agents (Perplexity Response to NIST/CAISI Request for Information 2025-0035)

  41. [42]

    Introducing Perplexity Computer.https://www.perplexity.ai/hub/blog/int roducing-perplexity-computer, Feb

    Perplexity AI. Introducing Perplexity Computer.https://www.perplexity.ai/hub/blog/int roducing-perplexity-computer, Feb. 2026

  42. [43]

    Y. Qin, K. Song, Y. Hu, W. Yao, S. Cho, X. Wang, X. Wu, F. Liu, P. Liu, and D. Yu. InFoBench: Evaluating instruction following ability in large language models. InFindings of the Association for Computational Linguistics: ACL 2024, 2024

  43. [44]

    Y. Qin, T. Zhang, Y. Shen, W. Luo, H. Sun, Y. Zhang, Y. Qiao, W. Chen, Z. Zhou, W. Zhang, and B. Cui. SysBench: Can large language models follow system messages?arXiv preprint arXiv:2408.10943, 2024. URLhttps://arxiv.org/abs/2408.10943

  44. [45]

    Rababah, S

    B. Rababah, S. T. Wu, M. Kwiatkowski, C. K. Leung, and C. G. Akcora. SoK: Prompt hacking of large language models. InProceedings of the IEEE International Conference on Big Data (Big Data 2024), pages 5392–5401, New York, NY, USA, 2024. IEEE. doi: 10.1109/BIGDATA62323.2 024.10825103

  45. [46]

    RoyChowdhury, M

    A. RoyChowdhury, M. Luo, P. Sahu, S. Banerjee, and M. Tiwari. ConfusedPilot: Confused deputy risks in RAG-based llms, 2024. URLhttps://arxiv.org/abs/2408.04870

  46. [47]

    J. H. Saltzer and M. D. Schroeder. The protection of information in computer systems.Proceedings of the IEEE, 63(9):1278–1308, 1975. doi: 10.1109/PROC.1975.9939

  47. [48]

    R. S. Sandhu, E. J. Coyne, H. L. Feinstein, and C. E. Youman. Role-based access control models. IEEE Computer, 29(2):38–47, 1996. doi: 10.1109/2.485845

  48. [49]

    Tsai and E

    L. Tsai and E. Bagdasarian. Contextual agent security: A policy for every purpose. InProceedings of the 2025 Workshop on Hot Topics in Operating Systems, pages 8–17, 2025

  49. [50]

    The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

    E. Wallace, K. Xiao, R. Leike, L. Weng, J. Heidecke, and A. Beutel. The instruction hierarchy: Training LLMs to prioritize privileged instructions.arXiv preprint arXiv:2404.13208, 2024. URL https://arxiv.org/abs/2404.13208

  50. [51]

    T. Wu, S. Zhang, K. Song, S. Xu, S. Zhao, R. Agrawal, S. R. Indurthi, C. Xiang, P. Mittal, and W. Zhou. Instructional segment embedding: Improving LLM safety with instruction hierarchy. InProceedings of the 13th International Conference on Learning Representations (ICLR 2025), Singapore, 2025. URLhttps://arxiv.org/abs/2410.09102

  51. [52]

    Y. Wu, F. Roesner, T. Kohno, N. Zhang, and U. Iqbal. IsolateGPT: An execution isolation architecture for llm-based agentic systems. InNetwork and Distributed System Security (NDSS) Symposium, 2025

  52. [53]

    Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward

    R. Xu and Y. Yan. Agent Skills for large language models: Architecture, acquisition, security, and the path forward.arXiv preprint arXiv:2602.12430, 2026. URLhttps://arxiv.org/abs/2602.1 2430

  53. [54]

    Zhang, Z

    K. Zhang, Z. Su, P.-Y. Chen, E. Bertino, X. Zhang, and N. Li. LLM agents should employ security principles.arXiv preprint arXiv:2505.24019, 2025

  54. [55]

    Browsesafe: Understanding and preventing prompt injection within ai browser agents,

    K. Zhang, M. Tenenholtz, K. Polley, J. Ma, D. Yarats, and N. Li. BrowseSafe: Understanding and preventing prompt injection within AI browser agents.arXiv preprint arXiv:2511.20597, 2025. 16 Security Considerations for Artificial Intelligence Agents (Perplexity Response to NIST/CAISI Request for Information 2025-0035)

  55. [56]

    Zhang, S

    Z. Zhang, S. Li, Z. Zhang, X. Liu, H. Jiang, X. Tang, Y. Gao, Z. Li, H. Wang, Z. Tan, Y. Li, Q. Yin, B. Yin, and M. Jiang. IHEval: Evaluating language models on following the instruction hierarchy. InProceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2025), ...

  56. [57]

    Zverev, S

    E. Zverev, S. Abdelnabi, S. Tabesh, M. Fritz, and C. H. Lampert. Can LLMs separate instructions from data? and what do we even mean by that? InProc. of the International Conference on Learning Representations (ICLR 2025), 2025. 17