Instructional segment embedding: Improving llm safety with instruction hierarchy

Tong Wu, Shujian Zhang, Kaiqiang Song, Silei Xu, Sanqiang Zhao, Ravi Agrawal, Sathish Reddy Indurthi, Chong Xiang, Prateek Mittal, Wenxuan Zhou · 2025 · arXiv 2410.09102

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

The Granularity Mismatch in Agent Security: Argument-Level Provenance Solves Enforcement and Isolates the LLM Reasoning Bottleneck

cs.CR · 2026-05-11 · unverdicted · novelty 7.0

PACT achieves perfect security and utility under oracle provenance by enforcing argument-level trust contracts based on semantic roles and cross-step provenance tracking, outperforming invocation-level monitors in AgentDojo evaluations.

Many-Tier Instruction Hierarchy in LLM Agents

cs.CL · 2026-04-10 · unverdicted · novelty 7.0

ManyIH and ManyIH-Bench address instruction conflicts in LLM agents with up to 12 privilege levels across 853 tasks, revealing frontier models achieve only ~40% accuracy.

Agent Security is a Systems Problem

cs.CR · 2026-05-18 · unverdicted · novelty 4.0 · 2 refs

The paper argues that agent security is best addressed as a systems problem by applying principles from operating systems, networks, and formal methods rather than relying solely on model robustness improvements.

Security Considerations for Artificial Intelligence Agents

cs.LG · 2026-03-12 · unverdicted · novelty 3.0

Frontier AI agents introduce new confidentiality, integrity, and availability risks through changed assumptions on code-data separation and authority boundaries, requiring layered defenses like sandboxing and policy enforcement.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Many-Tier Instruction Hierarchy in LLM Agents cs.CL · 2026-04-10 · unverdicted · none · ref 27
ManyIH and ManyIH-Bench address instruction conflicts in LLM agents with up to 12 privilege levels across 853 tasks, revealing frontier models achieve only ~40% accuracy.

Instructional segment embedding: Improving llm safety with instruction hierarchy

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer