TRACES learns prefix-level trajectory risk states from LLM hidden representations using weak trajectory-level supervision to enable proactive safety auditing for multi-turn agents.
Boxuan Zhang, Jianing Zhu, Zeru Shi, Dongfang Liu, and Ruixiang Tang
Cited by 1 Pith paper (cs.CL, 2026):
TRACES: Proactive Safety Auditing for Multi-Turn LLM Agents via Trajectory-State Modeling