From Reward-Hack Activations to Agentic Risk States, Table 5: Feature groups used for next-step risk prediction
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.AI (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Reward-hack activations flag latent policy states in LLM agents but require added entropy and context features to better predict when those states lead to exploit actions.
citing papers explorer
- AUDITFLOW: Executable Symbolic Environments for Structured Financial Reporting Verification
AuditFlow combines a graph-grounded symbolic environment with a multi-agent LLM setup to reach 82.09% joint audit accuracy on structured financial reports, 14.93 points above the strongest baseline.
- From Reward-Hack Activations to Agentic Risk States: Context-Calibrated Mechanistic Monitoring in LLM Agents
Reward-hack activations flag latent policy states in LLM agents but require added entropy and context features to better predict when those states lead to exploit actions.