NARCBench and five activation-probing methods detect multi-agent collusion with 0.73-1.00 AUROC across distribution shifts and steganographic tasks by aggregating per-agent signals.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3representative citing papers
System 1 intuition in edge SLMs delivers 100% adversarial robustness and low latency for DAO consensus while System 2 reasoning causes 26.7% cognitive collapse and 17x slowdown.
The authors introduce agentic microphysics and generative safety to link local agent interactions to population-level risks in agentic AI through a causally explicit framework.
citing papers explorer
-
Detecting Multi-Agent Collusion Through Multi-Agent Interpretability
NARCBench and five activation-probing methods detect multi-agent collusion with 0.73-1.00 AUROC across distribution shifts and steganographic tasks by aggregating per-agent signals.
-
The Cognitive Penalty: Ablating System 1 and System 2 Reasoning in Edge-Native SLMs for Decentralized Consensus
System 1 intuition in edge SLMs delivers 100% adversarial robustness and low latency for DAO consensus while System 2 reasoning causes 26.7% cognitive collapse and 17x slowdown.
-
Agentic Microphysics: A Manifesto for Generative AI Safety
The authors introduce agentic microphysics and generative safety to link local agent interactions to population-level risks in agentic AI through a causally explicit framework.