The TAB benchmark reveals that frontier terminal agents achieve high task completion but low selective alignment with relevant environmental cues over distractors, and prompt-injection defenses block both.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.LG 2verdicts
UNVERDICTED 2representative citing papers
Proposes a probabilistic framework for latent agentic substructures in DNNs using log-score utilities and log pooling, with proofs on unanimity and an application to persona emergence in LLM alignment.
citing papers explorer
-
No More, No Less: Task Alignment in Terminal Agents
The TAB benchmark reveals that frontier terminal agents achieve high task completion but low selective alignment with relevant environmental cues over distractors, and prompt-injection defenses block both.
-
Probabilistic Modeling of Latent Agentic Substructures in Deep Neural Networks
Proposes a probabilistic framework for latent agentic substructures in DNNs using log-score utilities and log pooling, with proofs on unanimity and an application to persona emergence in LLM alignment.