Panoptic scene graph generation with semantics-prototype learning. AAAI, 38(4):3145–3153, Mar
3 Pith papers cite this work.
Fields: cs.AI (3)
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Citing papers explorer
- Counterfactual Trace Auditing of LLM Agent Skills
  The CTA framework detects 522 skill-influence patterns in LLM agent traces across 49 tasks on which the average pass rate shifts by only +0.3%, exposing gaps in how evaluations capture behavioral effects such as template copying and excessive planning.
- FORTIS: Benchmarking Over-Privilege in Agent Skills
  The FORTIS benchmark shows that over-privilege is the norm in LLM agent skill selection and execution: across ten frontier models and three domains, models reach for higher-privilege skills and tools than the task requires.
- Geometry over Density: Few-Shot Cross-Domain OOD Detection
  UFCOD extracts Path Energy and Dynamics Energy from diffusion trajectories to perform few-shot out-of-distribution (OOD) detection across unrelated domains with a single fixed model.