FORTIS benchmark shows over-privilege is the norm in LLM agent skill selection and execution, with models reaching for higher-privilege skills and tools than required across ten frontier models and three domains.
AAAI , author=
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields: cs.AI (4)
years: 2026 (4)
roles: background (1)
polarities: background (1)
representative citing papers
UFCOD extracts Path Energy and Dynamics Energy from diffusion trajectories to perform few-shot OOD detection across unrelated domains with one fixed model.
RaMem improves LLM agent memory by grounding fragments in original conditions like time and participants, then using validity-aware retrieval, yielding >10% average F1 gains over baselines.
citing papers explorer
- FORTIS: Benchmarking Over-Privilege in Agent Skills
- Geometry over Density: Few-Shot Cross-Domain OOD Detection
- RaMem: Contextual Reinstatement for Long-term Agentic Memory
- Counterfactual Trace Auditing of LLM Agent Skills