Llama-3.1-foundationai-securityllm-8b-instruct technical report
4 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (4)
verdicts: UNVERDICTED (4)
roles: background (2)
polarities: background (2)
citing papers explorer
- FAPO: Fully Automated Prompt Optimization of Multi-Step LLM Pipelines
  FAPO automates LLM pipeline optimization via iterative diagnosis and prompt-or-structure edits, beating the GEPA baseline by a mean of +14.1 pp across 18 comparisons and by +33.8 pp when structural changes occur.
- Trust Me, Import This: Dependency Steering Attacks via Malicious Agent Skills
  Malicious Skills induce coding agents to hallucinate and import attacker-controlled packages at high rates while evading detection.
- LLM Evolution as an Industry-Scale Ecosystem: A Lifecycle Perspective on Continual Learning
  The paper reformulates industrial continual learning for LLMs as a closed-loop ecosystem problem, identifies three core challenges, and organizes solutions around five lifecycle design principles.
- Threat Modelling using Domain-Adapted Language Models: Empirical Evaluation and Insights
  Domain-adapted LLMs and SLMs do not consistently outperform general models on STRIDE threat classification for 5G; decoding strategies and model scale affect validity, but the gains remain insufficient for reliable use.