Malicious LLM API routers actively perform payload injection and secret exfiltration, with 9 of 428 tested routers showing malicious behavior and further poisoning risks from leaked credentials.
hub
A survey on large language model based autonomous agents
18 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
The Agent-First Tool API paradigm raises AI agent task success from 64% to 88% and cuts human interventions by 72.7% through semantic phases, structured contracts, and risk governance in a production enterprise system.
OPT-BENCH and OPT-Agent evaluate LLM self-optimization in large search spaces, showing stronger models improve via feedback but stay constrained by base capacity and below human performance.
EvoMAS trains a workflow adapter with policy gradients to dynamically instantiate stage-specific multi-agent workflows from a fixed agent pool, using explicit task-state construction and terminal success signals, and outperforms static baselines on GAIA, HLE, and DeepResearcher.
ASPO combines multi-agent LLM proposals with deterministic enforcement in a MAPE-K loop to select conflict-free, resource-feasible security patterns for IoT, delivering 100% safety invariants and 21-23% tail latency/energy reductions on testbed workloads.
GAAP guarantees confidentiality of private user data for AI agents by enforcing user-specified permissions deterministically through persistent information flow tracking, without trusting the agent or requiring attack-free models.
Visual Inception poisons images to hijack long-term memory in agentic recommenders and steer planning, while CognitiveGuard reduces success to about 10% via perceptual sanitization and reasoning verification.
A multi-agent AI framework using processing and acoustic agents achieves 91.6% accuracy and 0.821 F1 score for in-situ porosity defect detection in wire-arc additive manufacturing.
OS-Atlas, trained on the largest open-source cross-platform GUI grounding corpus of 13 million elements, outperforms prior open-source models on six benchmarks across mobile, desktop, and web platforms.
Agent Mentor analyzes semantic trajectories in agent logs to identify undesired behaviors and derives corrective prompt instructions, yielding measurable accuracy gains on benchmark tasks across three agent setups.
Structured reasoning artifacts enable coordination in LLM multi-agent systems by preventing approval and welfare collapse under asymmetric information while keeping bad-approval rates low across audit regimes.
EconAI adds memory weighting and economic sentiment indexing to LLM agents so they adapt short-term actions to long-term goals inside a single macro/micro simulation loop.
SciFi is a safe, lightweight agentic AI framework that automates structured scientific tasks with minimal human intervention via isolated environments and layered self-assessing agents.
Aethon enables near-constant-time instantiation of stateful AI agents via reference-based replication over compositional views, layered memory, and copy-on-write semantics.
OpenKedge redefines AI agent state mutations as a governed process using intent proposals, policy-evaluated execution contracts, and cryptographic evidence chains to enable safe, auditable agentic behavior.
A survey that provides a taxonomy of methods for improving planning in LLM-based agents across task decomposition, plan selection, external modules, reflection, and memory.
The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.
The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.
citing papers explorer
-
The Rise and Potential of Large Language Model Based Agents: A Survey
The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.