A survey on large language model based autonomous agents

Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang + 2 more · 2024 · Frontiers of Computer Science · DOI 10.1007/s11704-024-40231-1

22 Pith papers cite this work, alongside 1,046 external citations (Crossref). Polarity classification is still being indexed.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values

cs.AI · 2026-05-11 · unverdicted · novelty 8.0

Agent-ValueBench is the first dedicated benchmark for agent values, showing they diverge from LLM values, form a homogeneous 'Value Tide' across models, and bend under harnesses and skill steering.

Demystifying and Detecting Agentic Workflow Injection Vulnerabilities in GitHub Actions

cs.CR · 2026-05-08 · conditional · novelty 8.0

Agentic Workflow Injection is a new injection vulnerability class in LLM-augmented GitHub Actions, with two patterns (P2A and P2S) detected via the TaintAWI tool yielding 496 confirmed exploitable instances across 13,392 workflows.

Why Do Multi-Agent LLM Systems Fail?

cs.AI · 2025-03-17 · unverdicted · novelty 8.0

The authors create the first large-scale dataset and taxonomy of failure modes in multi-agent LLM systems to explain their limited performance gains.

Can a Single Message Paralyze the AI Infrastructure? The Rise of AbO-DDoS Attacks through Targeted Mobius Injection

cs.CR · 2026-05-12 · unverdicted · novelty 7.0

Mobius Injection exploits semantic closure in LLM agents to enable single-message AbO-DDoS attacks achieving up to 51x call amplification and 229x latency inflation.

Evolving-RL: End-to-End Optimization of Experience-Driven Self-Evolving Capability within Agents

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

Evolving-RL jointly optimizes experience extraction and utilization in LLM agents via RL with separate evaluation signals, delivering up to 98.7% relative gains on out-of-distribution tasks in ALFWorld and Mind2Web.

The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.

SkCC: Portable and Secure Skill Compilation for Cross-Framework LLM Agents

cs.CR · 2026-05-05 · unverdicted · novelty 7.0

SkCC compiles LLM skills via SkIR to achieve portability across agent frameworks, reduce adaptation effort from O(m×n) to O(m+n), and enforce security with reported gains in task success rates and token efficiency.

When Alignment Isn't Enough: Response-Path Attacks on LLM Agents

cs.CR · 2026-05-04 · unverdicted · novelty 7.0

A malicious relay can strategically rewrite aligned LLM outputs in BYOK agent architectures to achieve up to 99.1% attack success on benchmarks like AgentDojo and ASB.

BIM Information Extraction Through LLM-based Adaptive Exploration

cs.CL · 2026-05-03 · unverdicted · novelty 7.0

LLM adaptive exploration via runtime code execution outperforms static query generation for information extraction from heterogeneous BIM models on the new ifc-bench v2 benchmark.

Breaking MCP with Function Hijacking Attacks: Novel Threats for Function Calling and Agentic Models

cs.CR · 2026-04-22 · unverdicted · novelty 7.0

A novel function hijacking attack achieves 70-100% success rates in forcing specific function calls across five LLMs on the BFCL benchmark and is robust to context semantics.

Memory-Augmented LLM-based Multi-Agent System for Automated Feature Generation on Tabular Data

cs.AI · 2026-04-22 · unverdicted · novelty 7.0

MALMAS is a memory-augmented multi-agent LLM system that generates diverse, high-quality features for tabular data via agent decomposition, routing, and iterative memory-guided refinement.

SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

cs.SE · 2024-05-06 · unverdicted · novelty 7.0

SWE-agent introduces a custom agent-computer interface that lets LM agents solve software engineering tasks, reaching 12.5% pass@1 on SWE-bench and 87.7% on HumanEvalFix, exceeding prior non-interactive approaches.

Sustaining Cooperation in Populations Guided by AI: A Folk Theorem for LLMs

cs.GT · 2026-05-07 · unverdicted · novelty 6.0

A folk theorem for LLMs proves that all feasible and individually rational outcomes can be sustained as ε-equilibria in repeated games where LLMs advise client populations, despite indirect observation.

Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems

cs.AI · 2026-04-23 · unverdicted · novelty 6.0

DiffMAS jointly optimizes latent communication and reasoning in multi-agent LLM systems via parameter-efficient supervised training on trajectories, yielding consistent gains over baselines on math, science, and code benchmarks.

The Semantic Training Gap: Ontology-Grounded Tool Architectures for Industrial AI Agent Systems

cs.AI · 2026-05-11 · unverdicted · novelty 5.0

Ontology-grounded tool architectures eliminate hallucination of domain identifiers in industrial AI agents by enforcing semantic constraints through a typed relational configuration and three-operation interface.

Unpredictability dissociates from structured control in language agents

cs.AI · 2026-05-10 · unverdicted · novelty 5.0

Stochastic unpredictability does not reproduce structured action-coupled control in language agents, as lesioning reason and veto components reduces structured profiles while high-stochasticity variants remain distinct from structured controls in 7/7 datasets.

Heterogeneous Scientific Foundation Model Collaboration

cs.AI · 2026-04-30 · unverdicted · novelty 5.0

Eywa enables language-based agentic AI systems to collaborate with specialized scientific foundation models for improved performance on structured data tasks.

JTPRO: A Joint Tool-Prompt Reflective Optimization Framework for Language Agents

cs.AI · 2026-04-20 · unverdicted · novelty 5.0

JTPRO co-optimizes prompts and tool descriptions via reflection to raise overall success rate by 5-20% over baselines on multi-tool benchmarks.

Coding-Free and Privacy-Preserving Agentic Framework for Data-Driven Clinical Research

cs.CL · 2026-04-14 · unverdicted · novelty 5.0

CARIS is a new agentic LLM framework that automates clinical research workflows from planning to reporting in a coding-free and privacy-preserving manner, achieving high completeness scores on heterogeneous datasets.

Agentic Federated Learning: The Future of Distributed Training Orchestration

cs.MA · 2026-04-06 · unverdicted · novelty 5.0

Agentic-FL introduces language model agents for autonomous orchestration in federated learning to address client heterogeneity and dynamic conditions.

Impact of Task Phrasing on Presumptions in Large Language Models

cs.CL · 2026-05-01 · unverdicted · novelty 3.0

LLMs show susceptibility to presumptions induced by task phrasing in decision tasks like the iterated prisoner's dilemma, mitigated by neutral wording.

Building an Internal Coding Agent at Zup: Lessons and Open Questions

cs.SE · 2026-04-10 · unverdicted · novelty 3.0

Engineering choices for tools, safety guardrails, and human oversight matter more than underlying model quality in determining whether an internal coding agent delivers value in practice.

citing papers explorer

Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values · cs.AI · 2026-05-11 · unverdicted · none · ref 24
Demystifying and Detecting Agentic Workflow Injection Vulnerabilities in GitHub Actions · cs.CR · 2026-05-08 · conditional · none · ref 47
Why Do Multi-Agent LLM Systems Fail? · cs.AI · 2025-03-17 · unverdicted · none · ref 5
Can a Single Message Paralyze the AI Infrastructure? The Rise of AbO-DDoS Attacks through Targeted Mobius Injection · cs.CR · 2026-05-12 · unverdicted · none · ref 1
Evolving-RL: End-to-End Optimization of Experience-Driven Self-Evolving Capability within Agents · cs.AI · 2026-05-11 · unverdicted · none · ref 27
The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment · cs.CL · 2026-05-08 · unverdicted · none · ref 143
SkCC: Portable and Secure Skill Compilation for Cross-Framework LLM Agents · cs.CR · 2026-05-05 · unverdicted · none · ref 3
When Alignment Isn't Enough: Response-Path Attacks on LLM Agents · cs.CR · 2026-05-04 · unverdicted · none · ref 106
BIM Information Extraction Through LLM-based Adaptive Exploration · cs.CL · 2026-05-03 · unverdicted · none · ref 12
Breaking MCP with Function Hijacking Attacks: Novel Threats for Function Calling and Agentic Models · cs.CR · 2026-04-22 · unverdicted · none · ref 20
Memory-Augmented LLM-based Multi-Agent System for Automated Feature Generation on Tabular Data · cs.AI · 2026-04-22 · unverdicted · none · ref 86
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering · cs.SE · 2024-05-06 · unverdicted · none · ref 49
Sustaining Cooperation in Populations Guided by AI: A Folk Theorem for LLMs · cs.GT · 2026-05-07 · unverdicted · none · ref 54
Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems · cs.AI · 2026-04-23 · unverdicted · none · ref 59
The Semantic Training Gap: Ontology-Grounded Tool Architectures for Industrial AI Agent Systems · cs.AI · 2026-05-11 · unverdicted · none · ref 4
Unpredictability dissociates from structured control in language agents · cs.AI · 2026-05-10 · unverdicted · none · ref 23
Heterogeneous Scientific Foundation Model Collaboration · cs.AI · 2026-04-30 · unverdicted · none · ref 33
JTPRO: A Joint Tool-Prompt Reflective Optimization Framework for Language Agents · cs.AI · 2026-04-20 · unverdicted · none · ref 26
Coding-Free and Privacy-Preserving Agentic Framework for Data-Driven Clinical Research · cs.CL · 2026-04-14 · unverdicted · none · ref 7
Agentic Federated Learning: The Future of Distributed Training Orchestration · cs.MA · 2026-04-06 · unverdicted · none · ref 4
Impact of Task Phrasing on Presumptions in Large Language Models · cs.CL · 2026-05-01 · unverdicted · none · ref 6
Building an Internal Coding Agent at Zup: Lessons and Open Questions · cs.SE · 2026-04-10 · unverdicted · none · ref 11
