Canonical reference: arXiv preprint arXiv:2307.02485
All (100%) of the citing Pith papers cite this work as background.
Citation-role summary: background (6)
Citation-polarity summary: background (6)
Representative citing papers
VIGA introduces a training-free interleaved multimodal reasoning loop that improves vision-as-inverse-graphics accuracy over one-shot baselines on BlenderGym, SlideBench, and new BlenderBench.
AgentComm achieves nearly 50% bandwidth reduction in embodied agent communication via LLM semantic processing, importance-aware transmission, and a task knowledge base, with negligible impact on task completion.
A principled reward design for tool selection and application in RL-trained LLMs delivers 17% gains over base models and 15% over SFT across benchmarks.
Gated escalation and partitioned states enable more efficient multi-agent collaboration in Minecraft by making communication selective rather than automatic.
With real human subjects, AI transparency affects imperfectly cooperative interactions far more than personality traits do, unlike in simulations, where both are comparably influential.
RoboAgent chains basic vision-language capabilities inside a single VLM via a scheduler and trains it in three stages (behavior cloning, DAgger, RL) to improve embodied task planning.
CoEnv introduces a compositional environment that integrates real and simulated spaces for multi-agent robotic collaboration, using real-to-sim reconstruction, VLM action synthesis, and validated sim-to-real transfer to achieve high success rates on multi-arm manipulation tasks.
ROSClaw is a hierarchical framework that unifies vision-language model control with e-URDF-based sim-to-real mapping and closed-loop data collection to enable semantic-physical collaboration among heterogeneous multi-agent robots.
The paper surveys LLM-based multi-agent systems, covering simulated domains, agent profiling and communication, mechanisms for capacity growth, and common benchmarks.
Citing papers explorer
- Why Do Multi-Agent LLM Systems Fail?
  The authors create the first large-scale dataset and taxonomy of failure modes in multi-agent LLM systems to explain their limited performance gains.
- A Survey on Large Language Model based Autonomous Agents
  A survey of LLM-based autonomous agents that proposes a unified framework for their construction and reviews applications in social science, natural science, and engineering along with evaluation methods and future directions.
- From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review
  A survey consolidating benchmarks, agent frameworks, real-world applications, and protocols for LLM-based autonomous agents into a proposed taxonomy with recommendations for future research.
- Agent AI: Surveying the Horizons of Multimodal Interaction
  The paper defines Agent AI as interactive multimodal systems that perceive grounded data and generate embodied actions, arguing this approach can mitigate hallucinations in foundation models.
- The Rise and Potential of Large Language Model Based Agents: A Survey
  The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.