
The Rise and Potential of Large Language Model Based Agents: A Survey

Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong · 2023 · cs.AI · arXiv 2309.07864

46 Pith papers cite this work. Polarity classification is still indexing.

abstract

For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are artificial entities that sense their environment, make decisions, and take actions. Many efforts have been made to develop intelligent agents, but they mainly focus on advancing algorithms or training strategies to enhance specific capabilities or performance on particular tasks. What the community actually lacks is a general and powerful model that can serve as a starting point for designing AI agents able to adapt to diverse scenarios. Due to the versatile capabilities they demonstrate, large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI), offering hope for building general AI agents. Many researchers have leveraged LLMs as the foundation to build AI agents and have achieved significant progress. In this paper, we perform a comprehensive survey on LLM-based agents. We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents. Building upon this, we present a general framework for LLM-based agents comprising three main components (brain, perception, and action); the framework can be tailored for different applications. Subsequently, we explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation. Following this, we delve into agent societies, exploring the behavior and personality of LLM-based agents, the social phenomena that emerge from an agent society, and the insights they offer for human society. Finally, we discuss several key topics and open problems within the field. A repository of related papers is available at https://github.com/WooooDyy/LLM-Agent-Paper-List.
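
The abstract's three-part framework (brain, perception, action) can be pictured as a simple loop. The Python below is a minimal, hypothetical sketch, not code from the survey: the class names, the `llm` callable, the `"tool_name argument"` decision format, and the tool registry are all assumptions, and a stub function stands in for a real language model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical sketch of the brain / perception / action split.
# None of these names come from the survey; they only illustrate the decomposition.


@dataclass
class Perception:
    """Turns raw environment input into text the brain can read (text-only here, by assumption)."""

    def observe(self, raw: str) -> str:
        return f"Observation: {raw.strip()}"


@dataclass
class Brain:
    """Keeps a running memory and asks a language model (any str -> str callable) for the next step."""

    llm: Callable[[str], str]
    memory: List[str] = field(default_factory=list)

    def decide(self, observation: str) -> str:
        self.memory.append(observation)
        prompt = "\n".join(self.memory) + "\nNext action:"
        decision = self.llm(prompt)
        self.memory.append(f"Decision: {decision}")
        return decision


@dataclass
class Action:
    """Dispatches decisions of the assumed form 'tool_name argument' to a registry of callables."""

    tools: Dict[str, Callable[[str], str]]

    def execute(self, decision: str) -> str:
        name, _, arg = decision.partition(" ")
        tool = self.tools.get(name)
        if tool is None:
            return f"unknown tool: {name}"
        return tool(arg)


def run_agent(raw_inputs: List[str], perception: Perception, brain: Brain, action: Action) -> List[str]:
    """One perceive -> decide -> act cycle per raw input."""
    results = []
    for raw in raw_inputs:
        obs = perception.observe(raw)
        decision = brain.decide(obs)
        results.append(action.execute(decision))
    return results


def stub_llm(prompt: str) -> str:
    """Stand-in for an LLM: always asks to echo the most recent observation."""
    observations = [line for line in prompt.splitlines() if line.startswith("Observation: ")]
    return "echo " + observations[-1][len("Observation: "):]


agent_tools = {"echo": lambda text: text.upper()}
outputs = run_agent(["hello", " world "], Perception(), Brain(stub_llm), Action(agent_tools))
print(outputs)  # ['HELLO', 'WORLD']
```

Swapping `stub_llm` for a real model call, or the text-only `Perception` for a multimodal encoder, changes one component without touching the other two, which is one way to read the tailoring the abstract describes.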

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Revisable by Design: A Theory of Streaming LLM Agent Execution

cs.LG · 2026-04-25 · unverdicted · novelty 8.0

LLM agents achieve greater flexibility during execution by classifying actions via a reversibility taxonomy and using an Earliest-Conflict Rollback algorithm that matches full-restart quality while wasting far less completed work.

MAD-OPD: Breaking the Ceiling in On-Policy Distillation via Multi-Agent Debate

cs.CL · 2026-05-02 · unverdicted · novelty 7.0

MAD-OPD recasts on-policy distillation teachers as a debating collective to supply better supervision, lifting agentic and code performance over single-teacher OPD across multiple model sizes.

OCR-Memory: Optical Context Retrieval for Long-Horizon Agent Memory

cs.CL · 2026-04-29 · unverdicted · novelty 7.0

OCR-Memory encodes agent trajectories as images with visual anchors and retrieves verbatim text via locate-and-transcribe, yielding gains on long-horizon benchmarks under strict context limits.

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

cs.CR · 2026-04-25 · unverdicted · novelty 7.0

A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

Dr.Sai: An agentic AI for real-world physics analysis at BESIII

hep-ex · 2026-04-24 · unverdicted · novelty 7.0

Dr.Sai autonomously executed full physics analysis pipelines on real BESIII data to re-measure ten J/psi decay branching fractions, matching established benchmarks without any manual coding.

Feedback-Driven Execution for LLM-Based Binary Analysis

cs.CR · 2026-04-16 · unverdicted · novelty 7.0

FORGE uses a reasoning-action-observation loop and Dynamic Forest of Agents to perform scalable LLM-based binary analysis, finding 1,274 vulnerabilities across 591 of 3,457 real-world firmware binaries at 72.3% precision and broader coverage than prior methods.

SAGE: A Service Agent Graph-guided Evaluation Benchmark

cs.AI · 2026-04-10 · unverdicted · novelty 7.0

SAGE is a new multi-agent benchmark that formalizes service SOPs as dynamic dialogue graphs to measure LLM agents on logical compliance and path coverage, uncovering an execution gap and empathy resilience across 27 models in 6 scenarios.

Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions

cs.CR · 2025-03-30 · unverdicted · novelty 7.0

The MCP lifecycle is defined with four phases and 16 activities; a threat taxonomy of 16 scenarios is constructed, validated through case studies, and paired with phase-specific safeguards.

CHAL: Council of Hierarchical Agentic Language

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

CHAL is a multi-agent dialectic system that performs structured belief optimization over defeasible domains using Bayesian-inspired graph representations and configurable meta-cognitive value system hyperparameters.

From Controlled to the Wild: Evaluation of Pentesting Agents for the Real-World

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

A practical evaluation protocol for AI pentesting agents that uses validated vulnerability discovery, LLM semantic matching, and bipartite scoring to assess performance in realistic, complex targets.

OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces

cs.AI · 2026-05-09 · unverdicted · novelty 6.0

OPT-BENCH and OPT-Agent evaluate LLM self-optimization in large search spaces, showing stronger models improve via feedback but stay constrained by base capacity and below human performance.

Unsafe by Flow: Uncovering Bidirectional Data-Flow Risks in MCP Ecosystem

cs.SE · 2026-05-08 · unverdicted · novelty 6.0

MCP-BiFlow detects 93.8% of known bidirectional data-flow vulnerabilities in MCP servers and identifies 118 confirmed issues across 87 real-world servers from a scan of 15,452 repositories.

SOD: Step-wise On-policy Distillation for Small Language Model Agents

cs.CL · 2026-05-08 · unverdicted · novelty 6.0

SOD reweights on-policy distillation strength step-by-step using divergence to stabilize tool use in small language model agents, yielding up to 20.86% gains and 26.13% on AIME 2025 for a 0.6B model.

LoopTrap: Termination Poisoning Attacks on LLM Agents

cs.CR · 2026-05-07 · unverdicted · novelty 6.0

LoopTrap is an automated red-teaming framework that crafts termination-poisoning prompts to amplify LLM agent steps by 3.57x on average (up to 25x) across 8 agents.

Agentic Retrieval-Augmented Generation for Financial Document Question Answering

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

FinAgent-RAG achieves 76.81-78.46% execution accuracy on financial QA benchmarks by combining contrastive retrieval, program-of-thought code generation, and adaptive strategy routing, outperforming baselines by 5.62-9.32 points.

Catching the Infection Before It Spreads: Foresight-Guided Defense in Multi-Agent Systems

cs.AI · 2026-05-03 · unverdicted · novelty 6.0 · 2 refs

A foresight-based local purification method simulates future agent interactions, detects infections via response diversity across personas, and applies targeted rollback or recursive diagnosis to cut maximum infection rates from over 95% to under 5.47% while preserving benign performance.

From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills

cs.CL · 2026-04-27 · unverdicted · novelty 6.0

SSL representation disentangles skill scheduling, structure, and logic using an LLM normalizer, improving skill discovery MRR@50 from 0.649 to 0.729 and risk assessment macro F1 from 0.409 to 0.509 over text baselines.

LLM-Steered Power Allocation for Parallel QPSK-AWGN Channels

cs.IT · 2026-04-23 · unverdicted · novelty 6.0

An LLM interprets natural-language policies to steer a projected-gradient power allocator across 8 parallel QPSK-AWGN channels, producing policy-dependent allocations and a 60% lower mutual-information spread after abrupt channel reversals than the optimizer alone.

When Agents Go Quiet: Output Generation Capacity and Format-Cost Separation for LLM Document Synthesis

cs.AI · 2026-04-17 · unverdicted · novelty 6.0

LLM agents avoid output stalling and reduce generation tokens by 48-72% via deferred template rendering guided by Output Generation Capacity and a Format-Cost Separation Theorem.

Numerical Instability and Chaos: Quantifying the Unpredictability of Large Language Models

cs.AI · 2026-04-14 · unverdicted · novelty 6.0

Large language models display three universal scale-dependent regimes of behavior—stable, chaotic, and signal-dominated—driven by floating-point rounding errors that produce an avalanche effect in early layers.

Policy-Invisible Violations in LLM-Based Agents

cs.AI · 2026-04-14 · unverdicted · novelty 6.0

LLM agents commit policy-invisible violations when policy facts are hidden from their context; a graph-simulation enforcer reaches 93% accuracy vs 68.8% for content-only baselines on a new 600-trace benchmark.

Generative AI Agent Empowered Power Allocation for HAP Propulsion and Communication Systems

cs.NI · 2026-04-10 · unverdicted · novelty 6.0

A generative AI agent creates a realistic HAP propulsion power model including aerodynamic interference and enables a Q3E beamforming algorithm that improves QoS and energy efficiency.

Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

cs.CR · 2026-04-06 · conditional · novelty 6.0

Poisoning any single CIK dimension of an AI agent raises average attack success rate from 24.6% to 64-74% across models, and tested defenses leave substantial residual risk.

Agentless: Demystifying LLM-based Software Engineering Agents

cs.SE · 2024-07-01 · conditional · novelty 6.0

Agentless, a basic three-phase LLM pipeline for bug localization, repair, and validation, outperforms complex open-source agents on SWE-bench Lite with a 32% success rate at a cost of $0.70.

citing papers explorer

Showing 46 of 46 citing papers.

Revisable by Design: A Theory of Streaming LLM Agent Execution · cs.LG · 2026-04-25 · unverdicted · none · ref 11
MAD-OPD: Breaking the Ceiling in On-Policy Distillation via Multi-Agent Debate · cs.CL · 2026-05-02 · unverdicted · none · ref 35
OCR-Memory: Optical Context Retrieval for Long-Horizon Agent Memory · cs.CL · 2026-04-29 · unverdicted · none · ref 23
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework · cs.CR · 2026-04-25 · unverdicted · none · ref 9
Dr.Sai: An agentic AI for real-world physics analysis at BESIII · hep-ex · 2026-04-24 · unverdicted · none · ref 8
Feedback-Driven Execution for LLM-Based Binary Analysis · cs.CR · 2026-04-16 · unverdicted · none · ref 46
SAGE: A Service Agent Graph-guided Evaluation Benchmark · cs.AI · 2026-04-10 · unverdicted · none · ref 55
Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions · cs.CR · 2025-03-30 · unverdicted · none · ref 72
CHAL: Council of Hierarchical Agentic Language · cs.AI · 2026-05-12 · unverdicted · none · ref 170
From Controlled to the Wild: Evaluation of Pentesting Agents for the Real-World · cs.AI · 2026-05-11 · unverdicted · none · ref 43
OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces · cs.AI · 2026-05-09 · unverdicted · none · ref 43
Unsafe by Flow: Uncovering Bidirectional Data-Flow Risks in MCP Ecosystem · cs.SE · 2026-05-08 · unverdicted · none · ref 60
SOD: Step-wise On-policy Distillation for Small Language Model Agents · cs.CL · 2026-05-08 · unverdicted · none · ref 1
LoopTrap: Termination Poisoning Attacks on LLM Agents · cs.CR · 2026-05-07 · unverdicted · none · ref 47
Agentic Retrieval-Augmented Generation for Financial Document Question Answering · cs.AI · 2026-05-06 · unverdicted · none · ref 34
Catching the Infection Before It Spreads: Foresight-Guided Defense in Multi-Agent Systems · cs.AI · 2026-05-03 · unverdicted · none · ref 48 · 2 links
From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills · cs.CL · 2026-04-27 · unverdicted · none · ref 29
LLM-Steered Power Allocation for Parallel QPSK-AWGN Channels · cs.IT · 2026-04-23 · unverdicted · none · ref 1
When Agents Go Quiet: Output Generation Capacity and Format-Cost Separation for LLM Document Synthesis · cs.AI · 2026-04-17 · unverdicted · none · ref 32
Numerical Instability and Chaos: Quantifying the Unpredictability of Large Language Models · cs.AI · 2026-04-14 · unverdicted · none · ref 2
Policy-Invisible Violations in LLM-Based Agents · cs.AI · 2026-04-14 · unverdicted · none · ref 11
Generative AI Agent Empowered Power Allocation for HAP Propulsion and Communication Systems · cs.NI · 2026-04-10 · unverdicted · none · ref 29
Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw · cs.CR · 2026-04-06 · conditional · none · ref 18
Agentless: Demystifying LLM-based Software Engineering Agents · cs.SE · 2024-07-01 · conditional · none · ref 97
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code · cs.SE · 2024-03-12 · unverdicted · none · ref 196
LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.
The Semantic Training Gap: Ontology-Grounded Tool Architectures for Industrial AI Agent Systems · cs.AI · 2026-05-11 · unverdicted · none · ref 5
Ontology-grounded tool architectures eliminate hallucination of domain identifiers in industrial AI agents by enforcing semantic constraints through a typed relational configuration and three-operation interface.
Grounding Multi-Hop Reasoning in Structural Causal Models via Group Relative Policy Optimization · cs.AI · 2026-05-02 · unverdicted · none · ref 35
SCM-GRPO grounds multi-hop fact verification in structural causal models and applies GRPO reinforcement learning to optimize reasoning chain length, outperforming baselines on HoVer and EX-FEVER.
Think Before You Act -- A Neurocognitive Governance Model for Autonomous AI Agents · cs.AI · 2026-04-28 · unverdicted · none · ref 5
A neurocognitive governance model formalizes a Pre-Action Governance Reasoning Loop that consults global, workflow, agent, and situational rules before each action, yielding 95% compliance accuracy with zero false escalations in a retail supply-chain implementation.
KISS Sorcar: A Stupidly-Simple General-Purpose and Software Engineering AI Assistant · cs.SE · 2026-04-26 · unverdicted · none · ref 23
KISS Sorcar introduces a simple layered agent framework and VS Code IDE that reaches 62.2% pass rate on Terminal Bench 2.0 by combining ReAct execution, summarization-based continuation, parallel tools, persistent history, and git worktree isolation while self-validating outputs.
ARMove: Learning to Predict Human Mobility through Agentic Reasoning · cs.MA · 2026-04-19 · unverdicted · none · ref 42
ARMove is a transferable framework for human mobility prediction that combines agentic LLM reasoning, feature management, and large-small model synergy to outperform baselines on several metrics while improving interpretability and robustness.
Safe and Policy-Compliant Multi-Agent Orchestration for Enterprise AI · cs.AI · 2026-04-19 · unverdicted · none · ref 7
CAMCO enforces policy constraints on multi-agent AI at deployment time via convex projection, risk-weighted Lagrangian shaping, and bounded-convergence negotiation, yielding zero violations and 92-97% utility in tested enterprise scenarios.
Layered Mutability: Continuity and Governance in Persistent Self-Modifying Agents · cs.AI · 2026-04-16 · unverdicted · none · ref 24 · 2 links
Persistent self-modifying AI agents exhibit compositional drift from mismatches across five mutability layers, with governance difficulty rising under rapid mutation, strong coupling, weak reversibility, and low observability, as indicated by a 0.68 identity hysteresis ratio in a preliminary ratchet
Agent Mentor: Framing Agent Knowledge through Semantic Trajectory Analysis · cs.AI · 2026-04-12 · unverdicted · none · ref 30
Agent Mentor analyzes semantic trajectories in agent logs to identify undesired behaviors and derives corrective prompt instructions, yielding measurable accuracy gains on benchmark tasks across three agent setups.
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering · cs.SE · 2026-04-09 · accept · none · ref 163
LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.
AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent · cs.LG · 2026-04-07 · unverdicted · none · ref 27
AgentOpt introduces a framework-agnostic package that uses algorithms like UCB-E to find cost-effective model assignments in multi-step LLM agent pipelines, cutting evaluation budgets by 62-76% while maintaining near-optimal accuracy on benchmarks.
LanG -- A Governance-Aware Agentic AI Platform for Unified Security Operations · cs.CR · 2026-04-07 · unverdicted · none · ref 14
LanG presents a governance-aware agentic AI platform for unified security operations that reports strong performance on incident correlation, rule generation, attack reconstruction, and AI safety guardrails in an open-source package.
AIVV: Neuro-Symbolic LLM Agent-Integrated Verification and Validation for Trustworthy Autonomous Systems · cs.AI · 2026-04-02 · unverdicted · none · ref 14
AIVV deploys LLM agents in a council to semantically validate anomalies in time-series data against natural-language requirements, automating human-in-the-loop verification for autonomous systems.
EconAI: Dynamic Persona Evolution and Memory-Aware Agents in Evolving Economic Environments · cs.MA · 2026-05-13 · unverdicted · none · ref 6
EconAI adds memory weighting and economic sentiment indexing to LLM agents so they adapt short-term actions to long-term goals inside a single macro/micro simulation loop.
A Multi-Agent Orchestration Framework for Venture Capital Due Diligence · cs.MA · 2026-05-13 · unverdicted · none · ref 2
A multi-agent orchestration framework automates VC due diligence using LLMs, web retrieval, and a programmatic pipeline to extract and parse official Greek business registry filings while flagging data gaps.
From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms · cs.AI · 2026-05-07 · unverdicted · none · ref 20
LLM agent memory is organized into Storage (preserving trajectories), Reflection (refining them), and Experience (abstracting into reusable knowledge) stages driven by needs for long-range consistency, dynamic adaptation, and continual learning.
Vibe Medicine: Redefining Biomedical Research Through Human-AI Co-Work · cs.AI · 2026-04-26 · unverdicted · none · ref 123
Vibe Medicine proposes directing AI agents via natural language for end-to-end biomedical workflows using LLMs, agent frameworks, and a curated collection of over 1,000 medical skills.
Governance-Aware Agent Telemetry for Closed-Loop Enforcement in Multi-Agent AI Systems · cs.MA · 2026-04-06 · unverdicted · none · ref 18
GAAT is a proposed architecture extending OpenTelemetry with governance schemas, OPA-based detection, graduated enforcement, and trusted provenance to close the observe-but-do-not-act gap in multi-agent systems.
Multi-Agent Collaboration Mechanisms: A Survey of LLMs · cs.AI · 2025-01-10 · unverdicted · none · ref 136
The survey organizes LLM-based multi-agent collaboration mechanisms into a framework with dimensions of actors, types, structures, strategies, and coordination protocols, reviews applications across domains, and identifies challenges for future research.
A Survey on Large Language Models for Code Generation · cs.CL · 2024-06-01 · unverdicted · none · ref 291
A systematic literature review that organizes recent work on LLMs for code generation into a taxonomy covering data curation, model advances, evaluations, ethics, environmental impact, and applications, with benchmark comparisons.
Large Language Models: A Survey · cs.CL · 2024-02-09 · accept · none · ref 174
The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.
Multi-Dimensional Behavioral Evaluation of Agentic Stock Prediction Systems Using Large Language Model Judges with Closed-Loop Reinforcement Learning Feedback · cs.LG · 2026-05-07 · unreviewed · ref 8