hub Canonical reference

ChatDev: Communicative Agents for Software Development

Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li · 2023 · cs.SE · arXiv 2307.07924

Canonical reference. 100% of citing Pith papers cite this work as background.

61 Pith papers citing it

Background 100% of classified citations

open full Pith review browse 61 citing papers arXiv PDF

abstract

Software development is a complex task that necessitates cooperation among multiple members with diverse skills. Numerous studies used deep learning to improve specific phases in a waterfall model, such as design, coding, and testing. However, the deep learning model in each phase requires unique designs, leading to technical inconsistencies across various phases, which results in a fragmented and ineffective development process. In this paper, we introduce ChatDev, a chat-powered software development framework in which specialized agents driven by large language models (LLMs) are guided in what to communicate (via chat chain) and how to communicate (via communicative dehallucination). These agents actively contribute to the design, coding, and testing phases through unified language-based communication, with solutions derived from their multi-turn dialogues. We found their utilization of natural language is advantageous for system design, and communicating in programming language proves helpful in debugging. This paradigm demonstrates how linguistic communication facilitates multi-agent collaboration, establishing language as a unifying bridge for autonomous task-solving among LLM agents. The code and data are available at https://github.com/OpenBMB/ChatDev.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 16 method 1

citation-polarity summary

background 17

representative citing papers

Why Do Multi-Agent LLM Systems Fail?

cs.AI · 2025-03-17 · unverdicted · novelty 8.0

The authors create the first large-scale dataset and taxonomy of failure modes in multi-agent LLM systems to explain their limited performance gains.

Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems

cs.MA · 2024-10-09 · unverdicted · novelty 8.0 · 2 refs

Prompt injection attacks can self-replicate across LLM agents in multi-agent systems, enabling data theft, misinformation, and system disruption while propagating silently.

From Prompt to Process: a Process Taxonomy and Comparative Assessment of Frameworks Supporting AI Software Development Agents

cs.SE · 2026-06-03 · conditional · novelty 7.0

A new six-dimension process taxonomy for AI software development frameworks shows convergence on artifact persistence and human oversight but reveals that no framework covers all dimensions strongly, indicating a depth-portability trade-off.

Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems

cs.AI · 2026-05-22 · unverdicted · novelty 7.0

IDS is an agentic LLM system that incrementally synthesizes both implementation and proof for distributed key-value stores, succeeding on all 7 specs where prior agents succeeded on only 2.

The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.

Social Bias in LLM-Generated Code: Benchmark and Mitigation

cs.SE · 2026-05-01 · unverdicted · novelty 7.0

LLMs show up to 60.58% social bias in generated code; a new Fairness Monitor Agent cuts bias by 65.1% and raises functional correctness from 75.80% to 83.97%.

FineState-Bench: Benchmarking State-Conditioned Grounding for Fine-grained GUI State Setting

cs.CV · 2026-04-30 · unverdicted · novelty 7.0

FineState-Bench and FineState-Metrics show LVLMs achieve only 22.8% average exact-state success in GUI interactions, with visual diagnostic hints improving results by up to 14.9 points.

Theory Under Construction: Orchestrating Language Models for Research Software Where the Specification Evolves

cs.SE · 2026-04-29 · unverdicted · novelty 7.0

Comet-H orchestrates LLMs via deficit-scoring prompt selection and half-life task tracking to co-evolve research software components, demonstrated by a static analysis tool reaching F1=0.768 versus a 0.364 baseline.

PhysCodeBench: Benchmarking Physics-Aware Symbolic Simulation of 3D Scenes via Self-Corrective Multi-Agent Refinement

cs.RO · 2026-04-26 · unverdicted · novelty 7.0

PhysCodeBench benchmark and SMRF multi-agent framework enable better AI generation of physically accurate 3D simulation code, boosting performance by 31 points over baselines.

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

cs.CR · 2026-04-25 · unverdicted · novelty 7.0

A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

ClawNet: Human-Symbiotic Agent Network for Cross-User Autonomous Cooperation

cs.AI · 2026-04-21 · unverdicted · novelty 7.0

ClawNet digitizes human collaborative relationships into a network of identity-governed AI agents that collaborate on behalf of their owners through a central orchestrator enforcing binding and verification.

Detecting Multi-Agent Collusion Through Multi-Agent Interpretability

cs.AI · 2026-04-01 · conditional · novelty 7.0

NARCBench and five activation-probing methods detect multi-agent collusion with 0.73-1.00 AUROC across distribution shifts and steganographic tasks by aggregating per-agent signals.

Agent Factories for High Level Synthesis: How Far Can General-Purpose Coding Agents Go in Hardware Optimization?

cs.AI · 2026-03-26 · conditional · novelty 7.0

An agent factory combining sub-kernel ILP assembly with multi-agent cross-optimization lets general coding agents deliver mean 8.27x speedups in HLS designs on standard benchmarks.

Efficient Remote KV Cache Reuse with GPU-native Video Codec

cs.DC · 2026-02-10 · conditional · novelty 7.0

KVCodec uses GPU-native video codecs and pipelined fetching to compress and transmit KV caches, delivering up to 3.51x faster TTFT than prior methods while preserving accuracy.

Emergent Coordination in Multi-Agent Language Models

cs.MA · 2025-10-05 · unverdicted · novelty 7.0

Multi-agent LLM systems can be steered via prompt design from mere aggregates to higher-order collectives with identity-linked differentiation and goal-directed complementarity, as measured by partial information decomposition of time-delayed mutual information.

Automated Design of Agentic Systems

cs.AI · 2024-08-15 · conditional · novelty 7.0

Meta Agent Search uses a meta-agent to iteratively program novel agentic systems in code, producing agents that outperform state-of-the-art hand-designed ones across coding, science, and math while transferring across domains and models.

AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation

cs.CL · 2023-12-20 · accept · novelty 7.0

A three-agent loop of code generation, test creation, and execution feedback lifts pass@1 to 96.3% on HumanEval and 91.8% on MBPP for GPT-4 while using roughly half the tokens of prior state-of-the-art.

Natural Language Query to Configuration for Retrieval Agents

cs.AI · 2026-05-26 · unverdicted · novelty 6.0

BRANE maps queries to optimal retrieval pipeline configurations using LLM-derived features and per-configuration correctness predictors, improving the cost-quality Pareto frontier on three benchmarks.

Swarm Skills: A Portable, Self-Evolving Multi-Agent System Specification for Coordination Engineering

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

Swarm Skills is a portable multi-agent coordination specification with roles, workflows, bounds, and a self-evolution algorithm that distills trajectories using Effectiveness, Utilization, and Freshness scores for zero-adapter portability.

CTM-AI: A Blueprint for General AI Inspired by a Model of Consciousness

q-bio.NC · 2026-04-30 · unverdicted · novelty 6.0

CTM-AI combines a formal consciousness model with foundation models to report state-of-the-art results on sarcasm detection, humor, and agentic tool-use benchmarks.

CreativeGame:Toward Mechanic-Aware Creative Game Generation

cs.AI · 2026-04-21 · unverdicted · novelty 6.0

CreativeGame enables iterative HTML5 game generation via mechanic-guided planning, lineage memory, runtime validation, and programmatic rewards to produce inspectable version-to-version mechanic evolution.

CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks

cs.CR · 2026-04-05 · unverdicted · novelty 6.0

CoopGuard deploys cooperative agents to track conversation history and counter evolving multi-round attacks on LLMs, achieving a 78.9% reduction in attack success rate on a new 5,200-sample benchmark.

Towards Automated Crowdsourced Testing via Personified-LLM

cs.SE · 2026-03-25 · unverdicted · novelty 6.0

PersonaTester uses LLMs guided by three-dimensional personas to replicate crowdworker testing patterns, yielding higher behavioral consistency, variability, and more bug detections than baseline LLM agents.

NOMAD: A Multi-Agent LLM System for UML Class Diagram Generation from Natural Language Requirements

cs.SE · 2025-11-27 · unverdicted · novelty 6.0

NOMAD decomposes UML class diagram creation into a multi-agent LLM workflow that outperforms baselines on a Northwind case study and human exercises while introducing a taxonomy of structural, relationship, and semantic errors.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models cs.AI · 2025-01-16 · unverdicted · none · ref 108 · internal anchor
The paper surveys reinforced reasoning techniques for LLMs, covering automated data construction, learning-to-reason methods, and test-time scaling as steps toward Large Reasoning Models.

ChatDev: Communicative Agents for Software Development

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer