Canonical reference

AutoGen: Enabling next-gen LLM applications via multi-agent conversations

Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, et al. · 2024

Canonical reference. 94% of citing Pith papers cite this work as background.

26 Pith papers citing it

Background 94% of classified citations

citation-role summary

background: 15 · other: 1

citation-polarity summary

background: 15 · unclear: 1

representative citing papers

SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning

cs.AI · 2026-05-10 · accept · novelty 8.0 · 2 refs

SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.

Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems

cs.AI · 2026-05-22 · unverdicted · novelty 7.0

IDS is an agentic LLM system that incrementally synthesizes both implementation and proof for distributed key-value stores, succeeding on all 7 specs where prior agents succeeded on only 2.

AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

AstraFlow decouples RL components into autonomous dataflow services to natively support multi-policy agentic LLM training, elastic scaling, and cross-region execution with 2.7x speedup on math, code, search, and AgentBench workloads.

FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems

cs.CR · 2026-05-12 · unverdicted · novelty 7.0

FlowSteer is a prompt-only attack that biases multi-agent LLM workflow planning to propagate malicious signals, raising attack success rates by up to 55%; FlowGuard, an input-side defense, reduces them by up to 34%.

EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium

cs.AI · 2026-05-10 · unverdicted · novelty 7.0

EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to adversarial agents.

AgentForesight: Online Auditing for Early Failure Prediction in Multi-Agent Systems

cs.CL · 2026-05-09 · unverdicted · novelty 7.0 · 2 refs

AgentForesight introduces an online auditor model that predicts decisive errors in multi-agent trajectories at the earliest step, trained with a coarse-to-fine reinforcement learning recipe on a new curated dataset, AFTraj-2K.

Evolving Idea Graphs with Learnable Edits-and-Commits for Multi-Agent Scientific Ideation

cs.MA · 2026-05-06 · unverdicted · novelty 7.0

EIG represents research ideas as evolving graphs with nodes for claims and edges for relations, using a learned controller for edits and commits to produce higher-quality scientific proposals than text-only multi-agent baselines.

ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents

cs.CV · 2026-04-26 · unverdicted · novelty 7.0

ClawMark is a new benchmark for multi-turn, multi-day, multimodal coworker agents in stateful, evolving services; its deterministic Python checkers show that frontier models achieve only 20% strict task success.

SoK: Blockchain Agent-to-Agent Payments

q-fin.GN · 2026-04-04 · unverdicted · novelty 7.0

The first systematization of blockchain-based agent-to-agent payments organizes designs into discovery, authorization, execution, and accounting stages while identifying trust and security gaps.

Self-Evolving Multi-Agent Systems via Decentralized Memory

cs.MA · 2026-05-21 · unverdicted · novelty 6.0

DecentMem is a decentralized dual-pool memory framework for self-evolving multi-agent systems that provides O(log T) regret guarantees and yields up to 23.8% accuracy gains over centralized baselines.

RMA: an Agentic System for Research-Level Mathematical Problems

cs.AI · 2026-05-20 · unverdicted · novelty 6.0

RMA, a multi-agent system with structured memory and iterative feedback loops, solves 8 out of 10 research-level math problems on the new First Proof benchmark and outperforms GPT-5.2R and Aletheia according to expert evaluation.

LEMON: Learning Executable Multi-Agent Orchestration via Counterfactual Reinforcement Learning

cs.AI · 2026-05-14 · unverdicted · novelty 6.0

LEMON trains an LLM orchestrator with counterfactual-augmented GRPO to produce deployable multi-agent specifications that reach state-of-the-art results on six reasoning and coding benchmarks.

STAR: Failure-Aware Markovian Routing for Multi-Agent Spatiotemporal Reasoning

cs.AI · 2026-05-11 · unverdicted · novelty 6.0 · 3 refs

STAR presents a failure-aware routing framework using a state-conditioned transition policy and an agent routing matrix combining expert routes with learned recoveries from execution traces to improve multi-agent spatiotemporal reasoning.

The Trap of Trajectory: Towards Understanding and Mitigating Spurious Correlations in Agentic Memory

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

Agentic memory improves clean reasoning but worsens performance when spurious patterns are present in stored trajectories; CAMEL calibration reduces this reliance while preserving clean performance.

Iterative Critique-and-Routing Controller for Multi-Agent Systems with Heterogeneous LLMs

cs.AI · 2026-05-09 · unverdicted · novelty 6.0

A critique-and-routing controller cast as a finite-horizon MDP with policy-gradient optimization outperforms one-shot routing baselines on reasoning benchmarks while using the strongest agent for under 25% of calls.

HMACE: Heterogeneous Multi-Agent Collaborative Evolution for Combinatorial Optimization

cs.AI · 2026-05-08 · unverdicted · novelty 6.0

HMACE deploys Proposer, Generator, Evaluator, and Reflector agents in an evolutionary loop to generate and refine heuristics for NP-hard problems, reporting lower optimality gaps and token costs than baselines on TSP and Online BPP.

FAME: Forecasting Academic Impact via Continuous-Time Manifold Evolution

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

FAME models scientific topic trajectories in continuous time to forecast paper impact more accurately than LLMs by aligning manuscripts with field momentum in a dynamic latent space.

Do Agent Societies Develop Intellectual Elites? The Hidden Power Laws of Collective Cognition in LLM Multi-Agent Systems

cs.MA · 2026-04-03 · unverdicted · novelty 6.0

LLM agent societies develop power-law coordination cascades and intellectual elites through an integration bottleneck that grows with system size.

Code as Agent Harness

cs.CL · 2026-05-18 · accept · novelty 5.0

A survey that organizes existing work on LLM-based agents around code as the central harness, structured in three layers of interfaces, mechanisms, and multi-agent scaling, with applications across domains and listed open challenges.

OptiMat Alloys: a FAIR, living database of multi-principal element alloys enabled by a conversational agent

cond-mat.mtrl-sci · 2026-04-23 · unverdicted · novelty 5.0

OptiMat Alloys is a conversational AI system that maintains a living FAIR database of multi-principal element alloy calculations and enables natural-language, on-demand computations with built-in uncertainty checks.

"Theater of Mind" for LLMs: A Cognitive Architecture Based on Global Workspace Theory

cs.MA · 2026-04-09 · unverdicted · novelty 5.0

Global Workspace Agents (GWA) is proposed as an active, event-driven cognitive architecture for LLMs featuring an entropy-based intrinsic drive and dual-layer memory to enable sustained self-directed agency.

Trustworthy Agent Network: Trust in Agent Networks Must Be Baked In, Not Bolted On

cs.AI · 2026-05-18 · unverdicted · novelty 4.0

Argues that trustworthiness in Agent-to-Agent networks requires a new conceptual framework with four design pillars baked in from the beginning, as retrofitting existing single-agent methods is insufficient.

Social Theory Should Be a Structural Prior for Agentic AI: A Formal Framework for Multi-Agent Social Systems

cs.MA · 2026-05-08 · unverdicted · novelty 4.0 · 3 refs

Agentic AI needs social theory as structural priors in the MASS framework to model emergent dynamics from multi-agent interactions.

Skills-Coach: A Self-Evolving Skill Optimizer via Training-Free GRPO

cs.CL · 2026-04-30 · unverdicted · novelty 3.0

Skills-Coach optimizes LLM agent skills via task generation, prompt/code tuning, comparative execution, and traceable evaluation, reporting gains on a 48-skill benchmark called Skill-X.

citing papers explorer

Showing 26 of 26 citing papers.

SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning
cs.AI · 2026-05-10 · accept · none · ref 87 · 2 links

Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems
cs.AI · 2026-05-22 · unverdicted · none · ref 56

AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs
cs.LG · 2026-05-15 · unverdicted · none · ref 35

FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems
cs.CR · 2026-05-12 · unverdicted · none · ref 56

EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium
cs.AI · 2026-05-10 · unverdicted · none · ref 79

AgentForesight: Online Auditing for Early Failure Prediction in Multi-Agent Systems
cs.CL · 2026-05-09 · unverdicted · none · ref 55 · 2 links

Evolving Idea Graphs with Learnable Edits-and-Commits for Multi-Agent Scientific Ideation
cs.MA · 2026-05-06 · unverdicted · none · ref 14

ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents
cs.CV · 2026-04-26 · unverdicted · none · ref 18

SoK: Blockchain Agent-to-Agent Payments
q-fin.GN · 2026-04-04 · unverdicted · none · ref 1

Self-Evolving Multi-Agent Systems via Decentralized Memory
cs.MA · 2026-05-21 · unverdicted · none · ref 15

RMA: an Agentic System for Research-Level Mathematical Problems
cs.AI · 2026-05-20 · unverdicted · none · ref 54

LEMON: Learning Executable Multi-Agent Orchestration via Counterfactual Reinforcement Learning
cs.AI · 2026-05-14 · unverdicted · none · ref 1

STAR: Failure-Aware Markovian Routing for Multi-Agent Spatiotemporal Reasoning
cs.AI · 2026-05-11 · unverdicted · none · ref 27 · 3 links

The Trap of Trajectory: Towards Understanding and Mitigating Spurious Correlations in Agentic Memory
cs.LG · 2026-05-10 · unverdicted · none · ref 53

Iterative Critique-and-Routing Controller for Multi-Agent Systems with Heterogeneous LLMs
cs.AI · 2026-05-09 · unverdicted · none · ref 34

HMACE: Heterogeneous Multi-Agent Collaborative Evolution for Combinatorial Optimization
cs.AI · 2026-05-08 · unverdicted · none · ref 47

FAME: Forecasting Academic Impact via Continuous-Time Manifold Evolution
cs.LG · 2026-05-08 · unverdicted · none · ref 34

Do Agent Societies Develop Intellectual Elites? The Hidden Power Laws of Collective Cognition in LLM Multi-Agent Systems
cs.MA · 2026-04-03 · unverdicted · none · ref 66

Code as Agent Harness
cs.CL · 2026-05-18 · accept · none · ref 54

OptiMat Alloys: a FAIR, living database of multi-principal element alloys enabled by a conversational agent
cond-mat.mtrl-sci · 2026-04-23 · unverdicted · none · ref 36

"Theater of Mind" for LLMs: A Cognitive Architecture Based on Global Workspace Theory
cs.MA · 2026-04-09 · unverdicted · none · ref 10

Trustworthy Agent Network: Trust in Agent Networks Must Be Baked In, Not Bolted On
cs.AI · 2026-05-18 · unverdicted · none · ref 66

Social Theory Should Be a Structural Prior for Agentic AI: A Formal Framework for Multi-Agent Social Systems
cs.MA · 2026-05-08 · unverdicted · none · ref 98 · 3 links

Skills-Coach: A Self-Evolving Skill Optimizer via Training-Free GRPO
cs.CL · 2026-04-30 · unverdicted · none · ref 5

Counterfactual Trace Auditing of LLM Agent Skills
cs.AI · 2026-05-12 · unreviewed · ref 15

SciResearcher: Scaling Deep Research Agents for Frontier Scientific Reasoning
cs.AI · 2026-05-02 · unreviewed · ref 48
