Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , month=nov, year=

Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate , author= · 2024 · DOI 10.18653/v1/2024.emnlp-main.992

26 Pith papers cite this work. Polarity classification is still indexing.

26 Pith papers citing it

open at publisher browse 26 citing papers

citation-role summary

background 3 baseline 1

citation-polarity summary

background 3 baseline 1

representative citing papers

ArgBench: Benchmarking LLMs on Computational Argumentation Tasks

cs.CL · 2026-04-19 · unverdicted · novelty 8.0

ArgBench unifies 33 existing datasets into a standardized benchmark for testing LLMs across 46 argumentation tasks and analyzes the impact of prompting techniques and model factors on performance.

The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.

From Experience to Skill: Multi-Agent Generative Engine Optimization via Reusable Strategy Learning

cs.AI · 2026-04-21 · unverdicted · novelty 7.0

MAGEO is a multi-agent system that distills validated editing patterns into reusable optimization skills for generative engines, outperforming heuristic baselines on visibility and fidelity via a new benchmark and evaluation protocol.

Wiring the 'Why': A Unified Taxonomy and Survey of Abductive Reasoning in LLMs

cs.AI · 2026-04-09 · accept · novelty 7.0

The paper delivers the first survey of abductive reasoning in LLMs, a unified two-stage taxonomy, a compact benchmark, and an analysis of gaps relative to deductive and inductive reasoning.

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

cs.CL · 2024-12-30 · unverdicted · novelty 7.0

o1-like models overthink easy tasks; self-training reduces compute use without accuracy loss on GSM8K, MATH500, GPQA, and AIME.

Agent-based models for the evolution of morphological alternation patterns

cs.CL · 2026-06-10 · unverdicted · novelty 6.0

Multi-agent simulations with naturalistic lexicons and phonological rules show scale-free networks and Bernoulli adoption produce more plausible morphologies, evaluated by an LLM historical linguist debate system and tested via historical case studies.

Beyond Alignment: Value Diversity as a Collective Property in Multicultural Agent Systems

cs.CL · 2026-06-04 · unverdicted · novelty 6.0

Multicultural multi-agent LLM systems exhibit substantially lower value diversity than human societies on the World Values Survey, with diversity uncorrelated to per-agent alignment and further reduced by agent interactions.

CRAFT: Cost-aware Refinement And Front-aware Tuning of Prompts

cs.CL · 2026-06-03 · unverdicted · novelty 6.0

CRAFT is a Pareto-front prompt optimizer that allocates scarce LLM validation calls to candidates near the current front using accuracy- and cost-oriented generators plus NSGA-II retention.

Blind Spots in the Guard: How Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems

cs.CR · 2026-05-21 · conditional · novelty 6.0

Domain-camouflaged injection attacks reduce detection rates from 93.8% to 9.7% on Llama 3.1 8B and 100% to 55.6% on Gemini 2.0 Flash, with the gap persisting in production classifiers and multi-agent debate setups.

Multi-agent AI systems outperform human teams in creativity

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

Multi-agent LLM teams outperform human teams in creativity (d=1.50) across tasks by producing more novel ideas, with distinct semantic exploration patterns predicting success for each group.

Response-Conditioned Parallel-to-Sequential Orchestration for Multi-Agent Systems

cs.CL · 2026-05-15 · unverdicted · novelty 6.0

Nexa learns a response-conditioned policy that starts with parallel agent execution and adds at most one round of sequential message passing via a predicted sparse DAG, strictly subsuming pure parallel mode.

LLM-X: A Scalable Negotiation-Oriented Exchange for Communication Among Personal LLM Agents

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

LLM-X is a scalable architecture for direct negotiation and communication among personal LLM agents, featuring federated gateways, typed protocols, and policy enforcement, shown stable in experiments with up to 12 agents.

Training-Free Cultural Alignment of Large Language Models via Persona Disagreement

cs.CL · 2026-05-11 · conditional · novelty 6.0 · 2 refs

DISCA converts within-country disagreement among World Values Survey personas into a bounded logit correction that reduces cultural misalignment by 10-24% on MultiTP for models 3.8B and larger across 20 countries, without any weight updates.

When Reviews Disagree: Fine-Grained Contradiction Analysis in Scientific Peer Reviews

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

Introduces RevCI benchmark and IMPACT multi-agent framework for evidence-level contradiction detection and graded intensity scoring in peer reviews, distilled into efficient TIDE model.

ECHO: Event-Centric Hypergraph Operations via Multi-Agent Collaboration for Multimedia Event Extraction

cs.CV · 2026-03-04 · unverdicted · novelty 6.0

ECHO reframes multimedia event extraction as multi-agent iterative refinement over an explicit Multimedia Event Hypergraph with a decoupled Link-then-Bind strategy, delivering 7.3 and 15.5 F1 gains on event mention and argument role.

Sakana Fugu Technical Report

cs.LG · 2026-06-19 · unverdicted · novelty 5.0

Sakana Fugu trains LLM orchestrators using fine-tuning, evolutionary algorithms, and RL to build query-adaptive multi-agent scaffolds, claiming SOTA results on benchmarks including SWE-Bench Pro and GPQA-Diamond.

CMIP-Forge: An Agentic System that Retrieves, Computes, and Self-Reviews Climate Science

physics.ao-ph · 2026-06-10 · unverdicted · novelty 5.0

CMIP-Forge presents a retrieval-augmented agentic system with automated guardrails and adversarial self-review for autonomous execution of climate research tasks on CMIP6 literature and ESGF data.

MADRAG: Multi-Agent Debate with Retrieval-Augmented Generation for Training-Free Analytic Essay Scoring

cs.MA · 2026-06-04 · unverdicted · novelty 5.0

MADRAG combines multi-agent debate with retrieval-augmented generation to produce training-free analytic essay scores that outperform prompt baselines and approach supervised systems.

Beyond Inefficiency: Systemic Costs of Incivility in Multi-Agent Monte Carlo Simulations

cs.AI · 2026-05-12 · unverdicted · novelty 5.0

Monte Carlo simulations of LLM agents confirm that toxic debates take 25% longer to converge, with larger delays in smaller models, and show a first-mover advantage independent of toxicity.

AgentRx: A Benchmark Study of LLM Agents for Multimodal Clinical Prediction Tasks

cs.AI · 2026-05-11 · unverdicted · novelty 5.0

Single-agent LLM frameworks outperform naive multi-agent systems in multimodal clinical risk prediction tasks and are better calibrated.

Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing

cs.AI · 2026-04-09 · unverdicted · novelty 5.0

SAVeR adds self-auditing of internal beliefs in LLM agents via persona-based candidates and constraint-guided repairs, improving faithfulness on six benchmarks without hurting task performance.

Improving Role Consistency in Multi-Agent Collaboration via Quantitative Role Clarity

cs.AI · 2026-04-03 · conditional · novelty 5.0

A role clarity matrix from softmax-normalized behavior-role similarities is employed as a regularizer to enhance role consistency in multi-agent LLM collaborations.

Normative Common Ground Replication (NormCoRe): Replication-by-Translation for Studying Norms in Multi-Agent AI

cs.AI · 2026-03-12 · conditional · novelty 5.0

NormCoRe is a replication-by-translation framework that maps human subject studies onto multi-agent AI environments, showing AI normative judgments on fairness differ from human baselines and vary with model choice and persona language.

Human-LLM Dialogue Improves Diagnostic Accuracy in Emergency Care

cs.AI · 2026-05-08 · unverdicted · novelty 4.0

Interactive LLM dialogue raised residents' hard-case diagnostic correctness from 0.589 to 0.734 and produced medium effect sizes in a blinded study of seven physicians on 52 emergency cases.

citing papers explorer

Showing 26 of 26 citing papers.

ArgBench: Benchmarking LLMs on Computational Argumentation Tasks cs.CL · 2026-04-19 · unverdicted · none · ref 35
ArgBench unifies 33 existing datasets into a standardized benchmark for testing LLMs across 46 argumentation tasks and analyzes the impact of prompting techniques and model factors on performance.
The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment cs.CL · 2026-05-08 · unverdicted · none · ref 138
An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
From Experience to Skill: Multi-Agent Generative Engine Optimization via Reusable Strategy Learning cs.AI · 2026-04-21 · unverdicted · none · ref 73
MAGEO is a multi-agent system that distills validated editing patterns into reusable optimization skills for generative engines, outperforming heuristic baselines on visibility and fidelity via a new benchmark and evaluation protocol.
Wiring the 'Why': A Unified Taxonomy and Survey of Abductive Reasoning in LLMs cs.AI · 2026-04-09 · accept · none · ref 56
The paper delivers the first survey of abductive reasoning in LLMs, a unified two-stage taxonomy, a compact benchmark, and an analysis of gaps relative to deductive and inductive reasoning.
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs cs.CL · 2024-12-30 · unverdicted · none · ref 271
o1-like models overthink easy tasks; self-training reduces compute use without accuracy loss on GSM8K, MATH500, GPQA, and AIME.
Agent-based models for the evolution of morphological alternation patterns cs.CL · 2026-06-10 · unverdicted · none · ref 225
Multi-agent simulations with naturalistic lexicons and phonological rules show scale-free networks and Bernoulli adoption produce more plausible morphologies, evaluated by an LLM historical linguist debate system and tested via historical case studies.
Beyond Alignment: Value Diversity as a Collective Property in Multicultural Agent Systems cs.CL · 2026-06-04 · unverdicted · none · ref 42
Multicultural multi-agent LLM systems exhibit substantially lower value diversity than human societies on the World Values Survey, with diversity uncorrelated to per-agent alignment and further reduced by agent interactions.
CRAFT: Cost-aware Refinement And Front-aware Tuning of Prompts cs.CL · 2026-06-03 · unverdicted · none · ref 190
CRAFT is a Pareto-front prompt optimizer that allocates scarce LLM validation calls to candidates near the current front using accuracy- and cost-oriented generators plus NSGA-II retention.
Blind Spots in the Guard: How Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems cs.CR · 2026-05-21 · conditional · none · ref 3
Domain-camouflaged injection attacks reduce detection rates from 93.8% to 9.7% on Llama 3.1 8B and 100% to 55.6% on Gemini 2.0 Flash, with the gap persisting in production classifiers and multi-agent debate setups.
Multi-agent AI systems outperform human teams in creativity cs.CL · 2026-05-18 · unverdicted · none · ref 17
Multi-agent LLM teams outperform human teams in creativity (d=1.50) across tasks by producing more novel ideas, with distinct semantic exploration patterns predicting success for each group.
Response-Conditioned Parallel-to-Sequential Orchestration for Multi-Agent Systems cs.CL · 2026-05-15 · unverdicted · none · ref 27
Nexa learns a response-conditioned policy that starts with parallel agent execution and adds at most one round of sequential message passing via a predicted sparse DAG, strictly subsuming pure parallel mode.
LLM-X: A Scalable Negotiation-Oriented Exchange for Communication Among Personal LLM Agents cs.AI · 2026-05-12 · unverdicted · none · ref 15
LLM-X is a scalable architecture for direct negotiation and communication among personal LLM agents, featuring federated gateways, typed protocols, and policy enforcement, shown stable in experiments with up to 12 agents.
Training-Free Cultural Alignment of Large Language Models via Persona Disagreement cs.CL · 2026-05-11 · conditional · none · ref 24 · 2 links
DISCA converts within-country disagreement among World Values Survey personas into a bounded logit correction that reduces cultural misalignment by 10-24% on MultiTP for models 3.8B and larger across 20 countries, without any weight updates.
When Reviews Disagree: Fine-Grained Contradiction Analysis in Scientific Peer Reviews cs.CL · 2026-05-11 · unverdicted · none · ref 34
Introduces RevCI benchmark and IMPACT multi-agent framework for evidence-level contradiction detection and graded intensity scoring in peer reviews, distilled into efficient TIDE model.
ECHO: Event-Centric Hypergraph Operations via Multi-Agent Collaboration for Multimedia Event Extraction cs.CV · 2026-03-04 · unverdicted · none · ref 18
ECHO reframes multimedia event extraction as multi-agent iterative refinement over an explicit Multimedia Event Hypergraph with a decoupled Link-then-Bind strategy, delivering 7.3 and 15.5 F1 gains on event mention and argument role.
Sakana Fugu Technical Report cs.LG · 2026-06-19 · unverdicted · none · ref 282
Sakana Fugu trains LLM orchestrators using fine-tuning, evolutionary algorithms, and RL to build query-adaptive multi-agent scaffolds, claiming SOTA results on benchmarks including SWE-Bench Pro and GPQA-Diamond.
CMIP-Forge: An Agentic System that Retrieves, Computes, and Self-Reviews Climate Science physics.ao-ph · 2026-06-10 · unverdicted · none · ref 17
CMIP-Forge presents a retrieval-augmented agentic system with automated guardrails and adversarial self-review for autonomous execution of climate research tasks on CMIP6 literature and ESGF data.
MADRAG: Multi-Agent Debate with Retrieval-Augmented Generation for Training-Free Analytic Essay Scoring cs.MA · 2026-06-04 · unverdicted · none · ref 31
MADRAG combines multi-agent debate with retrieval-augmented generation to produce training-free analytic essay scores that outperform prompt baselines and approach supervised systems.
Beyond Inefficiency: Systemic Costs of Incivility in Multi-Agent Monte Carlo Simulations cs.AI · 2026-05-12 · unverdicted · none · ref 13
Monte Carlo simulations of LLM agents confirm that toxic debates take 25% longer to converge, with larger delays in smaller models, and show a first-mover advantage independent of toxicity.
AgentRx: A Benchmark Study of LLM Agents for Multimodal Clinical Prediction Tasks cs.AI · 2026-05-11 · unverdicted · none · ref 113
Single-agent LLM frameworks outperform naive multi-agent systems in multimodal clinical risk prediction tasks and are better calibrated.
Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing cs.AI · 2026-04-09 · unverdicted · none · ref 33
SAVeR adds self-auditing of internal beliefs in LLM agents via persona-based candidates and constraint-guided repairs, improving faithfulness on six benchmarks without hurting task performance.
Improving Role Consistency in Multi-Agent Collaboration via Quantitative Role Clarity cs.AI · 2026-04-03 · conditional · none · ref 11
A role clarity matrix from softmax-normalized behavior-role similarities is employed as a regularizer to enhance role consistency in multi-agent LLM collaborations.
Normative Common Ground Replication (NormCoRe): Replication-by-Translation for Studying Norms in Multi-Agent AI cs.AI · 2026-03-12 · conditional · none · ref 34
NormCoRe is a replication-by-translation framework that maps human subject studies onto multi-agent AI environments, showing AI normative judgments on fairness differ from human baselines and vary with model choice and persona language.
Human-LLM Dialogue Improves Diagnostic Accuracy in Emergency Care cs.AI · 2026-05-08 · unverdicted · none · ref 25
Interactive LLM dialogue raised residents' hard-case diagnostic correctness from 0.589 to 0.734 and produced medium effect sizes in a blinded study of seven physicians on 52 emergency cases.
Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation cs.CL · 2025-04-02 · unverdicted · none · ref 88
A literature survey that organizes prompting, fine-tuning, preference optimization, and context-aware techniques for LLM-based machine translation with emphasis on low-resource languages.
Bimanual Robot Manipulation via Multi-Agent In-Context Learning cs.RO · 2026-04-22 · unreviewed · ref 29

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , month=nov, year=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer