Title resolution pending

doi:10 · 2024 · DOI 10.18653/v1/2024.emnlp-main.992

14 Pith papers cite this work. Polarity classification is still indexing.

14 Pith papers citing it

open at publisher browse 14 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

ArgBench: Benchmarking LLMs on Computational Argumentation Tasks

cs.CL · 2026-04-19 · unverdicted · novelty 8.0

ArgBench unifies 33 existing datasets into a standardized benchmark for testing LLMs across 46 argumentation tasks and analyzes the impact of prompting techniques and model factors on performance.

The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.

From Experience to Skill: Multi-Agent Generative Engine Optimization via Reusable Strategy Learning

cs.AI · 2026-04-21 · unverdicted · novelty 7.0

MAGEO is a multi-agent system that distills validated editing patterns into reusable optimization skills for generative engines, outperforming heuristic baselines on visibility and fidelity via a new benchmark and evaluation protocol.

Wiring the 'Why': A Unified Taxonomy and Survey of Abductive Reasoning in LLMs

cs.AI · 2026-04-09 · accept · novelty 7.0

The paper delivers the first survey of abductive reasoning in LLMs, a unified two-stage taxonomy, a compact benchmark, and an analysis of gaps relative to deductive and inductive reasoning.

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

cs.CL · 2024-12-30 · unverdicted · novelty 7.0

o1-like models overthink easy tasks; self-training reduces compute use without accuracy loss on GSM8K, MATH500, GPQA, and AIME.

LLM-X: A Scalable Negotiation-Oriented Exchange for Communication Among Personal LLM Agents

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

LLM-X is a scalable architecture for direct negotiation and communication among personal LLM agents, featuring federated gateways, typed protocols, and policy enforcement, shown stable in experiments with up to 12 agents.

Training-Free Cultural Alignment of Large Language Models via Persona Disagreement

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

DISCA uses disagreement among WVS-grounded persona panels to apply loss-averse logit corrections that reduce cultural misalignment by 10-24% on MultiTP for models 3.8B and larger, without weight changes.

When Reviews Disagree: Fine-Grained Contradiction Analysis in Scientific Peer Reviews

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

Introduces RevCI benchmark and IMPACT multi-agent framework for evidence-level contradiction detection and graded intensity scoring in peer reviews, distilled into efficient TIDE model.

Bimanual Robot Manipulation via Multi-Agent In-Context Learning

cs.RO · 2026-04-22 · unverdicted · novelty 6.0

BiCICLe frames bimanual robot control as a multi-agent leader-follower problem with Arms' Debate and an LLM judge, achieving up to 71.1% success on 13 TWIN benchmark tasks without fine-tuning.

Beyond Inefficiency: Systemic Costs of Incivility in Multi-Agent Monte Carlo Simulations

cs.AI · 2026-05-12 · unverdicted · novelty 5.0

Monte Carlo simulations of LLM agents confirm that toxic debates take 25% longer to converge, with larger delays in smaller models, and show a first-mover advantage independent of toxicity.

AgentRx: A Benchmark Study of LLM Agents for Multimodal Clinical Prediction Tasks

cs.AI · 2026-05-11 · unverdicted · novelty 5.0

Single-agent LLM frameworks outperform naive multi-agent systems in multimodal clinical risk prediction tasks and are better calibrated.

Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing

cs.AI · 2026-04-09 · unverdicted · novelty 5.0

SAVeR adds self-auditing of internal beliefs in LLM agents via persona-based candidates and constraint-guided repairs, improving faithfulness on six benchmarks without hurting task performance.

Improving Role Consistency in Multi-Agent Collaboration via Quantitative Role Clarity

cs.AI · 2026-04-03 · conditional · novelty 5.0

A role clarity matrix from softmax-normalized behavior-role similarities is employed as a regularizer to enhance role consistency in multi-agent LLM collaborations.

Human-LLM Dialogue Improves Diagnostic Accuracy in Emergency Care

cs.AI · 2026-05-08 · unverdicted · novelty 4.0

Interactive LLM dialogue raised residents' hard-case diagnostic correctness from 0.589 to 0.734 and produced medium effect sizes in a blinded study of seven physicians on 52 emergency cases.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Bimanual Robot Manipulation via Multi-Agent In-Context Learning cs.RO · 2026-04-22 · unverdicted · none · ref 29
BiCICLe frames bimanual robot control as a multi-agent leader-follower problem with Arms' Debate and an LLM judge, achieving up to 71.1% success on 13 TWIN benchmark tasks without fine-tuning.

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer