WORC improves multi-agent LLM reasoning to 82.2% average accuracy by predicting and compensating for the weakest agent via targeted extra sampling rather than uniform reinforcement.
Qineng Wang, Zihao Wang, Ying Su, Hanghang Tong, and Yangqiu Song
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
support 1representative citing papers
ARM evolves specialized reasoning modules from basic CoT via tree search to serve as reusable components in multi-agent systems that generalize across models and domains without per-task re-optimization.
Math reasoning gains in LLMs rarely transfer to general domains; RL tuning generalizes while SFT causes forgetting and representation drift.
A layered Mixture-of-Agents system combining multiple LLMs achieves state-of-the-art results on AlpacaEval 2.0 (65.1%), MT-Bench, and FLASK, outperforming GPT-4 Omni.
BashCoder-R1 applies CPT, L-CoT SFT, and R-GRPO to reach higher syntax, robustness, and functionality rates than baselines on the new BashBench benchmark of 952 tasks.
AstroVLM deploys expert multi-agent collaboration with VLMs to outperform baselines on real-world astronomical imaging quality diagnosis.
A survey of emerging AI agent architectures that organizes single and multi-agent designs around reasoning, planning, tool use, communication, and reflection phases.
citing papers explorer
-
Mixture-of-Agents Enhances Large Language Model Capabilities
A layered Mixture-of-Agents system combining multiple LLMs achieves state-of-the-art results on AlpacaEval 2.0 (65.1%), MT-Bench, and FLASK, outperforming GPT-4 Omni.