pith. machine review for the scientific record.

arxiv: 2605.14892 · v1 · submitted 2026-05-14 · 💻 cs.AI

Recognition: no theorem link

Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems

Authors on Pith · no claims yet

Pith reviewed 2026-05-15 03:03 UTC · model grok-4.3

classification 💻 cs.AI
keywords: LLM-based multi-agent systems · collaboration · failure attribution · self-evolution · collective intelligence · autonomous agents

The pith

LLM multi-agent systems advance through four causally linked stages from foundations to self-evolution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a unified framework for understanding LLM-based multi-agent systems by organizing existing research into four causally dependent stages known as the LIFE progression. This addresses the gap where prior surveys examined collaboration, failure handling, and self-evolution in isolation without exploring how they influence each other. By providing taxonomies for each stage and characterizing the dependencies, the survey shows how better collaboration enables more accurate fault attribution, which in turn supports autonomous evolution. Readers would care because this causal view can inform the design of systems that continuously improve their collective performance by learning from propagated errors rather than treating stages independently.

Core claim

The central claim is that multi-agent LLM systems follow the LIFE progression: laying the capability foundation for individual agents, integrating them through collaboration, finding faults through attribution, and evolving through autonomous self-improvement. Each stage depends on the outputs of the previous one and imposes constraints on the next, a structure the survey formalizes through systematic taxonomies and boundary challenges.

What carries the argument

The LIFE progression, a sequence of four stages that causally connect individual agent capabilities to collaborative structures, failure diagnosis, and self-improving behaviors.
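The four stages and their ordering can be sketched as a minimal dependency chain. Everything below except the stage names is an invented illustration, not the paper's formalism:

```python
# The LIFE stages as an ordered causal chain. Stage names come from
# the survey; the dependency helper is an invented illustration.
LIFE_STAGES = [
    "Lay the capability foundation",
    "Integrate agents through collaboration",
    "Find faults through attribution",
    "Evolve through autonomous self-improvement",
]

def upstream_of(stage: str) -> list[str]:
    """Stages whose outputs the given stage depends on."""
    return LIFE_STAGES[: LIFE_STAGES.index(stage)]

# Attribution depends on both the foundation and the collaboration stage.
assert upstream_of("Find faults through attribution") == LIFE_STAGES[:2]
```

Each stage's upstream set grows monotonically along the chain, which is the structure the survey's dependency claims assume.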

If this is right

  • Error propagation in multi-agent interactions can be diagnosed by tracing back through collaboration mechanisms to initial capability gaps.
  • Self-evolution mechanisms become more effective when informed by structured failure attribution across multiple agents and rounds.
  • Research agendas should prioritize closed-loop systems that reorganize agent structures based on attributed faults.
  • Existing coordination frameworks can be extended to support self-organizing collective intelligence by incorporating stage dependencies.
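The first bullet, diagnosing propagated errors by tracing back through collaboration mechanisms to capability gaps, can be sketched as a reverse walk over a collaboration graph. The agents, edges, and gap labels below are hypothetical:

```python
# Hypothetical collaboration graph: consumer -> producers it depends on.
collab_edges = {
    "reporter": ["analyst", "retriever"],
    "analyst": ["retriever"],
    "retriever": [],
}
# Known per-agent capability gaps (invented for illustration).
capability_gaps = {"retriever": "stale tool index"}

def trace_failure(agent: str) -> set[str]:
    """Walk upstream from a failing agent to agents with known gaps."""
    roots, stack, seen = set(), [agent], set()
    while stack:
        a = stack.pop()
        if a in seen:
            continue
        seen.add(a)
        if a in capability_gaps:
            roots.add(a)
        stack.extend(collab_edges.get(a, []))
    return roots

# A failure at the reporter traces back to the retriever's gap.
assert trace_failure("reporter") == {"retriever"}
```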

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach could extend to identifying similar causal chains in other types of intelligent systems beyond LLMs.
  • Testing the framework might involve simulating multi-agent tasks where failure attribution is disabled to observe impacts on evolution.
  • Developers could use the stage model to prioritize which capabilities to strengthen first in building new multi-agent applications.
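The second bullet can be made concrete with a toy closed-loop simulation in which failure attribution is toggled on and off. The update rule and all numbers are invented; only the qualitative gap matters:

```python
def run(rounds: int, attribution: bool) -> float:
    """Toy closed loop: each round, residual errors are partially fixed.
    Attribution turns errors into targeted fixes (larger step); without
    it, fixes are blind retries (smaller step). Invented model."""
    skill, step = 0.5, 0.05 if attribution else 0.01
    for _ in range(rounds):
        skill = min(1.0, skill + step * (1.0 - skill))
    return skill

# Disabling attribution slows self-evolution in this toy model.
assert run(20, attribution=True) > run(20, attribution=False)
```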

Load-bearing premise

The LIFE stages accurately represent the causal dependencies among collaboration, failure attribution, and self-evolution without overlooking significant literature at the boundaries between stages.

What would settle it

An example of a multi-agent system achieving effective self-evolution despite lacking robust failure attribution mechanisms would falsify the necessity of the proposed stage dependencies.

read the original abstract

LLM-based autonomous agents have demonstrated strong capabilities in reasoning, planning, and tool use, yet remain limited when tasks require sustained coordination across roles, tools, and environments. Multi-agent systems address this through structured collaboration among specialized agents, but tighter coordination also amplifies a less explored risk: errors can propagate across agents and interaction rounds, producing failures that are difficult to diagnose and rarely translate into structural self-improvement. Existing surveys cover individual agent capabilities, multi-agent collaboration, or agent self-evolution separately, leaving the causal dependencies among them unexamined. This survey provides a unified review organized around four causally linked stages, which we term the LIFE progression: Lay the capability foundation, Integrate agents through collaboration, Find faults through attribution, and Evolve through autonomous self-improvement. For each stage, we provide systematic taxonomies and formally characterize the dependencies between adjacent stages, revealing how each stage both depends on and constrains the next. Beyond synthesizing existing work, we identify open challenges at stage boundaries and propose a cross-stage research agenda for closed-loop multi-agent systems capable of continuously diagnosing failures, reorganizing structures, and refining agent behaviors, extending current coordination frameworks toward more self-organizing forms of collective intelligence. By bridging these previously fragmented research threads, this survey aims to offer both a systematic reference and a conceptual roadmap toward autonomous, self-improving multi-agent intelligence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper surveys LLM-based multi-agent systems and organizes the literature into a unified LIFE progression framework consisting of four causally linked stages: Lay the capability foundation (individual agent capabilities), Integrate agents through collaboration, Find faults through attribution, and Evolve through autonomous self-improvement. It supplies systematic taxonomies for each stage, formally characterizes dependencies between adjacent stages (each depending on and constraining the next), identifies open challenges at stage boundaries, and proposes a cross-stage research agenda for closed-loop self-organizing multi-agent systems.

Significance. If the taxonomies prove comprehensive and the dependency characterizations hold, the survey would meaningfully bridge previously separate threads on individual agents, collaboration, failure attribution, and self-evolution, providing both a systematic reference and a conceptual roadmap toward autonomous, self-improving collective intelligence in LLM-based systems.

major comments (2)
  1. LIFE progression section: the central claim that stages are causally linked (each depending on and constraining the next) is load-bearing for the unification thesis, yet the characterization remains largely descriptive; concrete literature examples demonstrating measurable constraints (e.g., how collaboration structures amplify or limit attribution accuracy) should be added with citations to substantiate the causal framing over a purely organizational one.
  2. Taxonomy for the Find faults stage: the coverage of cross-agent error propagation and diagnosis mechanisms appears incomplete relative to the claimed dependency from the Integrate stage; explicit inclusion of distributed failure models from the surveyed literature is needed to support the progression's causal structure.
minor comments (2)
  1. The abstract and introduction should state the literature selection criteria and approximate number of papers reviewed to allow readers to assess taxonomy completeness.
  2. A figure or table summarizing the LIFE stages and their dependencies would improve clarity and help readers trace the claimed causal links.
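The referee's first major point, that collaboration structures measurably constrain attribution, can be illustrated with a toy count of candidate fault sources under different topologies. Both graphs are invented, not drawn from the paper:

```python
def candidates(edges: dict[str, list[str]], failing: str) -> int:
    """Size of the transitive upstream closure of a failing agent,
    i.e. how many agents an attribution pass must consider."""
    seen, stack = set(), [failing]
    while stack:
        for up in edges.get(stack.pop(), []):
            if up not in seen:
                seen.add(up)
                stack.append(up)
    return len(seen)

# Hierarchical chain: blame narrows to the path above the failure.
hierarchy = {"leaf": ["manager"], "manager": ["planner"], "planner": []}
# Fully connected trio: every agent is a candidate for every failure.
clique = {a: [b for b in "abc" if b != a] for a in "abc"}

assert candidates(hierarchy, "leaf") == 2
assert candidates(clique, "a") == 3  # the cycle even re-implicates "a"
```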

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation of minor revision. The comments highlight important opportunities to strengthen the causal claims in the LIFE framework, and we will revise the manuscript to address them directly.

read point-by-point responses
  1. Referee: [—] LIFE progression section: the central claim that stages are causally linked (each depending on and constraining the next) is load-bearing for the unification thesis, yet the characterization remains largely descriptive; concrete literature examples demonstrating measurable constraints (e.g., how collaboration structures amplify or limit attribution accuracy) should be added with citations to substantiate the causal framing over a purely organizational one.

    Authors: We agree that concrete examples are needed to substantiate the causal linkages rather than leaving them descriptive. In the revised manuscript, we will expand the LIFE progression section with specific literature examples illustrating measurable constraints, such as how hierarchical versus decentralized collaboration structures affect attribution accuracy (citing relevant works on multi-agent coordination and fault diagnosis from the surveyed literature). This will reinforce the causal framing over a purely organizational one. revision: yes

  2. Referee: [—] Taxonomy for the Find faults stage: the coverage of cross-agent error propagation and diagnosis mechanisms appears incomplete relative to the claimed dependency from the Integrate stage; explicit inclusion of distributed failure models from the surveyed literature is needed to support the progression's causal structure.

    Authors: We acknowledge this gap in the Find faults taxonomy. We will revise the section to explicitly incorporate distributed failure models and cross-agent error propagation mechanisms drawn from the surveyed literature. These additions will directly support the claimed dependency on the Integrate stage by demonstrating how collaboration structures shape error diagnosis capabilities. revision: yes

Circularity Check

0 steps flagged

No significant circularity in survey organization

full rationale

This is a literature survey that introduces the LIFE progression as an organizing framework for existing external research on LLM agents, collaboration, fault attribution, and self-evolution. No equations, fitted parameters, predictions, or derivations appear in the provided text. The four stages and their claimed dependencies are presented as a descriptive taxonomy drawn from prior work rather than computed or fitted from the paper's own content. No self-citation chains, ansatzes, or renamings reduce any central claim to the paper's inputs by construction. The synthesis therefore rests on external literature rather than on constructions of its own.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The survey rests on standard domain assumptions about current LLM agent limitations and proposes a new organizational framework without introducing fitted parameters or new entities.

axioms (1)
  • domain assumption LLM-based autonomous agents have demonstrated strong capabilities in reasoning, planning, and tool use yet remain limited when tasks require sustained coordination
    Presented as established background in the abstract to motivate the need for multi-agent approaches.

pith-pipeline@v0.9.0 · 5602 in / 1267 out tokens · 83917 ms · 2026-05-15T03:03:42.476724+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Reference graph

Works this paper leans on

298 extracted references · 298 canonical work pages · 17 internal anchors
