pith. machine review for the scientific record.

arxiv: 2605.14892 · v1 · submitted 2026-05-14 · 💻 cs.AI

Recognition: no theorem link

Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems

Authors on Pith · no claims yet

Pith reviewed 2026-05-15 03:03 UTC · model grok-4.3

classification 💻 cs.AI
keywords: LLM-based multi-agent systems · collaboration · failure attribution · self-evolution · collective intelligence · autonomous agents

The pith

LLM multi-agent systems advance through four causally linked stages from foundations to self-evolution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a unified framework for understanding LLM-based multi-agent systems by organizing existing research into four causally dependent stages known as the LIFE progression. This addresses the gap where prior surveys examined collaboration, failure handling, and self-evolution in isolation without exploring how they influence each other. By providing taxonomies for each stage and characterizing the dependencies, the survey shows how better collaboration enables more accurate fault attribution, which in turn supports autonomous evolution. Readers would care because this causal view can inform the design of systems that continuously improve their collective performance by learning from propagated errors rather than treating stages independently.

Core claim

The central claim is that multi-agent LLM systems follow the LIFE progression: laying the capability foundation for individual agents, integrating them through collaboration, finding faults through attribution, and evolving through autonomous self-improvement. Each stage depends on the outputs of the previous one and imposes constraints on the next, a structure the survey formalizes through systematic taxonomies and boundary challenges.

What carries the argument

The LIFE progression, a sequence of four stages that causally connect individual agent capabilities to collaborative structures, failure diagnosis, and self-improving behaviors.
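The four stages and their ordering can be sketched as a minimal dependency chain. Everything below except the stage names is an invented illustration, not the paper's formalism:

```python
# The LIFE stages as an ordered causal chain. Stage names come from
# the survey; the dependency helper is an invented illustration.
LIFE_STAGES = [
    "Lay the capability foundation",
    "Integrate agents through collaboration",
    "Find faults through attribution",
    "Evolve through autonomous self-improvement",
]

def upstream_of(stage: str) -> list[str]:
    """Stages whose outputs the given stage depends on."""
    return LIFE_STAGES[: LIFE_STAGES.index(stage)]

# Attribution depends on both the foundation and the collaboration stage.
assert upstream_of("Find faults through attribution") == LIFE_STAGES[:2]
```

Each stage's upstream set grows monotonically along the chain, which is the structure the survey's dependency claims assume.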

If this is right

  • Error propagation in multi-agent interactions can be diagnosed by tracing back through collaboration mechanisms to initial capability gaps.
  • Self-evolution mechanisms become more effective when informed by structured failure attribution across multiple agents and rounds.
  • Research agendas should prioritize closed-loop systems that reorganize agent structures based on attributed faults.
  • Existing coordination frameworks can be extended to support self-organizing collective intelligence by incorporating stage dependencies.
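The first bullet, diagnosing propagated errors by tracing back through collaboration mechanisms to capability gaps, can be sketched as a reverse walk over a collaboration graph. The agents, edges, and gap labels below are hypothetical:

```python
# Hypothetical collaboration graph: consumer -> producers it depends on.
collab_edges = {
    "reporter": ["analyst", "retriever"],
    "analyst": ["retriever"],
    "retriever": [],
}
# Known per-agent capability gaps (invented for illustration).
capability_gaps = {"retriever": "stale tool index"}

def trace_failure(agent: str) -> set[str]:
    """Walk upstream from a failing agent to agents with known gaps."""
    roots, stack, seen = set(), [agent], set()
    while stack:
        a = stack.pop()
        if a in seen:
            continue
        seen.add(a)
        if a in capability_gaps:
            roots.add(a)
        stack.extend(collab_edges.get(a, []))
    return roots

# A failure at the reporter traces back to the retriever's gap.
assert trace_failure("reporter") == {"retriever"}
```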

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach could extend to identifying similar causal chains in other types of intelligent systems beyond LLMs.
  • Testing the framework might involve simulating multi-agent tasks where failure attribution is disabled to observe impacts on evolution.
  • Developers could use the stage model to prioritize which capabilities to strengthen first in building new multi-agent applications.
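The second bullet can be made concrete with a toy closed-loop simulation in which failure attribution is toggled on and off. The update rule and all numbers are invented; only the qualitative gap matters:

```python
def run(rounds: int, attribution: bool) -> float:
    """Toy closed loop: each round, residual errors are partially fixed.
    Attribution turns errors into targeted fixes (larger step); without
    it, fixes are blind retries (smaller step). Invented model."""
    skill, step = 0.5, 0.05 if attribution else 0.01
    for _ in range(rounds):
        skill = min(1.0, skill + step * (1.0 - skill))
    return skill

# Disabling attribution slows self-evolution in this toy model.
assert run(20, attribution=True) > run(20, attribution=False)
```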

Load-bearing premise

The LIFE stages accurately represent the causal dependencies among collaboration, failure attribution, and self-evolution without overlooking significant literature at the boundaries between stages.

What would settle it

An example of a multi-agent system achieving effective self-evolution despite lacking robust failure attribution mechanisms would falsify the necessity of the proposed stage dependencies.

read the original abstract

LLM-based autonomous agents have demonstrated strong capabilities in reasoning, planning, and tool use, yet remain limited when tasks require sustained coordination across roles, tools, and environments. Multi-agent systems address this through structured collaboration among specialized agents, but tighter coordination also amplifies a less explored risk: errors can propagate across agents and interaction rounds, producing failures that are difficult to diagnose and rarely translate into structural self-improvement. Existing surveys cover individual agent capabilities, multi-agent collaboration, or agent self-evolution separately, leaving the causal dependencies among them unexamined. This survey provides a unified review organized around four causally linked stages, which we term the LIFE progression: Lay the capability foundation, Integrate agents through collaboration, Find faults through attribution, and Evolve through autonomous self-improvement. For each stage, we provide systematic taxonomies and formally characterize the dependencies between adjacent stages, revealing how each stage both depends on and constrains the next. Beyond synthesizing existing work, we identify open challenges at stage boundaries and propose a cross-stage research agenda for closed-loop multi-agent systems capable of continuously diagnosing failures, reorganizing structures, and refining agent behaviors, extending current coordination frameworks toward more self-organizing forms of collective intelligence. By bridging these previously fragmented research threads, this survey aims to offer both a systematic reference and a conceptual roadmap toward autonomous, self-improving multi-agent intelligence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper surveys LLM-based multi-agent systems and organizes the literature into a unified LIFE progression framework consisting of four causally linked stages: Lay the capability foundation (individual agent capabilities), Integrate agents through collaboration, Find faults through attribution, and Evolve through autonomous self-improvement. It supplies systematic taxonomies for each stage, formally characterizes dependencies between adjacent stages (each depending on and constraining the next), identifies open challenges at stage boundaries, and proposes a cross-stage research agenda for closed-loop self-organizing multi-agent systems.

Significance. If the taxonomies prove comprehensive and the dependency characterizations hold, the survey would meaningfully bridge previously separate threads on individual agents, collaboration, failure attribution, and self-evolution, providing both a systematic reference and a conceptual roadmap toward autonomous, self-improving collective intelligence in LLM-based systems.

major comments (2)
  1. LIFE progression section: the central claim that stages are causally linked (each depending on and constraining the next) is load-bearing for the unification thesis, yet the characterization remains largely descriptive; concrete literature examples demonstrating measurable constraints (e.g., how collaboration structures amplify or limit attribution accuracy) should be added with citations to substantiate the causal framing over a purely organizational one.
  2. Taxonomy for the Find faults stage: the coverage of cross-agent error propagation and diagnosis mechanisms appears incomplete relative to the claimed dependency from the Integrate stage; explicit inclusion of distributed failure models from the surveyed literature is needed to support the progression's causal structure.
minor comments (2)
  1. The abstract and introduction should state the literature selection criteria and approximate number of papers reviewed to allow readers to assess taxonomy completeness.
  2. A figure or table summarizing the LIFE stages and their dependencies would improve clarity and help readers trace the claimed causal links.
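The referee's first major point, that collaboration structures measurably constrain attribution, can be illustrated with a toy count of candidate fault sources under different topologies. Both graphs are invented, not drawn from the paper:

```python
def candidates(edges: dict[str, list[str]], failing: str) -> int:
    """Size of the transitive upstream closure of a failing agent,
    i.e. how many agents an attribution pass must consider."""
    seen, stack = set(), [failing]
    while stack:
        for up in edges.get(stack.pop(), []):
            if up not in seen:
                seen.add(up)
                stack.append(up)
    return len(seen)

# Hierarchical chain: blame narrows to the path above the failure.
hierarchy = {"leaf": ["manager"], "manager": ["planner"], "planner": []}
# Fully connected trio: every agent is a candidate for every failure.
clique = {a: [b for b in "abc" if b != a] for a in "abc"}

assert candidates(hierarchy, "leaf") == 2
assert candidates(clique, "a") == 3  # the cycle even re-implicates "a"
```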

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation of minor revision. The comments highlight important opportunities to strengthen the causal claims in the LIFE framework, and we will revise the manuscript to address them directly.

read point-by-point responses
  1. Referee: [—] LIFE progression section: the central claim that stages are causally linked (each depending on and constraining the next) is load-bearing for the unification thesis, yet the characterization remains largely descriptive; concrete literature examples demonstrating measurable constraints (e.g., how collaboration structures amplify or limit attribution accuracy) should be added with citations to substantiate the causal framing over a purely organizational one.

    Authors: We agree that concrete examples are needed to substantiate the causal linkages rather than leaving them descriptive. In the revised manuscript, we will expand the LIFE progression section with specific literature examples illustrating measurable constraints, such as how hierarchical versus decentralized collaboration structures affect attribution accuracy (citing relevant works on multi-agent coordination and fault diagnosis from the surveyed literature). This will reinforce the causal framing over a purely organizational one. revision: yes

  2. Referee: [—] Taxonomy for the Find faults stage: the coverage of cross-agent error propagation and diagnosis mechanisms appears incomplete relative to the claimed dependency from the Integrate stage; explicit inclusion of distributed failure models from the surveyed literature is needed to support the progression's causal structure.

    Authors: We acknowledge this gap in the Find faults taxonomy. We will revise the section to explicitly incorporate distributed failure models and cross-agent error propagation mechanisms drawn from the surveyed literature. These additions will directly support the claimed dependency on the Integrate stage by demonstrating how collaboration structures shape error diagnosis capabilities. revision: yes

Circularity Check

0 steps flagged

No significant circularity in survey organization

full rationale

This is a literature survey that introduces the LIFE progression as an organizing framework for existing external research on LLM agents, collaboration, fault attribution, and self-evolution. No equations, fitted parameters, predictions, or derivations appear in the provided text. The four stages and their claimed dependencies are presented as a descriptive taxonomy drawn from prior work rather than computed or fitted from the paper's own content. No self-citation chains, ansatzes, or renamings reduce any central claim to the paper's inputs by construction. The synthesis therefore rests on external literature rather than on constructions of its own.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The survey rests on standard domain assumptions about current LLM agent limitations and proposes a new organizational framework without introducing fitted parameters or new entities.

axioms (1)
  • domain assumption LLM-based autonomous agents have demonstrated strong capabilities in reasoning, planning, and tool use yet remain limited when tasks require sustained coordination
    Presented as established background in the abstract to motivate the need for multi-agent approaches.

pith-pipeline@v0.9.0 · 5602 in / 1267 out tokens · 83917 ms · 2026-05-15T03:03:42.476724+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Reference graph

Works this paper leans on

298 extracted references · 298 canonical work pages · 17 internal anchors
