
arxiv: 2606.22610 · v1 · pith:4RV5GIBCnew · submitted 2026-06-21 · 💻 cs.AI

PaperClaw: Harnessing Agents for Autonomous Research and Human-in-the-Loop Refinement

Pith reviewed 2026-06-26 10:34 UTC · model grok-4.3

classification 💻 cs.AI
keywords: multi-agent systems · autonomous research · large language models · research automation · human-in-the-loop · paper generation · hypothesis testing

The pith

PAPERCLAW is a multi-agent system that autonomously carries a research project from domain curation through hypothesis testing to a finished venue-compliant paper.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PAPERCLAW as a harnessed multi-agent system that automates the full research lifecycle from selecting a field of study to producing a finished paper. It curates a domain using live literature, datasets, and code, formulates an idea under a pre-registered main-result contract, and runs an iterative propose-test-reflect loop on a hypothesis map that grows only from measured verdicts until the evidence supports stopping. A full-lifecycle memory keeps every stage in one living record so the process can be paused, inspected, or resumed, and the same interface allows a human to step in at any point for refinement. All outputs stay grounded by citing only references validated against open indexes and reporting results that actually ran. An evaluation using an LLM judge finds that the system produces strong papers both when running fully autonomously and when refined with human input.

Core claim

PAPERCLAW harnesses agents to curate a domain from a field's live literature, datasets, and code; brainstorm an idea with a pre-registered main-result contract; drive a stoppable hypothesis map through an iterative propose-test-reflect loop that advances only on measured verdicts and halts once evidence supports the idea; then write a venue-compliant paper, all while maintaining a single full-lifecycle memory record that supports pausing, inspection, resumption, and human-in-the-loop refinement at any stage.
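The verdict-gated growth described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the names `Hypothesis`, `Verdict`, and `run_loop`, the breadth-first order, the rule that only supported nodes are expanded, and the use of a fixed experiment budget as the stopping condition. The one property taken from the paper is that a node may gain children only after it has a measured verdict, and that a refuted branch stops.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Verdict(Enum):
    SUPPORTED = "supported"
    REFUTED = "refuted"
    INCONCLUSIVE = "inconclusive"


@dataclass
class Hypothesis:
    name: str
    verdict: Optional[Verdict] = None  # None until an experiment has run
    children: List["Hypothesis"] = field(default_factory=list)

    def add_child(self, child: "Hypothesis") -> None:
        # Growth is gated on a measured verdict: no verdict, no children.
        if self.verdict is None:
            raise ValueError(f"{self.name} has no verdict yet")
        # A refuted branch simply stops.
        if self.verdict is Verdict.REFUTED:
            raise ValueError(f"{self.name} was refuted; branch stops")
        self.children.append(child)


def run_loop(
    root: Hypothesis,
    test: Callable[[Hypothesis], Verdict],
    propose: Callable[[Hypothesis], List[Hypothesis]],
    budget: int,
) -> List[str]:
    """Run test -> reflect -> propose until the budget or the frontier runs out.

    Returns the names of the hypotheses in the order they were tested.
    The budget-based stop is an assumption of this sketch.
    """
    frontier = [root]
    tested: List[str] = []
    while frontier and budget > 0:
        node = frontier.pop(0)
        node.verdict = test(node)  # test: run the experiment
        budget -= 1
        tested.append(node.name)
        if node.verdict is Verdict.SUPPORTED:  # reflect: expand only on support
            for child in propose(node):  # propose: children motivated by the result
                node.add_child(child)
                frontier.append(child)
    return tested
```

In use, `test` would wrap an actual experiment run and `propose` would wrap an agent call. Because `add_child` raises on an unverdicted parent, a child cannot be attached before its parent has been measured.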

What carries the argument

The in-cycle research assistant with research tools and skills, operating inside the full-lifecycle memory that maintains one living record of the entire project.

If this is right

  • A research project can proceed from field selection to finished paper without continuous human direction.
  • Any stage of the process remains inspectable and resumable through the single living memory record.
  • Human researchers can intervene at chosen points to strengthen an autonomous draft without restarting the project.
  • All citations and results in the output remain traceable to validated external sources and actual executions.
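The pause-and-resume property in the second bullet can be sketched as a single serializable record. This is only an illustration under stated assumptions: the stage names in `STAGES`, the `ProjectRecord` fields, and the JSON encoding are all hypothetical, since the abstract does not describe the memory format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional

# Hypothetical stage names; the paper's actual stages may differ.
STAGES = ["curate", "ideate", "experiment", "write"]


@dataclass
class ProjectRecord:
    field_of_study: str
    completed: List[str] = field(default_factory=list)
    artifacts: Dict[str, str] = field(default_factory=dict)

    def next_stage(self) -> Optional[str]:
        # The first stage not yet completed, or None when the project is done.
        for stage in STAGES:
            if stage not in self.completed:
                return stage
        return None

    def complete(self, stage: str, artifact: str) -> None:
        # Stages must be finished in order so the record stays consistent.
        if stage != self.next_stage():
            raise ValueError(f"expected {self.next_stage()!r}, got {stage!r}")
        self.completed.append(stage)
        self.artifacts[stage] = artifact


def dumps(record: ProjectRecord) -> str:
    """Serialize the whole record so a run can be paused and inspected."""
    return json.dumps(asdict(record), indent=2)


def loads(text: str) -> ProjectRecord:
    """Rebuild the record so a run can resume where it stopped."""
    return ProjectRecord(**json.loads(text))
```

Keeping one record, rather than per-stage state scattered across agents, means that inspecting or resuming a run only requires reading a single document.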

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same interface that supports full autonomy could also serve as a scaffold for teams that want to offload routine steps while retaining oversight.
  • The grounding requirement (validated citations and executed results) limits the system to domains where open data and code already exist.
  • Extending the hypothesis map to include real-world lab experiments would require additional tool interfaces beyond the current literature and code capabilities.

Load-bearing premise

An LLM judge can provide a reliable and unbiased evaluation of the quality and validity of the generated research papers.

What would settle it

A side-by-side comparison of the same set of PAPERCLAW papers rated by domain-expert humans versus the LLM judge to measure agreement on quality and validity.
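One way such agreement could be quantified is Cohen's kappa, which corrects the raw agreement rate for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is an illustrative choice, not a metric the paper reports, and the function name and label values are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two raters on the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater labelled independently at their own rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical verdicts on four papers, for illustration only.
human = ["accept", "accept", "reject", "reject"]
judge = ["accept", "reject", "reject", "reject"]
kappa = cohens_kappa(human, judge)  # p_o = 0.75, p_e = 0.5, so kappa = 0.5
```

A kappa near 1 would indicate that the LLM judge tracks expert verdicts closely, while a value near 0 would indicate agreement no better than chance. For ordinal scores rather than categorical verdicts, a rank correlation would be the more natural choice.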

Figures

Figures reproduced from arXiv: 2606.22610 by Dongyuan Li, Hangchen Liu, Renhe Jiang, Weiwei Ye.

Figure 1: Method overview. PAPERCLAW turns a curated domain (papers, datasets, code, venues) into an idea specification, then drives a hypothesis map through the iterative propose → test → reflect loop, growing the map only from measured verdicts (green supported, red refuted, amber inconclusive), until the evidence is sufficient to compile a paper. Throughout, an in-cycle research assistant lets a user step in at any…
Figure 2: Full-lifecycle memory. Each level owns a canonical, persisted record. Stages…
Figure 3: The hypothesis map grows one verdicted level at a time. Children may be added only after the parent has an experimental verdict, so H1.1 and H1.2 are motivated by H1’s measured result, never invented up front. A refuted branch (H1.1) simply stops; a supported one (H1.2) can be expanded further within the budget. Pinning those criteria before the experiment runs is pre-registration at the level of each hypo…
Figure 4: The in-cycle research assistant in human-in-the-loop use. The chat (left) lets a user collaborate with the…
Figure 5: Architecture. Three surfaces (web, desktop, command line) call…
Original abstract

Large language models have become capable reasoners and tool users that write and run code and search the literature, which makes automating the research process itself a realistic goal. We present PAPERCLAW, a harnessed multi-agent system that carries a project autonomously, from a field of study to a finished paper. PAPERCLAW curates a domain from a field's live literature, datasets, and code; brainstorms it into an idea with a pre-registered main-result contract; and drives a stoppable hypothesis map through an iterative propose, test, reflect loop that grows only from measured verdicts and halts once the evidence supports the idea, at which point it writes a venue-compliant paper. A full-lifecycle memory keeps each stage in a single living record, so a long run can be paused, inspected, and resumed without losing context. At the centre is an in-cycle research assistant with research tools and skills: it can drive the whole pipeline on its own, while the same interface lets a person step in at any stage, turning a first autonomous draft into a stronger paper through human-in-the-loop refinement. Throughout, PAPERCLAW keeps its output grounded and checkable, citing only references validated against open scholarly indexes and reporting results that genuinely ran. An evaluation with an LLM judge finds that PAPERCLAW produces strong papers both fully autonomously and with human-in-the-loop refinement.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript presents PAPERCLAW, a multi-agent harness that automates the research pipeline from domain curation (live literature, datasets, code) through idea generation with pre-registered result contracts, iterative propose-test-reflect hypothesis mapping, and production of venue-compliant papers. A full-lifecycle memory supports pausing/resuming and human-in-the-loop intervention at any stage. The central empirical claim, stated in the abstract, is that an LLM-judge evaluation finds the system produces strong papers both fully autonomously and with human refinement, while maintaining grounding via validated citations and executed results.

Significance. If the evaluation were shown to be reliable, the work would offer a concrete, tool-using agent architecture for end-to-end autonomous research with built-in human collaboration points; the emphasis on verifiable references and stoppable loops distinguishes it from purely generative approaches. The absence of any methodological detail on the judge, however, prevents assessing whether this contribution is substantiated.

major comments (1)
  1. [Abstract and Evaluation section] The headline result, that PAPERCLAW produces strong papers both autonomously and with human-in-the-loop refinement, is asserted solely via “an evaluation with an LLM judge.” No judge prompt, scoring rubric, number of papers evaluated, inter-annotator agreement, human-expert correlation, or non-LLM baseline is supplied. Because this is the only empirical support offered for the central claim, the evaluation methodology is load-bearing, and its current description renders the claim unverifiable.
minor comments (1)
  1. [Abstract] The abstract packs multiple system components into a single long sentence; splitting the description of the memory, research assistant interface, and grounding mechanisms would improve readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful and constructive review. We agree that the evaluation methodology is central to the paper's claims and that its current description is insufficient for verification. We will revise the manuscript to address this.

Point-by-point responses
  1. Referee: [Abstract and Evaluation section] The headline result, that PAPERCLAW produces strong papers both autonomously and with human-in-the-loop refinement, is asserted solely via “an evaluation with an LLM judge.” No judge prompt, scoring rubric, number of papers evaluated, inter-annotator agreement, human-expert correlation, or non-LLM baseline is supplied. Because this is the only empirical support offered for the central claim, the evaluation methodology is load-bearing, and its current description renders the claim unverifiable.

    Authors: We agree with the referee that the Evaluation section does not supply the requested methodological details and that this renders the central empirical claim unverifiable as currently written. In the revised manuscript we will expand the Evaluation section to include the complete LLM judge prompt, the scoring rubric, the number of papers evaluated, inter-annotator agreement statistics (where applicable), correlation with human-expert judgments, and at least one non-LLM baseline comparison. These additions will make the evaluation transparent and allow proper assessment of the reported results.

    Revision: yes

Circularity Check

0 steps flagged

No circularity; no derivations or self-referential reductions present


The paper is a systems description of a multi-agent harness for autonomous paper generation. It contains no equations, no fitted parameters, no derivation chain, and no mathematical claims that could reduce to inputs by construction. The evaluation statement relies on an LLM judge but supplies no details that create a self-definitional loop or fitted-input prediction. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The result is therefore self-contained against external benchmarks with no circular steps to flag.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract contains no explicit free parameters, axioms, or invented entities; the system is described at a conceptual level without technical internals.

pith-pipeline@v0.9.1-grok · 5785 in / 1066 out tokens · 41301 ms · 2026-06-26T10:34:31.933424+00:00 · methodology

