Autodata: An agentic data scientist to create high quality synthetic data

Chenxi Whitehouse; Eryk Helenowski; Han Fang; Ilia Kulikov; Jack Lanchantin; Jakob Foerster; Jason Weston; Olga Golovneva; Sainbayar Sukhbaatar; Swarnadeep Saha

arxiv: 2606.25996 · v2 · pith:HUCA6C6Tnew · submitted 2026-06-24 · 💻 cs.AI · cs.CL · cs.LG

Ilia Kulikov, Chenxi Whitehouse, Tianhao Wu, Yixin Nie, Swarnadeep Saha, Eryk Helenowski, Weizhe Yuan, Olga Golovneva, Jack Lanchantin, Yoram Bachrach, Jakob Foerster, Xian Li, Han Fang, Sainbayar Sukhbaatar, Jason Weston

Pith reviewed 2026-06-26 05:14 UTC · model grok-4.3

classification 💻 cs.AI · cs.CL · cs.LG

keywords synthetic data · AI agents · meta-optimization · data generation · agentic systems · self-instruct · model training

The pith

Meta-optimizing an AI agent that acts as a data scientist produces higher-quality synthetic data than fixed methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Autodata as a method that lets AI agents function as data scientists to construct training and evaluation datasets. It shows how to meta-optimize these agents so they learn to generate even stronger data. Experiments on computer science research, legal reasoning, and mathematical object tasks demonstrate gains over classical synthetic dataset creation. Meta-optimizing the agent itself yields larger performance improvements. The approach frames increased inference computation as a route to better model training data.

Core claim

Autodata enables AI agents to act as data scientists who build high quality training and evaluation data. A practical implementation called Agentic Self-Instruct is given. On computer science research tasks, legal reasoning tasks and reasoning with mathematical objects the method obtains improved results compared to classical synthetic dataset creation methods. Meta-optimizing the data scientist agent itself delivers an even larger performance uplift. Agentic data creation provides a way to convert increased inference compute into higher quality model training.

What carries the argument

Meta-optimization of the data scientist agent, which iteratively improves the agent's ability to produce stronger synthetic data through self-directed generation.
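The loop described above can be sketched as a toy outer search over the agent's data-creation policy. This is only an illustration under loud assumptions: the abstract does not describe the actual meta-optimization procedure, so `Recipe`, `generate_dataset`, `downstream_score`, and the hill-climbing search are all hypothetical stand-ins, and the "downstream score" is a synthetic function rather than a trained model.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Recipe:
    """Hypothetical stand-in for the agent's data-creation policy."""
    difficulty: float  # toy knob in [0, 1]; not a parameter from the paper


def clamp(x: float) -> float:
    return min(1.0, max(0.0, x))


def generate_dataset(recipe: Recipe, n: int, rng: random.Random) -> List[float]:
    """Toy 'agent run': emit n examples whose difficulty clusters around the recipe."""
    return [clamp(rng.gauss(recipe.difficulty, 0.05)) for _ in range(n)]


def downstream_score(dataset: List[float]) -> float:
    """Toy proxy for 'train on the data, then evaluate'.
    Assumption: examples near difficulty 0.7 are the most useful."""
    return -sum((x - 0.7) ** 2 for x in dataset) / len(dataset)


def meta_optimize(initial: Recipe,
                  score_fn: Callable[[List[float]], float],
                  steps: int = 50,
                  seed: int = 0) -> Recipe:
    """Outer loop: perturb the recipe, keep a candidate only if its data scores higher."""
    rng = random.Random(seed)
    best = initial
    best_score = score_fn(generate_dataset(best, 200, rng))
    for _ in range(steps):
        candidate = Recipe(clamp(best.difficulty + rng.uniform(-0.1, 0.1)))
        candidate_score = score_fn(generate_dataset(candidate, 200, rng))
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best


if __name__ == "__main__":
    tuned = meta_optimize(Recipe(difficulty=0.2), downstream_score)
    print(f"tuned difficulty: {tuned.difficulty:.2f}")
```

The point of the sketch is the structure, not the search method: the inner call produces data, the outer loop spends extra compute to change how the data is produced. A real system would replace the toy score with an actual train-and-evaluate step, which is where most of the cost would sit.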

If this is right

Synthetic data from the meta-optimized agent outperforms data from standard creation methods on computer science, legal, and mathematical reasoning tasks.
Meta-optimizing the agent produces larger gains than using a fixed agent.
Increased inference-time computation can be redirected into measurable improvements in training data quality.
The method offers a general route to building higher-quality datasets for model training without manual curation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same meta-optimization loop could be applied repeatedly to generate successive generations of stronger data.
Domains with scarce labeled data might see the largest relative gains if the agent can discover task-specific patterns during optimization.
If inference compute continues to grow, this framing suggests data quality could scale with compute in a way that supplements traditional scaling laws.

Load-bearing premise

The meta-optimization process improves data quality in ways that transfer to new downstream tasks without introducing artifacts or overfitting to the meta-training distribution.

What would settle it

Downstream models trained on data from the meta-optimized agent show no performance gain over models trained on data from a non-meta-optimized agent or classical synthetic methods.
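One way to operationalize "no performance gain" is a paired bootstrap over per-example evaluation scores from two downstream models. The sketch below is an assumption about how such a check could be run, not the paper's evaluation protocol; the function name `paired_bootstrap_gain` and its parameters are hypothetical, and any scores passed to it would have to come from real evaluations.

```python
import random
from typing import List, Tuple


def paired_bootstrap_gain(treated: List[float],
                          baseline: List[float],
                          n_resamples: int = 2000,
                          seed: int = 0,
                          alpha: float = 0.05) -> Tuple[float, float, float]:
    """Return (mean gain, lower bound, upper bound) for treated minus baseline.

    Both lists must hold scores for the same evaluation examples, in the same order.
    The bounds are a percentile bootstrap interval at level 1 - alpha.
    """
    if len(treated) != len(baseline) or not treated:
        raise ValueError("need two non-empty, equally long score lists")
    diffs = [t - b for t, b in zip(treated, baseline)]
    n = len(diffs)
    rng = random.Random(seed)
    # Resample per-example differences with replacement and record each mean.
    means = sorted(
        sum(rng.choice(diffs) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return sum(diffs) / n, lower, upper
```

Under this reading, the claim would be in trouble if the interval for "meta-optimized agent minus fixed agent" comfortably contains zero on held-out tasks. The same comparison would be repeated against the classical synthetic-data baseline.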

Original abstract

We introduce Autodata, a general method that enables AI agents to act as data scientists who build high quality training and evaluation data. We show how to train (meta-optimize) such a data scientist agent, so that it learns to create even stronger data. We describe the overall formulation, and a specific practical implementation, Agentic Self-Instruct. We conduct experiments on computer science research tasks, legal reasoning tasks and reasoning with mathematical objects, where we obtain improved results compared to classical synthetic dataset creation methods. Further, meta-optimizing the data scientist agent itself delivers an even larger performance uplift. Agentic data creation provides a way to convert increased inference compute into higher quality model training. Overall, we believe this direction has the potential to change the way we build AI data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Autodata frames an agent as a meta-trainable data scientist for synthetic data, but the abstract supplies no numbers, baselines, or controls to support the claimed uplifts.

The main things to know are that the paper introduces an agent acting as a data scientist to generate synthetic data and adds a meta-optimization stage to improve that agent, with experiments claimed on CS research, legal reasoning and math-object tasks. It reports gains over classical synthetic methods and even larger gains from the meta step, positioning the approach as a way to spend inference compute for better training data.

What is new is the explicit meta-optimization loop on the data-creation agent itself rather than just generating data once. The implementation is called Agentic Self-Instruct and extends prior Self-Instruct style work by treating the agent policy as trainable. The multi-domain experiments and the emphasis on converting extra inference into data quality are reasonable framing choices.

The soft spot is the absence of any quantitative evidence. The abstract asserts improved results and a larger uplift from meta-optimization but gives no numbers, no baselines, no error bars, and no description of the meta-training procedure or task splits. This makes it impossible to check the claims. The stress-test point about possible overfitting is on target: without a clear statement that meta-training tasks are disjoint from evaluation tasks, any measured gains could reflect fitting to the meta-distribution rather than learning a general data-creation policy. If the full paper contains those controls and numbers, the picture changes; based on the abstract alone, the central transfer claim rests on an unverified assumption.

This is for researchers working on synthetic data pipelines or agentic workflows for LLM training. A reader looking for new formulations in that space would get value from the setup, but only after seeing the actual experiments. The work shows clear thinking on the problem structure, so it deserves a serious referee to examine the results, the task separation, and whether the meta-optimization delivers transferable quality gains.

Referee Report

2 major / 0 minor

Summary. The paper introduces Autodata, a general method enabling AI agents to function as data scientists for constructing high-quality synthetic training and evaluation data. It presents an overall formulation along with a practical implementation termed Agentic Self-Instruct, and describes how to meta-optimize the agent to produce stronger data. Experiments are reported on computer science research tasks, legal reasoning tasks, and reasoning with mathematical objects, claiming improved results relative to classical synthetic dataset creation methods, with an even larger uplift obtained from meta-optimizing the agent itself. The work positions agentic data creation as a means to convert increased inference compute into higher-quality model training data.

Significance. If the experimental claims are substantiated with rigorous controls, this direction could meaningfully advance synthetic data practices by demonstrating how agentic workflows and meta-optimization can systematically improve data quality. The paper is credited for formulating a new agent-based paradigm for data creation and for explicitly linking inference-time compute to training data improvements.

major comments (2)

[Abstract] The central claims of 'improved results' over classical methods and an 'even larger performance uplift' from meta-optimizing the agent are asserted without any quantitative metrics, baselines, error bars, task descriptions, or methodological details, rendering the primary empirical contribution unverifiable and preventing assessment of whether the reported gains are load-bearing or artifactual.
[Abstract, meta-optimization description] No statement is provided on whether meta-training tasks, prompts, or evaluation criteria are disjoint from the reported test sets on CS research, legal reasoning, and math-object tasks. This omission directly undermines the transfer claim, as any uplift could arise from optimization fitting to shared task specifics rather than learning a general data-creation policy.
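The disjointness requested in the second comment is straightforward to audit when tasks carry identifiers. The following is a minimal sketch under that assumption; the function names and the example identifiers are hypothetical and do not come from the paper.

```python
from typing import Dict, Iterable, Set


def normalize(task_id: str) -> str:
    """Case- and whitespace-insensitive key, so trivial variants still collide."""
    return " ".join(task_id.lower().split())


def overlap_report(meta_train: Iterable[str],
                   evaluation: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """For each evaluation domain, return the normalized task ids also seen in meta-training."""
    seen = {normalize(t) for t in meta_train}
    return {
        domain: {normalize(t) for t in tasks} & seen
        for domain, tasks in evaluation.items()
    }


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    report = overlap_report(
        meta_train=["CS-001", "law-17"],
        evaluation={"cs": ["cs-001 ", "cs-002"], "legal": ["law-18"]},
    )
    for domain, leaked in report.items():
        print(domain, "OK" if not leaked else f"overlap: {sorted(leaked)}")
```

An exact-identifier check like this catches only literal reuse. Paraphrased or near-duplicate tasks, shared prompts, and shared rubric criteria would need a separate similarity-based audit.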

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments on the abstract below and will make the requested revisions to improve verifiability and clarity.

Referee: [Abstract] The central claims of 'improved results' over classical methods and an 'even larger performance uplift' from meta-optimizing the agent are asserted without any quantitative metrics, baselines, error bars, task descriptions, or methodological details, rendering the primary empirical contribution unverifiable and preventing assessment of whether the reported gains are load-bearing or artifactual.

Authors: We agree that the abstract, in its current form, would benefit from quantitative details to allow immediate assessment of the claims. The body of the manuscript reports the specific metrics, baselines (including classical synthetic data methods), error bars, task descriptions, and methodological details for the CS research, legal reasoning, and math-object experiments. We will revise the abstract to include key quantitative results and brief task/method summaries while preserving its high-level nature.

revision: yes

Referee: [Abstract, meta-optimization description] No statement is provided on whether meta-training tasks, prompts, or evaluation criteria are disjoint from the reported test sets on CS research, legal reasoning, and math-object tasks. This omission directly undermines the transfer claim, as any uplift could arise from optimization fitting to shared task specifics rather than learning a general data-creation policy.

Authors: We acknowledge the importance of this clarification for supporting the transfer claim. Our meta-optimization used tasks, prompts, and evaluation criteria that are disjoint from the test sets. We will revise the abstract to explicitly state this disjointness, and we will expand the description in the methods section to detail the separation.

revision: yes

Circularity Check

0 steps flagged

No circularity detected; paper contains no derivations or equations.

The manuscript presents an empirical method (Agentic Self-Instruct) and reports experimental uplifts from meta-optimization on CS, legal, and math tasks. No equations, first-principles derivations, or self-referential definitions appear in the provided abstract or description. Claims rest on experimental comparisons rather than any reduction of outputs to inputs by construction. Self-citations, if present, are not load-bearing for a mathematical chain. This is the expected non-finding for a purely empirical agent paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities can be identified from the provided text.

pith-pipeline@v0.9.1-grok · 5723 in / 1099 out tokens · 21720 ms · 2026-06-26T05:14:03.368525+00:00 · methodology


2024

[24] [29]

2024 , url=

Sheng, Guangming and Zhang, Chi and Ye, Zilingfeng and Wu, Xibin and Zhang, Wang and Zhang, Ru and Peng, Yanghua and Lin, Haibin and Wu, Chuan , journal=. 2024 , url=

2024

[25] [30]

Tao, Leitian and Kulikov, Ilia and Saha, Swarnadeep and Wang, Tianlu and Xu, Jing and Li, Sharon and Weston, Jason E and Yu, Ping , journal=

[26] [31]

Shao, Zhihong and Luo, Yuxiang and Lu, Chengda and Ren, ZZ and Hu, Jiewen and Ye, Tian and Gou, Zhibin and Ma, Shirong and Zhang, Xiaokang , journal=

[27] [32]

2023 , url=

Gao, Leo and Schulman, John and Hilton, Jacob , booktitle=. 2023 , url=

2023

[28] [33]

2024 , url=

Wenting Zhao and Xiang Ren and Jack Hessel and Claire Cardie and Yejin Choi and Yuntian Deng , booktitle=. 2024 , url=

2024

[29] [34]

arXiv preprint arXiv:2509.26601 , year=

Whitehouse, Chenxi and Ruder, Sebastian and Lin, Tony and Kurylo, Oksana and Takagi, Haruka and Lam, Janice and Busetto, Nicol. arXiv preprint arXiv:2509.26601 , year=

arXiv

[30] [35]

Can Balioglu and Alexander Erben and Martin Gleize and Artyom Kozhevnikov and Ilia Kulikov and Julien Yao , title =

[31] [36]

Gonzalez and Hao Zhang and Ion Stoica , booktitle=

Woosuk Kwon and Zhuohan Li and Siyuan Zhuang and Ying Sheng and Lianmin Zheng and Cody Hao Yu and Joseph E. Gonzalez and Hao Zhang and Ion Stoica , booktitle=. 2023 , url=

2023

[32] [37]

Hashimoto , title =

Xuechen Li and Tianyi Zhang and Yann Dubois and Rohan Taori and Ishaan Gulrajani and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto , title =. GitHub repository , howpublished =. 2023 , month =

2023

[33] [39]

Li, Tianle and Chiang, Wei-Lin and Frick, Evan and Dunlap, Lisa and Wu, Tianhao and Zhu, Banghua and Gonzalez, Joseph E and Stoica, Ion , year=

[34] [40]

arXiv preprint arXiv:2506.21495 , year=

Lanchantin, Jack and Chen, Angelica and Lan, Janice and Li, Xian and Saha, Swarnadeep and Wang, Tianlu and Xu, Jing and Yu, Ping and Yuan, Weizhe and Weston, Jason E and others , url=. arXiv preprint arXiv:2506.21495 , year=

arXiv

[35] [41]

arXiv preprint arXiv:2507.01352 , year=

Liu, Chris Yuhao and Zeng, Liang and Xiao, Yuzhen and He, Jujie and Liu, Jiacai and Wang, Chaojie and Yan, Rui and Shen, Wei and Zhang, Fuxiang and Xu, Jiacheng and others , url=. arXiv preprint arXiv:2507.01352 , year=

Pith/arXiv arXiv

[36] [42]

Frick, Evan and Jin, Peter and Li, Tianle and Ganesan, Karthik and Zhang, Jian and Jiao, Jiantao and Zhu, Banghua , month =

[37] [43]

arXiv preprint , year=

Liu, Zihan and Chen, Yang and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei , url=. arXiv preprint , year=

[38] [44]

General-Reasoner: Advancing

Xueguang Ma and Qian Liu and Dongfu Jiang and Ge Zhang and Zejun MA and Wenhu Chen , booktitle=. General-Reasoner: Advancing. 2025 , url=

2025

[39] [45]

Malik, Saumya and Pyatkin, Valentina and Land, Sander and Morrison, Jacob and Smith, Noah A and Hajishirzi, Hannaneh and Lambert, Nathan , journal=

[40] [46]

arXiv preprint arXiv:2505.23281 , url=

Balunovi. arXiv preprint arXiv:2505.23281 , url=

Pith/arXiv arXiv

[41] [47]

Wang, Zengzhi and Zhou, Fan and Li, Xuefeng and Liu, Pengfei , journal=

[42] [48]

2024 , url=

Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking , author=. 2024 , url=

2024

[43] [49]

arXiv preprint arXiv:1606.06565 , url=

Concrete problems in AI safety , author=. arXiv preprint arXiv:1606.06565 , url=

Pith/arXiv arXiv

[44] [50]

Findings of the Association for Computational Linguistics: NAACL 2025 , pages=

Rewardbench: Evaluating reward models for language modeling , author=. Findings of the Association for Computational Linguistics: NAACL 2025 , pages=

2025

[45] [51]

2024 , journal=

RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style , author=. 2024 , journal=. 2410.16184 , archivePrefix=

arXiv 2024

[46] [52]

2025 , url=

Sijun Tan and Siyuan Zhuang and Kyle Montgomery and William Yuan Tang and Alejandro Cuadron and Chenguang Wang and Raluca Popa and Ion Stoica , booktitle=. 2025 , url=

2025

[47] [53]

Gonzalez and Ion Stoica , booktitle=

Evan Frick and Tianle Li and Connor Chen and Wei-Lin Chiang and Anastasios Nikolas Angelopoulos and Jiantao Jiao and Banghua Zhu and Joseph E. Gonzalez and Ion Stoica , booktitle=. 2025 , url=

2025

[48] [54]

arXiv preprint arXiv:2410.12832 , url=

Mahan, Dakota and Van Phung, Duy and Rafailov, Rafael and Blagden, Chase and Lile, Nathan and Castricato, Louis and Fr. arXiv preprint arXiv:2410.12832 , url=

arXiv

[49] [55]

2025 , url=

Lunjun Zhang and Arian Hosseini and Hritik Bansal and Mehran Kazemi and Aviral Kumar and Rishabh Agarwal , booktitle=. 2025 , url=

2025

[50] [56]

2024 , url=

Zachary Ankner and Mansheej Paul and Brandon Cui and Jonathan Daniel Chang and Prithviraj Ammanabrolu , booktitle=. 2024 , url=

2024

[51] [57]

2025 , url=

Liu, Zijun and Wang, Peiyi and Xu, Runxin and Ma, Shirong and Ruan, Chong and Li, Peng and Liu, Yang and Wu, Yu , journal=. 2025 , url=

2025

[52] [58]

Advances in Neural Information Processing Systems , volume=

Star: Bootstrapping reasoning with reasoning , author=. Advances in Neural Information Processing Systems , volume=

[53] [59]

2025 , url=

Swarnadeep Saha and Xian Li and Marjan Ghazvininejad and Jason E Weston and Tianlu Wang , booktitle=. 2025 , url=

2025

[54] [60]

Wang, Tianlu and Kulikov, Ilia and Golovneva, Olga and Yu, Ping and Yuan, Weizhe and Dwivedi-Yu, Jane and Pang, Richard Yuanzhe and Fazel-Zarandi, Maryam and Weston, Jason and Li, Xian , journal=

[55] [61]

Bai, Yuntao and Kadavath, Saurav and Kundu, Sandipan and Askell, Amanda and Kernion, Jackson and Jones, Andy and Chen, Anna and Goldie, Anna and Mirhoseini, Azalia and McKinnon, Cameron and others , journal=

[56] [62]

2025 , url=

Lu, Xun , journal=. 2025 , url=

2025

[57] [63]

2025 , url=

Guo, Jiaxin and Chi, Zewen and Dong, Li and Dong, Qingxiu and Wu, Xun and Huang, Shaohan and Wei, Furu , journal=. 2025 , url=

2025

[58] [64]

Chen, Xiusi and Li, Gaotang and Wang, Ziqi and Jin, Bowen and Qian, Cheng and Wang, Yu and Wang, Hongru and Zhang, Yu and Zhang, Denghui and Zhang, Tong and others , journal=

[59] [65]

RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback , author=

RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback , author=. International Conference on Machine Learning , pages=. 2024 , organization=

2024

[60] [66]

Zheng, Lianmin and Chiang, Wei-Lin and Sheng, Ying and Zhuang, Siyuan and Wu, Zhanghao and Zhuang, Yonghao and Lin, Zi and Li, Zhuohan and Li, Dacheng and Xing, Eric and others , journal=

[61] [67]

2024 , url=

Jaech, Aaron and Kalai, Adam and Lerer, Adam and Richardson, Adam and El-Kishky, Ahmed and Low, Aiden and Helyar, Alec and Madry, Aleksander and Beutel, Alex and Carney, Alex and others , journal=. 2024 , url=

2024

[62] [68]

2024 , url=

Hurst, Aaron and Lerer, Adam and Goucher, Adam P and Perelman, Adam and Ramesh, Aditya and Clark, Aidan and Ostrow, AJ and Welihinda, Akila and Hayes, Alan and Radford, Alec and others , journal=. 2024 , url=

2024

[63] [69]

2024 , url=

Grattafiori, Aaron and Dubey, Abhimanyu and Jauhri, Abhinav and Pandey, Abhinav and Kadian, Abhishek and Al-Dahle, Ahmad and Letman, Aiesha and Mathur, Akhil and Schelten, Alan and Vaughan, Alex and others , journal=. 2024 , url=

2024

[64] [70]

Proceedings of the 61st annual meeting of the association for computational linguistics (volume 1: long papers) , pages=

Self-instruct: Aligning language models with self-generated instructions , author=. Proceedings of the 61st annual meeting of the association for computational linguistics (volume 1: long papers) , pages=

[65] [72]

Advances in neural information processing systems , volume=

Self-refine: Iterative refinement with self-feedback , author=. Advances in neural information processing systems , volume=

[66] [73]

International Conference on Learning Representations , volume=

Large language models as optimizers , author=. International Conference on Learning Representations , volume=

[67] [80]

Forty-first International Conference on Machine Learning , year=

Self-rewarding language models , author=. Forty-first International Conference on Machine Learning , year=

[68] [86]

2026 , howpublished =

Andrej Karpathy , title =. 2026 , howpublished =

2026

[69] [88]

Advances in Neural Information Processing Systems , volume=

Pile of law: Learning responsible data filtering from the law and a 256gb open-source legal dataset , author=. Advances in Neural Information Processing Systems , volume=

[70] [89]

Reasoning over mathematical objects: on-policy reward modeling and test time aggregation

Pranjal Aggarwal, Marjan Ghazvininejad, Seungone Kim, Ilia Kulikov, Jack Lanchantin, Xian Li, Tianjian Li, Bo Liu, Graham Neubig, Anaelia Ovalle, et al. Reasoning over mathematical objects: on-policy reward modeling and test time aggregation. arXiv preprint arXiv:2603.18886, 2026

arXiv 2026

[71] [90]

Gepa: Reflective prompt evolution can outperform reinforcement learning

Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, et al. Gepa: Reflective prompt evolution can outperform reinforcement learning. arXiv preprint arXiv:2507.19457, 2025

Pith/arXiv arXiv 2025

[72] [91]

Prbench: Large-scale expert rubrics for evaluating high-stakes professional reasoning

Afra Feyza Aky \"u rek, Advait Gosai, Chen Bo Calvin Zhang, Vipul Gupta, Jaehwan Jeong, Anisha Gunjal, Tahseen Rabbani, Maria Mazzone, David Randolph, Mohammad Mahmoudi Meymand, et al. Prbench: Large-scale expert rubrics for evaluating high-stakes professional reasoning. arXiv preprint arXiv:2511.11562, 2025

arXiv 2025

[73] [92]

Ultrafeedback: Boosting language models with high-quality feedback

Ganqu Cui, Lifan Yuan, Ning Ding, Guanming Yao, Qiang Yue, Yuan Ni, Guotong Xie, Zhiyuan Liu, and Maosong Sun. Ultrafeedback: Boosting language models with high-quality feedback. 2023

2023

[74] [93]

Enhancing chat language models by scaling high-quality instructional conversations

Ning Ding, Yulin Chen, Bokai Xu, Yujia Qin, Shengding Hu, Zhiyuan Liu, Maosong Sun, and Bowen Zhou. Enhancing chat language models by scaling high-quality instructional conversations. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 3029--3051, 2023

2023

[75] [94]

Promptbreeder: Self-referential self-improvement via prompt evolution

Chrisantha Fernando, Dylan Banarse, Henryk Michalewski, Simon Osindero, and Tim Rockt \"a schel. Promptbreeder: Self-referential self-improvement via prompt evolution. arXiv preprint arXiv:2309.16797, 2023

Pith/arXiv arXiv 2023

[76] [95]

Ds-agent: Automated data science by empowering large language models with case-based reasoning

Siyuan Guo, Cheng Deng, Ying Wen, Hechang Chen, Yi Chang, and Jun Wang. Ds-agent: Automated data science by empowering large language models with case-based reasoning. arXiv preprint arXiv:2402.17453, 2024

arXiv 2024

[77] [96]

Pile of law: Learning responsible data filtering from the law and a 256gb open-source legal dataset

Peter Henderson, Mark Krass, Lucia Zheng, Neel Guha, Christopher D Manning, Dan Jurafsky, and Daniel Ho. Pile of law: Learning responsible data filtering from the law and a 256gb open-source legal dataset. Advances in Neural Information Processing Systems, 35: 0 29217--29234, 2022

2022

[78] [97]

Data interpreter: An llm agent for data science

Sirui Hong, Yizhang Lin, Bang Liu, Bangbang Liu, Binhao Wu, Ceyao Zhang, Danyang Li, Jiaqi Chen, Jiayi Zhang, Jinlin Wang, et al. Data interpreter: An llm agent for data science. In Findings of the Association for Computational Linguistics: ACL 2025, pages 19796--19821, 2025

2025

[79] [98]

autoresearch: Ai agents running research on single-gpu nanochat training automatically

Andrej Karpathy. autoresearch: Ai agents running research on single-gpu nanochat training automatically. https://github.com/karpathy/autoresearch, 2026. GitHub repository

2026

[80] [99]

Meta-harness: End-to-end optimization of model harnesses

Yoonho Lee, Roshen Nair, Qizheng Zhang, Kangwook Lee, Omar Khattab, and Chelsea Finn. Meta-harness: End-to-end optimization of model harnesses. arXiv preprint arXiv:2603.28052, 2026

Pith/arXiv arXiv 2026