Evolving Roles of LLMs in Scientific Innovation: Assistant, Collaborator, Scientist, and Evaluator

arxiv: 2507.11810 · v2 · submitted 2025-07-16 · 💻 cs.DL · cs.AI


Haoxuan Zhang, Ruochi Li, Yang Zhang, Ting Xiao, Jiangping Chen, Junhua Ding, Haihua Chen

Pith reviewed 2026-05-19 05:12 UTC · model grok-4.3


keywords: large language models · scientific innovation · roles framework · AI in science · autonomy levels · hypothesis generation · research evaluation · survey


The pith

Large language models in science are best understood through four roles: Assistant, Collaborator, Scientist, and Evaluator.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a four-role framework to organize how LLMs contribute to scientific work. The roles are distinguished by three dimensions: autonomy level, cognitive function, and scientific innovation. This separation matters because it clarifies the difference between tools that aid routine research and systems aimed at genuine discovery. The survey examines methods, benchmarks, and limitations for each role, noting that Assistants are mature at retrieval but unreliable in open tasks, while Scientists automate workflows yet face safety problems. It argues that real progress requires attention to evaluation, oversight, and institutional fit beyond raw model capability.

Core claim

The central claim is that LLMs in scientific innovation can be classified into four roles—Assistant, Collaborator, Scientist, and Evaluator—by combining autonomy level, cognitive function, and scientific innovation. This framework separates research-oriented support from frontier-oriented discovery. Literature review shows Assistants excel at retrieval and synthesis but falter in open-ended use; Collaborators broaden hypothesis options yet trade off novelty against grounding; Scientists automate research but hit reliability and safety limits; Evaluators aid verification yet remain weak at novelty judgment. Advancement in AI for science therefore hinges on evaluation practices, human control,

What carries the argument

The four-role framework that classifies LLM systems by integrating autonomy level, cognitive function, and scientific innovation.

If this is right

Assistant systems reach maturity in literature tasks but still need human oversight for open-ended scientific applications.
Collaborator systems enlarge the space of possible hypotheses yet must resolve trade-offs between novelty and grounding in known facts.
Scientist systems increasingly automate full research workflows but remain constrained by reliability and safety bottlenecks.
Evaluator systems support review and verification but continue to underperform when judging true novelty.
Progress across all roles depends on developing better evaluation methods, stronger oversight, accountability structures, and institutional integration.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Research funders could use the four-role lens to decide whether a project targets support tools or discovery engines.
Benchmark designers might create role-specific tests rather than general science benchmarks that mix different autonomy levels.
Institutions could develop role-tailored oversight policies, such as stricter safety reviews for Scientist-level systems.
The same three dimensions might later classify non-LLM AI tools in science to track broader trends.

Load-bearing premise

The body of existing literature on LLMs in science can be partitioned into these four roles with limited overlap, and the three dimensions provide a stable way to separate routine research support from frontier discovery.

What would settle it

A systematic review of recent LLM papers in science that reveals frequent unclassifiable cases, high role overlap, or inconsistent separation of support versus discovery tasks along the three dimensions would undermine the framework.
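As a rough illustration of how such a review could be tallied, the sketch below counts how many papers receive exactly one role, more than one role, or no role at all. It is a minimal sketch, not the authors' method: the function name `partition_stats`, the paper identifiers, and the sample assignments are all hypothetical, and only the four role names come from the paper.

```python
from collections import Counter

# The four role names are taken from the paper; everything else is illustrative.
ROLES = {"Assistant", "Collaborator", "Scientist", "Evaluator"}


def partition_stats(assignments):
    """Summarize how cleanly papers fall into the four roles.

    `assignments` maps a paper id to the set of roles an annotator judged
    applicable (hypothetical input format). An empty set means the annotator
    could not classify the paper.
    """
    n = len(assignments)
    if n == 0:
        raise ValueError("no papers to summarize")

    counts = Counter()
    for roles in assignments.values():
        unknown = set(roles) - ROLES
        if unknown:
            raise ValueError(f"unknown role labels: {sorted(unknown)}")
        if len(roles) == 0:
            counts["unclassifiable"] += 1
        elif len(roles) == 1:
            counts["single_role"] += 1
        else:
            counts["multi_role"] += 1

    # Report shares of the reviewed set rather than raw counts.
    return {key: counts[key] / n
            for key in ("single_role", "multi_role", "unclassifiable")}


# Made-up sample, used only to show the shape of the input.
sample = {
    "paper_a": {"Assistant"},
    "paper_b": {"Collaborator", "Evaluator"},
    "paper_c": set(),
    "paper_d": {"Scientist"},
}
stats = partition_stats(sample)
print(stats)
```

High `multi_role` or `unclassifiable` shares on a real sample would be the kind of evidence described above; what counts as "high" is a judgment the sketch does not settle.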

Figures

Figures reproduced from arXiv: 2507.11810 by Haihua Chen, Haoxuan Zhang, Jiangping Chen, Junhua Ding, Ruochi Li, Ting Xiao, Yang Zhang.

**Figure 1.** Trends in annual publication counts for traditional AI-driven versus LLM-driven scientific innovation.

**Figure 2.** The dual pathways of scientific innovation: scientific research and discovery, and the evolving roles of

**Figure 3.** The pyramidal framework of large language models’ roles in scientific innovation: evaluators, collaborators,

**Figure 4.** The evolution of large language models’ roles in scientific innovation with demonstration of existing

**Figure 5.** Closed-loop workflow of LLMs as Evaluators. Multimodal embeddings underpin SKS (blue) and SLQA

**Figure 6.** LLMs as collaborators in scientific innovation. LLMs transforming raw knowledge into actionable hypotheses,

**Figure 7.** Taxonomy of LLMs as Scientists. The upper (blue) panel organizes ASR into three strata—fully autonomous

Original abstract

Large language models (LLMs) are increasingly used in scientific research and discovery, supporting tasks ranging from literature retrieval and synthesis to hypothesis generation, autonomous experimentation, and research evaluation. Existing surveys often conflate scientific research with scientific discovery and typically organize systems by domain, task, or autonomy level alone. In this survey, we propose a four-role framework for understanding LLMs in scientific innovation: Assistant, Collaborator, Scientist, and Evaluator. The framework integrates three complementary dimensions: autonomy level, cognitive function, and scientific innovation, to distinguish research-oriented support from frontier-oriented discovery. We review representative methods, benchmarks, and evaluation practices for each role, examining their capabilities, limitations, and human oversight requirements. Across the literature, Assistant systems are comparatively mature in retrieval and synthesis but remain unreliable in open-ended applications; Collaborator systems expand the space of candidate hypotheses yet struggle with novelty-grounding trade-offs; Scientist systems increasingly automate research workflows but face reliability and safety bottlenecks; and Evaluator systems support review and verification while remaining weak in novelty assessment. We argue that progress in AI for science depends not only on model capability, but also on evaluation, oversight, accountability, and institutional integration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The four-role taxonomy organizes existing LLM-in-science work but leaves role boundaries fuzzy enough that assignments stay subjective.


This paper's main contribution is a four-role taxonomy for LLMs in science that adds cognitive function and innovation level to the usual autonomy measures. It separates Assistant and Collaborator on the support side from Scientist and Evaluator on the discovery side. It does well at summarizing what each role can and cannot do right now. The sections on limitations, such as novelty grounding for collaborators or safety for scientist systems, give a practical sense of where things stand. The call for better evaluation and institutional integration is also on point. The soft spots are in the framework itself. The three dimensions do not come with explicit rules for handling overlap, so a hypothesis generator that also checks its own outputs could land in more than one role. Without demarcation criteria or inter-rater checks, the taxonomy stays dependent on the authors' judgment. The literature coverage looks reasonable but the abstract gives no details on search methods or selection criteria. This is for readers who follow AI-for-science work and want an organizing scheme to think about different levels of autonomy and oversight. It would help someone designing benchmarks or thinking about policy. I recommend sending it for peer review. The ideas are clear enough to benefit from referee comments on the boundaries and the scope of the review.
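One way to make the inter-rater check mentioned in the letter concrete is Cohen's kappa, a standard chance-corrected agreement statistic for two raters. The sketch below is an illustration under stated assumptions: the function name `cohens_kappa` and the two label lists are hypothetical, and nothing in the paper or this review says the authors ran such a check.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    Each argument is a list of role labels, one per paper, in the same order.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")

    n = len(rater_a)
    # Observed agreement: share of items where the two raters chose the same role.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected agreement by chance, from each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label throughout; agreement is trivial.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up role assignments for four papers, for illustration only.
rater_a = ["Assistant", "Collaborator", "Scientist", "Evaluator"]
rater_b = ["Assistant", "Collaborator", "Scientist", "Collaborator"]
kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 3))
```

A kappa near 1 would suggest the role boundaries are usable by independent annotators; a value near 0 would suggest assignments are close to chance agreement.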

Referee Report

2 major / 2 minor

Summary. The paper proposes a four-role framework for LLMs in scientific innovation—Assistant, Collaborator, Scientist, and Evaluator—integrating three dimensions (autonomy level, cognitive function, and scientific innovation) to distinguish routine research support from frontier-oriented discovery. It reviews representative methods, benchmarks, and evaluation practices for each role, discusses their capabilities and limitations (including human oversight needs), and argues that progress in AI for science requires advances in evaluation, oversight, accountability, and institutional integration beyond model capability alone.

Significance. If the taxonomy can be shown to be stable and reproducible, the framework would provide a useful organizing lens that improves on prior autonomy-only surveys by incorporating cognitive function and innovation dimensions. The structured review of capabilities, limitations, and oversight requirements across roles could help identify specific gaps, such as weak novelty assessment in Evaluator systems and safety bottlenecks in Scientist systems.

major comments (2)

[Framework definition section] Section on the four-role framework: The three dimensions are presented as complementary for role separation, but no explicit demarcation rules, decision criteria, thresholds, or handling of boundary cases (e.g., a hypothesis-generation system that also performs self-evaluation) are supplied. Role assignments therefore depend on author judgment rather than reproducible thresholds, which directly affects whether the taxonomy reliably distinguishes research-oriented support from frontier discovery.
[Literature review / methods] Methods or literature review section: No description is given of the literature search strategy, inclusion/exclusion criteria, or process for selecting representative methods and benchmarks for each role. This absence makes it impossible to evaluate selection bias or coverage, which is load-bearing for the survey's claims about comparative maturity, limitations, and trends across the four roles.

minor comments (2)

[Introduction] The abstract states that existing surveys 'often conflate scientific research with scientific discovery,' but the introduction or related-work section should cite specific prior surveys to ground this contrast.
Figure or table summarizing the three dimensions and role mappings would improve clarity; currently the distinctions are described only in prose.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the paper.


Referee: [Framework definition section] Section on the four-role framework: The three dimensions are presented as complementary for role separation, but no explicit demarcation rules, decision criteria, thresholds, or handling of boundary cases (e.g., a hypothesis-generation system that also performs self-evaluation) are supplied. Role assignments therefore depend on author judgment rather than reproducible thresholds, which directly affects whether the taxonomy reliably distinguishes research-oriented support from frontier discovery.

Authors: We appreciate the referee's point that the current presentation of the three dimensions does not include explicit demarcation rules, decision criteria, or boundary-case handling. While the manuscript uses the dimensions to conceptually separate roles, we agree that the absence of reproducible thresholds leaves role assignment open to author judgment. In the revised version, we will add a dedicated subsection to the framework definition that specifies decision criteria, provides thresholds where feasible, and includes explicit examples of boundary cases such as hybrid hypothesis-generation and self-evaluation systems. This addition will improve the taxonomy's reproducibility and better demonstrate how it distinguishes routine support from frontier discovery.
Revision: yes
Referee: [Literature review / methods] Methods or literature review section: No description is given of the literature search strategy, inclusion/exclusion criteria, or process for selecting representative methods and benchmarks for each role. This absence makes it impossible to evaluate selection bias or coverage, which is load-bearing for the survey's claims about comparative maturity, limitations, and trends across the four roles.

Authors: The referee correctly identifies that the manuscript does not describe the literature search strategy, inclusion/exclusion criteria, or selection process for representative methods and benchmarks. This omission limits the ability to assess coverage and potential bias. We will add a new 'Survey Methodology' subsection (placed early in the paper) that details the search strategy, databases and repositories queried, keywords and Boolean strings used, inclusion and exclusion criteria, and the rationale for selecting the representative examples discussed for each role. This revision will increase transparency and allow readers to better evaluate the comparative claims across roles.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity: framework is an organizing lens drawn from cited literature


The paper is a survey that proposes a four-role taxonomy (Assistant, Collaborator, Scientist, Evaluator) by integrating three dimensions (autonomy level, cognitive function, scientific innovation) to partition existing work. No equations, fitted parameters, predictions, or derivations are present. The central claim does not reduce to quantities defined by the authors' own prior work or by construction. Self-citations, if present, are not load-bearing for the taxonomy itself; the framework is presented as a synthesis of the broader literature rather than a self-referential loop. This matches the default expectation for non-mathematical survey papers where the contribution is conceptual organization without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper introduces no free parameters, no new physical or mathematical axioms, and no invented entities. It rests on the domain assumption that LLMs in science can be usefully classified by the stated dimensions and that the reviewed literature is representative.

axioms (1)

Domain assumption: Existing LLM systems in science can be partitioned into four distinct roles with limited overlap using the dimensions of autonomy level, cognitive function, and scientific innovation.
This assumption underpins the entire framework and is invoked when the authors distinguish research-oriented support from frontier-oriented discovery.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "We propose a four-role framework ... integrates three complementary dimensions: autonomy level, cognitive function, and scientific innovation"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "pyramidal framework ... Evaluator, Collaborator, and Scientist"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

237 extracted references · 237 canonical work pages · 8 internal anchors

[1]

Litllm: A toolkit for scientific literature review

Shubham Agarwal, Issam Hadj Laradji, Laurent Charlin, and Christopher Pal. Litllm: A toolkit for scientific literature review. ArXiv, abs/2402.01788, 2024

work page arXiv 2024
[2]

Cellvoyager: Ai compbio agent generates new insights by autonomously analyzing biological data

Samuel Alber, Bowen Chen, Eric Sun, Alina Isakova, Aaron James Wilk, and James Zou. Cellvoyager: Ai compbio agent generates new insights by autonomously analyzing biological data. bioRxiv, pages 2025–06, 2025

work page 2025
[3]

A survey on hypothesis generation for scientific discovery in the era of large language models

Atilla Kaan Alkan, Shashwat Sourav, Maja Jablonska, Simone Astarita, Rishabh Chakrabarty, Nikhil Garuda, Pranav Khetarpal, Maciej Pióro, Dimitrios Tanoglidis, Kartheik G Iyer, et al. A survey on hypothesis generation for scientific discovery in the era of large language models. arXiv preprint arXiv:2504.05496, 2025

work page arXiv 2025
[4]

Beyond citations: Measuring novel scientific ideas and their impact in publication text

Sam Arts, Nicola Melluso, and Reinhilde Veugelers. Beyond citations: Measuring novel scientific ideas and their impact in publication text. Review of Economics and Statistics, 2023. doi: https://doi.org/10.1162/rest_a_01561

work page doi:10.1162/rest_a_01561 2023
[5]

PPTAgent: Generating and evaluating presentations beyond text-to-slides

Jinheon Baek, Sujay Kumar Jauhar, Silviu Cucerzan, and Sung Ju Hwang. ResearchAgent: Iterative research idea generation over scientific literature with large language models. In Luis Chiruzzo, Alan Ritter, and Lu Wang, editors, Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human...

work page doi:10.18653/v1/2025 2025
[6]

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, et al. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint arXiv:2204.05862, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[7]

Superintelligent agents pose catastrophic risks: Can scientist ai offer a safer path? arXiv preprint arXiv:2502.15657, 2025

Yoshua Bengio, Michael Cohen, Damiano Fornasiere, Joumana Ghosn, Pietro Greiner, Matt MacDermott, Sören Mindermann, Adam Oberman, Jesse Richardson, Oliver Richardson, et al. Superintelligent agents pose catastrophic risks: Can scientist ai offer a safer path? arXiv preprint arXiv:2502.15657, 2025

work page arXiv 2025
[8]

Reasoning language models: A blueprint

Maciej Besta, Julia Barth, Eric Schreiber, Ales Kubicek, Afonso Catarino, Robert Gerstenberger, Piotr Nyczyk, Patrick Iff, Yueling Li, Sam Houliston, et al. Reasoning language models: A blueprint. arXiv preprint arXiv:2501.11223, 2025

work page arXiv 2025
[9]

Super: Evaluating agents on setting up and executing tasks from research repositories

Ben Bogin, Kejuan Yang, Shashank Gupta, Kyle Richardson, Erin Bransom, Peter Clark, Ashish Sabharwal, and Tushar Khot. Super: Evaluating agents on setting up and executing tasks from research repositories. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 12622–12645, 2024

work page 2024
[10]

A., MacKnight, R., & Gomes, G

Daniil A Boiko, Robert MacKnight, and Gabe Gomes. Emergent autonomous scientific research capabilities of large language models. arXiv preprint arXiv:2304.05332, 2023

work page arXiv 2023
[11]

Autonomous chemical research with large language models

Daniil A Boiko, Robert MacKnight, Ben Kline, and Gabe Gomes. Autonomous chemical research with large language models. Nature, 624(7992):570–578, 2023. doi: 10.1038/s41586-023-06792-0

work page doi:10.1038/s41586-023-06792-0 2023
[12]

Generative adversarial reviews: When llms become the critic

Nicolas Bougie and Narimasa Watanabe. Generative adversarial reviews: When llms become the critic. arXiv preprint arXiv:2412.10415, 2024

work page arXiv 2024
[13]

Generative retrieval-augmented ontologic graph and multiagent strategies for interpretive large language model-based materials design

Markus J Buehler. Generative retrieval-augmented ontologic graph and multiagent strategies for interpretive large language model-based materials design. ACS Engineering Au , 4(2):241–277, 2024. doi: 10.1021/ acsengineeringau.3c00058

work page 2024
[14]

Nirschl, Laura Bravo-Sánchez, Alejandro Lozano, Sanket Rajan Gupte, Jesus G

James Burgess, Jeffrey J Nirschl, Laura Bravo-Sánchez, Alejandro Lozano, Sanket Rajan Gupte, Jesus G Galaz-Montoya, Yuhui Zhang, Yuchang Su, Disha Bhowmik, Zachary Coman, et al. Microvqa: A multimodal reasoning benchmark for microscopy-based scientific research. arXiv preprint arXiv:2503.13399, 2025

work page arXiv 2025
[15]

Eaira: Establishing a methodology for evaluating ai models as scientific research assistants

Franck Cappello, Sandeep Madireddy, Robert Underwood, Neil Getty, Nicholas Lee-Ping Chia, Nesar Ramachan- dra, Josh Nguyen, Murat Keçeli, Tanwi Mallick, Zilinghan Li, et al. Eaira: Establishing a methodology for evaluating ai models as scientific research assistants. CoRR, 2025

work page 2025
[16]

MLE-bench: Evaluating machine learning agents on machine learning engineering

Jun Shern Chan, Neil Chowdhury, Oliver Jaffe, James Aung, Dane Sherburn, Evan Mays, Giulio Starace, Kevin Liu, Leon Maksin, Tejal Patwardhan, Aleksander Madry, and Lilian Weng. MLE-bench: Evaluating machine learning agents on machine learning engineering. In The Thirteenth International Conference on Learning Representations, 2025

work page 2025
[17]

A joint framework for identifying the type and ar- guments of scientific contribution

Wenhan Chao, Mengyuan Chen, Xian Zhou, and Zhunchen Luo. A joint framework for identifying the type and ar- guments of scientific contribution. Scientometrics, 128(6):3347–3376, 2023. doi: 10.1007/s11192-023-04694-6. 50

work page doi:10.1007/s11192-023-04694-6 2023
[18]

Structuring scientific innovation: A framework for modeling and discovering impactful knowledge combinations

Junlan Chen, Kexin Zhang, Daifeng Li, Yangyang Feng, Yuxuan Zhang, and Bowen Deng. Structuring scientific innovation: A framework for modeling and discovering impactful knowledge combinations. arXiv preprint arXiv:2503.18865, 2025

work page arXiv 2025
[19]

Ai4research: A survey of artificial intelligence for scientific research

Qiguang Chen, Mingda Yang, Libo Qin, Jinhao Liu, Zheng Yan, Jiannan Guan, Dengyun Peng, Yiyan Ji, Hanjing Li, Mengkang Hu, et al. Ai4research: A survey of artificial intelligence for scientific research. arXiv preprint arXiv:2507.01903, 2025

work page arXiv 2025
[20]

Baker, Benjamin Burns, Daniel Adu-Ampratwum, Xuhui Huang, Xia Ning, Song Gao, Yu Su, and Huan Sun

Ziru Chen, Shijie Chen, Yuting Ning, Qianheng Zhang, Boshi Wang, Botao Yu, Yifei Li, Zeyi Liao, Chen Wei, Zitong Lu, Vishal Dey, Mingyi Xue, Frazier N. Baker, Benjamin Burns, Daniel Adu-Ampratwum, Xuhui Huang, Xia Ning, Song Gao, Yu Su, and Huan Sun. Scienceagentbench: Toward rigorous assessment of language agents for data-driven scientific discovery. In ...

work page 2025
[21]

The theoretical and policy implications of knowledge codification

Patrick Cohendet and Frieder Meyer-Krahmer. The theoretical and policy implications of knowledge codification. Research policy, 30(9):1563–1591, 2001. doi: 10.1016/S0048-7333(01)00168-8

work page doi:10.1016/s0048-7333(01)00168-8 2001
[22]

Curie: Evaluating llms on multitask scientific long-context understanding and reasoning

Hao Cui, Zahra Shamsi, Gowoon Cheon, Xuejian Ma, Shutong Li, Maria Tikhanovskaya, Peter Christian Norgaard, Nayantara Mudur, Martyna Beata Plomecka, Paul Raccuglia, et al. Curie: Evaluating llms on multitask scientific long-context understanding and reasoning. In The Thirteenth International Conference on Learning Representations, 2025

work page 2025
[23]

Structured information extraction from scientific text with large language models

John Dagdelen, Alexander Dunn, Sanghoon Lee, Nicholas Walker, Andrew S Rosen, Gerbrand Ceder, Kristin A Persson, and Anubhav Jain. Structured information extraction from scientific text with large language models. Nature Communications, 15(1):1418, 2024. doi: 10.1038/s41467-024-45563-x

work page doi:10.1038/s41467-024-45563-x 2024
[24]

Marg: Multi-agent review generation for scientific papers

Mike D’Arcy, Tom Hope, Larry Birnbaum, and Doug Downey. Marg: Multi-agent review generation for scientific papers. ArXiv, abs/2401.04259, 2024

work page arXiv 2024
[25]

Organa: a robotic assistant for automated chemistry experimentation and characterization

Kourosh Darvish, Marta Skreta, Yuchi Zhao, Naruki Yoshikawa, Sagnik Som, Miroslav Bogdanovic, Yang Cao, Han Hao, Haoping Xu, Alán Aspuru-Guzik, et al. Organa: a robotic assistant for automated chemistry experimentation and characterization. Matter, 8(2), 2025. doi: 10.1016/j.matt.2024.10.015

work page doi:10.1016/j.matt.2024.10.015 2025
[26]

Empowering ai as autonomous researchers: Evaluating llms in generating novel research ideas through automated metrics

Debajyoti Dasgupta, Arijit Mondal, and Partha Pratim Chakrabarti. Empowering ai as autonomous researchers: Evaluating llms in generating novel research ideas through automated metrics. In 2nd AI4Research Workshop: Towards a Knowledge-grounded Scientific Research Lifecycle, 2025

work page 2025
[27]

Matexpert: Decomposing materials discovery by mimicking human experts

Qianggang Ding, Santiago Miret, and Bang Liu. Matexpert: Decomposing materials discovery by mimicking human experts. In The Thirteenth International Conference on Learning Representations, 2024

work page 2024
[28]

Llms assist nlp researchers: Critique paper (meta-) reviewing

Jiangshu Du, Yibo Wang, Wenting Zhao, Zhongfen Deng, Shuaiqi Liu, Renze Lou, Henry Zou, Pranav Narayanan Venkit, Nan Zhang, Mukund Srinath, et al. Llms assist nlp researchers: Critique paper (meta-) reviewing. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 5081–5099, 2024

work page 2024
[29]

Llm4ed: Large language models for automatic equation discovery

Mengge Du, Yuntian Chen, Zhongzheng Wang, Longfeng Nie, and Dongxiao Zhang. Llm4ed: Large language models for automatic equation discovery. CoRR, 2024

work page 2024
[30]

The path to superintelligence: A critical analysis of openai’s five levels of ai progression

Tom Duenas and Diana Ruiz. The path to superintelligence: A critical analysis of openai’s five levels of ai progression. ResearchGate, 2024b. doi, 10, 2024. doi: http://dx.doi.org/10.13140/RG.2.2.33794.70085

work page doi:10.13140/rg.2.2.33794.70085 2024
[31]

Agent ai: Surveying the horizons of multimodal interaction

Zane Durante, Qiuyuan Huang, Naoki Wake, Ran Gong, Jae Sung Park, Bidipta Sarkar, Rohan Taori, Yusuke Noda, Demetri Terzopoulos, Yejin Choi, et al. Agent ai: Surveying the horizons of multimodal interaction. CoRR, 2024

work page 2024
[32]

Nlpeer: A unified resource for the computational study of peer review

Nils Dycke, Ilia Kuznetsov, and Iryna Gurevych. Nlpeer: A unified resource for the computational study of peer review. In Annual Meeting of the Association for Computational Linguistics, 2022

work page 2022
[33]

mclm: A function-infused and synthesis-friendly modular chemical language model

Carl Edwards, Chi Han, Gawon Lee, Thao Nguyen, Bowen Jin, Chetan Kumar Prasad, Sara Szymku´c, Bartosz A Grzybowski, Ying Diao, Jiawei Han, et al. mclm: A function-infused and synthesis-friendly modular chemical language model. arXiv preprint arXiv:2505.12565, 2025

work page arXiv 2025
[34]

Steffen Eger, Yong Cao, Jennifer D’Souza, Andreas Geiger, Christian Greisinger, Stephanie Gross, Yufang Hou, Brigitte Krenn, Anne Lauscher, Yizhi Li, et al. Transforming science with large language models: A survey on ai-assisted scientific discovery, experimentation, content generation, and evaluation.arXiv preprint arXiv:2502.05151, 2025

work page arXiv 2025
[35]

Science of science

Santo Fortunato, Carl T Bergstrom, Katy Börner, James A Evans, Dirk Helbing, Staša Milojevi´c, Alexander M Petersen, Filippo Radicchi, Roberta Sinatra, Brian Uzzi, et al. Science of science. Science, 359(6379):eaao0185,

work page
[36]

doi: 10.1126/science.aao0185. 51

work page doi:10.1126/science.aao0185
[37]

Tradition and innovation in scientists’ research strategies

Jacob G Foster, Andrey Rzhetsky, and James A Evans. Tradition and innovation in scientists’ research strategies. American sociological review, 80(5):875–908, 2015. doi: 10.1177/0003122415601618

work page doi:10.1177/0003122415601618 2015
[38]

Boxinggym: Benchmarking progress in automated experimental design and model discovery

Kanishk Gandhi, Michael Y Li, Lyle Goodyear, Louise Li, Aditi Bhaskar, Mohammed Zaman, and Noah D Goodman. Boxinggym: Benchmarking progress in automated experimental design and model discovery. arXiv preprint arXiv:2501.01540, 2025

work page arXiv 2025
[39]

Empowering biomedical discovery with ai agents

Shanghua Gao, Ada Fang, Yepeng Huang, Valentina Giunchiglia, Ayush Noori, Jonathan Richard Schwarz, Yasha Ektefaie, Jovana Kondic, and Marinka Zitnik. Empowering biomedical discovery with ai agents. Cell, 187 (22):6125–6151, 2024. doi: https://doi.org/10.1016/j.cell.2024.09.022

work page doi:10.1016/j.cell.2024.09.022 2024
[40]

Reviewagents: Bridging the gap between human and ai-generated paper reviews

Xian Gao, Jiacheng Ruan, Jingsheng Gao, Ting Liu, and Yuzhuo Fu. Reviewagents: Bridging the gap between human and ai-generated paper reviews. CoRR, 2025

work page 2025
[41]

Reviewer2: Optimizing review generation through prompt generation

Zhaolin Gao, Kianté Brantley, and Thorsten Joachims. Reviewer2: Optimizing review generation through prompt generation. arXiv preprint arXiv:2402.10886, 2024

work page arXiv 2024
[42]

Atomagents: Alloy design and discovery through physics-aware multi-modal multi-agent artificial intelligence

Alireza Ghafarollahi and Markus J Buehler. Atomagents: Alloy design and discovery through physics-aware multi-modal multi-agent artificial intelligence. arXiv preprint arXiv:2407.10022, 2024

work page arXiv 2024
[43]

Sciagents: Automating scientific discovery through bioinspired multi-agent intelligent graph reasoning

Alireza Ghafarollahi and Markus J Buehler. Sciagents: Automating scientific discovery through bioinspired multi-agent intelligent graph reasoning. Advanced Materials, page 2413523, 2024. doi: 10.1002/adma.202413523

work page doi:10.1002/adma.202413523 2024
[44]

Automating alloy design and discovery with physics-aware multimodal multiagent ai

Alireza Ghafarollahi and Markus J Buehler. Automating alloy design and discovery with physics-aware multimodal multiagent ai. Proceedings of the National Academy of Sciences, 122(4):e2414074122, 2025. doi: 10.1073/pnas.2414074122

work page doi:10.1073/pnas.2414074122 2025
[45]

Towards an AI co-scientist

Juraj Gottweis, Wei-Hung Weng, Alexander Daryin, Tao Tu, Anil Palepu, Petar Sirkovic, Artiom Myaskovsky, Felix Weissenberger, Keran Rong, Ryutaro Tanno, et al. Towards an ai co-scientist. arXiv preprint arXiv:2502.18864, 2025

work page arXiv 2025
[46]

The concept of entropy in scientometrics and innovation research: An indicator for institutional involvement in scientific and technological developments

Hariolf Grupp. The concept of entropy in scientometrics and innovation research: An indicator for institutional involvement in scientific and technological developments. Scientometrics, 18(3-4):219–239, 1990

work page 1990
[47]

Llms can realize combinatorial creativity: generating creative ideas via llms for scientific research

Tianyang Gu, Jingjin Wang, Zhihao Zhang, and HaoHong Li. Llms can realize combinatorial creativity: generating creative ideas via llms for scientific research. arXiv preprint arXiv:2412.14141, 2024

work page arXiv 2024
[48]

Interesting scientific idea generation using knowledge graphs and llms: Evaluations with 100 research group leaders

Xuemei Gu and Mario Krenn. Interesting scientific idea generation using knowledge graphs and llms: Evaluations with 100 research group leaders. arXiv preprint arXiv:2405.17044, 2024

work page arXiv 2024
[49]

Ideabench: Benchmarking large language models for research idea generation

Sikun Guo, Amir Hassan Shariatmadari, Guangzhi Xiong, Albert Huang, Eric Xie, Stefan Bekiranov, and Aidong Zhang. Ideabench: Benchmarking large language models for research idea generation. arXiv preprint arXiv:2411.02429, 2024

work page arXiv 2024
[50]

De novo generation of sars-cov-2 antibody cdrh3 with a pre-trained generative large language model

Haohuai He, Bing He, Lei Guan, Yu Zhao, Feng Jiang, Guanxing Chen, Qingge Zhu, Calvin Yu-Chian Chen, Ting Li, and Jianhua Yao. De novo generation of sars-cov-2 antibody cdrh3 with a pre-trained generative large language model. Nature Communications, 15(1):6867, 2024. doi: 10.1038/s41467-024-50903-y

work page doi:10.1038/s41467-024-50903-y 2024
[51]

Scisight: Combining faceted navigation and research group detection for covid-19 exploratory scientific search

Tom Hope, J Portenoy, K Vasan, J Borchardt, Eric Horvitz, DS Weld, MA Hearst, and Jevin West. Scisight: Combining faceted navigation and research group detection for covid-19 exploratory scientific search. Proceedings of the 2020 EMNLP (Systems Demonstrations), Association for Computational Linguistics, 2020

work page 2020
[52]

A computational inflection for scientific discovery

Tom Hope, Doug Downey, Daniel S Weld, Oren Etzioni, and Eric Horvitz. A computational inflection for scientific discovery. Communications of the ACM, 66(8):62–73, 2023. doi: 10.1145/3576896

work page doi:10.1145/3576896 2023
[53]

A new method for measuring the originality of academic articles based on knowledge units in semantic networks

Jianhua Hou, Dongyi Wang, and Jing Li. A new method for measuring the originality of academic articles based on knowledge units in semantic networks. Journal of Informetrics, 16(3):101306, 2022. doi: 10.1016/j.joi.2022.101306

work page doi:10.1016/j.joi.2022.101306 2022
[54]

Chime: Llm-assisted hierarchical organization of scientific studies for literature review support

Chao-Chun Hsu, Erin Bransom, Jenna Sparks, Bailey Kuehl, Chenhao Tan, David Wadden, Lucy Lu Wang, and Aakanksha Naik. Chime: Llm-assisted hierarchical organization of scientific studies for literature review support. In Findings of the Association for Computational Linguistics ACL 2024, pages 118–132, 2024

work page 2024
[55]

A multi-agent framework for materials laws discovery

Bo Hu, Siyu Liu, Beilin Ye, Yun Hao, and Tongqi Wen. A multi-agent framework for materials laws discovery. arXiv preprint arXiv:2411.16416, 2024

work page arXiv 2024
[56]

Nova: An iterative planning and search approach to enhance novelty and diversity of llm generated ideas

Xiang Hu, Hongyu Fu, Jinge Wang, Yifeng Wang, Zhikun Li, Renjun Xu, Yu Lu, Yaochu Jin, Lili Pan, and Zhenzhong Lan. Nova: An iterative planning and search approach to enhance novelty and diversity of llm generated ideas. arXiv preprint arXiv:2410.14255, 2024

work page arXiv 2024
[57]

Hireview: Hierarchical taxonomy-driven automatic literature review generation

Yuntong Hu, Zhuofeng Li, Zheng Zhang, Chen Ling, Raasikh Kanjiani, Boxin Zhao, and Liang Zhao. Hireview: Hierarchical taxonomy-driven automatic literature review generation. arXiv preprint arXiv:2410.03761, 2024

work page arXiv 2024
[58]

From detection to application: Recent advances in understanding scientific tables and figures

Jiani Huang, Haihua Chen, Fengchang Yu, and Wei Lu. From detection to application: Recent advances in understanding scientific tables and figures. ACM Computing Surveys, 56(10):1–39, 2024. doi: 10.1145/3657285

work page doi:10.1145/3657285 2024
[59]

Crispr-gpt: An llm agent for automated design of gene-editing experiments

Kaixuan Huang, Yuanhao Qu, Henry Cousins, William A Johnson, Di Yin, Mihir Shah, Denny Zhou, Russ Altman, Mengdi Wang, and Le Cong. Crispr-gpt: An llm agent for automated design of gene-editing experiments. arXiv preprint arXiv:2404.18021, 2024

work page arXiv 2024
[60]

Mlagentbench: Evaluating language agents on machine learning experimentation

Qian Huang, Jian Vora, Percy Liang, and Jure Leskovec. Mlagentbench: Evaluating language agents on machine learning experimentation. In Forty-first International Conference on Machine Learning, 2023

work page 2023
[61]

Are large language models qualified reviewers in originality evaluation?

Shengzhi Huang, Yong Huang, Yinpeng Liu, Zhuoran Luo, and Wei Lu. Are large language models qualified reviewers in originality evaluation? Information Processing & Management, 62(3):103973, 2025. doi: 10.1016/j.ipm.2024.103973

work page doi:10.1016/j.ipm.2024.103973 2025
[62]

Papereval: A universal, quantitative, and explainable paper evaluation method powered by a multi-agent system

Shengzhi Huang, Qicong Wang, Wei Lu, Lingyu Liu, Zhenzhen Xu, and Yong Huang. Papereval: A universal, quantitative, and explainable paper evaluation method powered by a multi-agent system. Information Processing & Management, 62(6):104225, 2025

work page 2025
[63]

Olympicarena: Benchmarking multi-discipline cognitive reasoning for superintelligent ai

Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, et al. Olympicarena: Benchmarking multi-discipline cognitive reasoning for superintelligent ai. Advances in Neural Information Processing Systems, 37:19209–19253, 2024

work page 2024
[64]

Openreviewer: A specialized large language model for generating critical scientific paper reviews

Maximilian Idahl and Zahra Ahmadi. Openreviewer: A specialized large language model for generating critical scientific paper reviews. arXiv preprint arXiv:2412.11948, 2024

work page arXiv 2024
[65]

Autonomous llm-driven research—from data to human-verifiable research papers

Tal Ifargan, Lukas Hafner, Maor Kern, Ori Alcalay, and Roy Kishony. Autonomous llm-driven research—from data to human-verifiable research papers. NEJM AI, 2(1):AIoa2400555, 2025. doi: 10.1056/AIoa2400555

work page doi:10.1056/aioa2400555 2025
[66]

Zochi technical report

Intology. Zochi technical report. arXiv, 2025

work page 2025
[67]

Scirex: A challenge dataset for document-level information extraction

Sarthak Jain, Madeleine van Zuylen, Hannaneh Hajishirzi, and Iz Beltagy. Scirex: A challenge dataset for document-level information extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7506–7516, 2020

work page 2020
[68]

Discoveryworld: A virtual environment for developing and evaluating automated scientific discovery agents

Peter Jansen, Marc-Alexandre Côté, Tushar Khot, Erin Bransom, Bhavana Dalvi Mishra, Bodhisattwa Prasad Majumder, Oyvind Tafjord, and Peter Clark. Discoveryworld: A virtual environment for developing and evaluating automated scientific discovery agents. Advances in Neural Information Processing Systems, 37:10088–10116, 2024

work page 2024
[69]

Codescientist: End-to-end semi-automated scientific discovery with code-based experimentation

Peter Jansen, Oyvind Tafjord, Marissa Radensky, Pao Siangliulue, Tom Hope, Bhavana Dalvi Mishra, Bodhisattwa Prasad Majumder, Daniel S Weld, and Peter Clark. Codescientist: End-to-end semi-automated scientific discovery with code-based experimentation. arXiv preprint arXiv:2503.22708, 2025

work page arXiv 2025
[70]

Llmatdesign: Autonomous materials discovery with large language models

Shuyi Jia, Chao Zhang, and Victor Fung. Llmatdesign: Autonomous materials discovery with large language models. arXiv preprint arXiv:2406.13163, 2024

work page arXiv 2024
[71]

Hegta: Leveraging heterogeneous graph-enhanced large language models for few-shot complex table understanding

Rihui Jin, Yu Li, Guilin Qi, Nan Hu, Yuan-Fang Li, Jiaoyan Chen, Jianan Wang, Yongrui Chen, Dehai Min, and Sheng Bi. Hegta: Leveraging heterogeneous graph-enhanced large language models for few-shot complex table understanding. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 24294–24302, 2025

work page 2025
[72]

Agentreview: Exploring peer review dynamics with llm agents

Yiqiao Jin, Qinlin Zhao, Yiyang Wang, Hao Chen, Kaijie Zhu, Yijia Xiao, and Jindong Wang. Agentreview: Exploring peer review dynamics with llm agents. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 1208–1226, 2024

work page 2024
[73]

DSBench: How far are data science agents from becoming data science experts?

Liqiang Jing, Zhehui Huang, Xiaoyang Wang, Wenlin Yao, Wenhao Yu, Kaixin Ma, Hongming Zhang, Xinya Du, and Dong Yu. DSBench: How far are data science agents from becoming data science experts? In The Thirteenth International Conference on Learning Representations, 2025

work page 2025
[74]

Researcharena: Benchmarking llms’ ability to collect and organize information as research agents

Hao Kang and Chenyan Xiong. Researcharena: Benchmarking llms’ ability to collect and organize information as research agents. arXiv preprint arXiv:2406.10291, 2024

work page arXiv 2024
[75]

Chatmof: an artificial intelligence system for predicting and generating metal-organic frameworks using large language models

Yeonghun Kang and Jihan Kim. Chatmof: an artificial intelligence system for predicting and generating metal-organic frameworks using large language models. Nature Communications, 15(1):4705, 2024. doi: 10.1038/s41467-024-48998-4

work page doi:10.1038/s41467-024-48998-4 2024
[76]

Scireviewgen: A large-scale dataset for automatic literature review generation

Tetsu Kasanishi, Masaru Isonuma, Junichiro Mori, and Ichiro Sakata. Scireviewgen: A large-scale dataset for automatic literature review generation. In Findings of the Association for Computational Linguistics: ACL 2023, pages 6695–6715, 2023

work page 2023
[77]

Sci-idea: Context-aware scientific ideation using token and sentence embeddings

Farhana Keya, Gollam Rabby, Prasenjit Mitra, Sahar Vahdati, Sören Auer, and Yaser Jaradeh. Sci-idea: Context-aware scientific ideation using token and sentence embeddings. arXiv preprint arXiv:2503.19257, 2025

work page arXiv 2025
[78]

Curie: Toward rigorous and automated scientific experimentation with ai agents

Patrick Tser Jern Kon, Jiachen Liu, Qiuyi Ding, Yiming Qiu, Zhenning Yang, Yibo Huang, Jayanth Srinivasa, Myungjin Lee, Mosharaf Chowdhury, and Ang Chen. Curie: Toward rigorous and automated scientific experimentation with ai agents. CoRR, 2025

work page 2025
[79]

Hypothesis generation for materials discovery and design using goal-driven and constraint-guided LLM agents

Shrinidhi Kumbhar, Venkatesh Mishra, Kevin Coutinho, Divij Handa, Ashif Iquebal, and Chitta Baral. Hypothesis generation for materials discovery and design using goal-driven and constraint-guided LLM agents. In Luis Chiruzzo, Alan Ritter, and Lu Wang, editors, Findings of the Association for Computational Linguistics: NAACL 2025, pages 7524–7555, Albuquer...

work page doi:10.18653/v1/2025.findings-naacl.420 2025
[80]

Transformer-based highlights extraction from scientific papers

Moreno La Quatra and Luca Cagliero. Transformer-based highlights extraction from scientific papers. Knowledge-Based Systems, 252:109382, 2022. doi: 10.1016/j.knosys.2022.109382

work page doi:10.1016/j.knosys.2022.109382 2022

Showing first 80 references.

[1] [1]

Litllm: A toolkit for scientific literature review

Shubham Agarwal, Issam Hadj Laradji, Laurent Charlin, and Christopher Pal. Litllm: A toolkit for scientific literature review. ArXiv, abs/2402.01788, 2024

work page arXiv 2024

[2] [2]

Cellvoyager: Ai compbio agent generates new insights by autonomously analyzing biological data

Samuel Alber, Bowen Chen, Eric Sun, Alina Isakova, Aaron James Wilk, and James Zou. Cellvoyager: Ai compbio agent generates new insights by autonomously analyzing biological data. bioRxiv, pages 2025–06, 2025

work page 2025

[3] [3]

A survey on hypothesis generation for scientific discovery in the era of large language models

Atilla Kaan Alkan, Shashwat Sourav, Maja Jablonska, Simone Astarita, Rishabh Chakrabarty, Nikhil Garuda, Pranav Khetarpal, Maciej Pióro, Dimitrios Tanoglidis, Kartheik G Iyer, et al. A survey on hypothesis generation for scientific discovery in the era of large language models. arXiv preprint arXiv:2504.05496, 2025

work page arXiv 2025

[4] [4]

Beyond citations: Measuring novel scientific ideas and their impact in publication text

Sam Arts, Nicola Melluso, and Reinhilde Veugelers. Beyond citations: Measuring novel scientific ideas and their impact in publication text. Review of Economics and Statistics, 2023. doi: https://doi.org/10.1162/rest_a_01561

work page doi:10.1162/rest_a_01561 2023

[5] [5]

PPTAgent: Generating and evaluating presentations beyond text-to-slides

Jinheon Baek, Sujay Kumar Jauhar, Silviu Cucerzan, and Sung Ju Hwang. ResearchAgent: Iterative research idea generation over scientific literature with large language models. In Luis Chiruzzo, Alan Ritter, and Lu Wang, editors, Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human...

work page doi:10.18653/v1/2025 2025

[6] [6]

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, et al. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint arXiv:2204.05862, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[7] [7]

Superintelligent agents pose catastrophic risks: Can scientist ai offer a safer path? arXiv preprint arXiv:2502.15657, 2025

Yoshua Bengio, Michael Cohen, Damiano Fornasiere, Joumana Ghosn, Pietro Greiner, Matt MacDermott, Sören Mindermann, Adam Oberman, Jesse Richardson, Oliver Richardson, et al. Superintelligent agents pose catastrophic risks: Can scientist ai offer a safer path? arXiv preprint arXiv:2502.15657, 2025

work page arXiv 2025

[8] [8]

Reasoning language models: A blueprint

Maciej Besta, Julia Barth, Eric Schreiber, Ales Kubicek, Afonso Catarino, Robert Gerstenberger, Piotr Nyczyk, Patrick Iff, Yueling Li, Sam Houliston, et al. Reasoning language models: A blueprint. arXiv preprint arXiv:2501.11223, 2025

work page arXiv 2025

[9] [9]

Super: Evaluating agents on setting up and executing tasks from research repositories

Ben Bogin, Kejuan Yang, Shashank Gupta, Kyle Richardson, Erin Bransom, Peter Clark, Ashish Sabharwal, and Tushar Khot. Super: Evaluating agents on setting up and executing tasks from research repositories. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 12622–12645, 2024

work page 2024

[10] [10]

A., MacKnight, R., & Gomes, G

Daniil A Boiko, Robert MacKnight, and Gabe Gomes. Emergent autonomous scientific research capabilities of large language models. arXiv preprint arXiv:2304.05332, 2023

work page arXiv 2023

[11] [11]

Autonomous chemical research with large language models

Daniil A Boiko, Robert MacKnight, Ben Kline, and Gabe Gomes. Autonomous chemical research with large language models. Nature, 624(7992):570–578, 2023. doi: 10.1038/s41586-023-06792-0

work page doi:10.1038/s41586-023-06792-0 2023

[12] [12]

Generative adversarial reviews: When llms become the critic

Nicolas Bougie and Narimasa Watanabe. Generative adversarial reviews: When llms become the critic. arXiv preprint arXiv:2412.10415, 2024

work page arXiv 2024

[13] [13]

Generative retrieval-augmented ontologic graph and multiagent strategies for interpretive large language model-based materials design

Markus J Buehler. Generative retrieval-augmented ontologic graph and multiagent strategies for interpretive large language model-based materials design. ACS Engineering Au , 4(2):241–277, 2024. doi: 10.1021/ acsengineeringau.3c00058

work page 2024

[14] [14]

Nirschl, Laura Bravo-Sánchez, Alejandro Lozano, Sanket Rajan Gupte, Jesus G

James Burgess, Jeffrey J Nirschl, Laura Bravo-Sánchez, Alejandro Lozano, Sanket Rajan Gupte, Jesus G Galaz-Montoya, Yuhui Zhang, Yuchang Su, Disha Bhowmik, Zachary Coman, et al. Microvqa: A multimodal reasoning benchmark for microscopy-based scientific research. arXiv preprint arXiv:2503.13399, 2025

work page arXiv 2025

[15] [15]

Eaira: Establishing a methodology for evaluating ai models as scientific research assistants

Franck Cappello, Sandeep Madireddy, Robert Underwood, Neil Getty, Nicholas Lee-Ping Chia, Nesar Ramachan- dra, Josh Nguyen, Murat Keçeli, Tanwi Mallick, Zilinghan Li, et al. Eaira: Establishing a methodology for evaluating ai models as scientific research assistants. CoRR, 2025

work page 2025

[16] [16]

MLE-bench: Evaluating machine learning agents on machine learning engineering

Jun Shern Chan, Neil Chowdhury, Oliver Jaffe, James Aung, Dane Sherburn, Evan Mays, Giulio Starace, Kevin Liu, Leon Maksin, Tejal Patwardhan, Aleksander Madry, and Lilian Weng. MLE-bench: Evaluating machine learning agents on machine learning engineering. In The Thirteenth International Conference on Learning Representations, 2025

work page 2025

[17] [17]

A joint framework for identifying the type and ar- guments of scientific contribution

Wenhan Chao, Mengyuan Chen, Xian Zhou, and Zhunchen Luo. A joint framework for identifying the type and ar- guments of scientific contribution. Scientometrics, 128(6):3347–3376, 2023. doi: 10.1007/s11192-023-04694-6. 50

work page doi:10.1007/s11192-023-04694-6 2023

[18] [18]

Structuring scientific innovation: A framework for modeling and discovering impactful knowledge combinations

Junlan Chen, Kexin Zhang, Daifeng Li, Yangyang Feng, Yuxuan Zhang, and Bowen Deng. Structuring scientific innovation: A framework for modeling and discovering impactful knowledge combinations. arXiv preprint arXiv:2503.18865, 2025

work page arXiv 2025

[19] [19]

Ai4research: A survey of artificial intelligence for scientific research

Qiguang Chen, Mingda Yang, Libo Qin, Jinhao Liu, Zheng Yan, Jiannan Guan, Dengyun Peng, Yiyan Ji, Hanjing Li, Mengkang Hu, et al. Ai4research: A survey of artificial intelligence for scientific research. arXiv preprint arXiv:2507.01903, 2025

work page arXiv 2025

[20] [20]

Baker, Benjamin Burns, Daniel Adu-Ampratwum, Xuhui Huang, Xia Ning, Song Gao, Yu Su, and Huan Sun

Ziru Chen, Shijie Chen, Yuting Ning, Qianheng Zhang, Boshi Wang, Botao Yu, Yifei Li, Zeyi Liao, Chen Wei, Zitong Lu, Vishal Dey, Mingyi Xue, Frazier N. Baker, Benjamin Burns, Daniel Adu-Ampratwum, Xuhui Huang, Xia Ning, Song Gao, Yu Su, and Huan Sun. Scienceagentbench: Toward rigorous assessment of language agents for data-driven scientific discovery. In ...

work page 2025

[21] [21]

The theoretical and policy implications of knowledge codification

Patrick Cohendet and Frieder Meyer-Krahmer. The theoretical and policy implications of knowledge codification. Research policy, 30(9):1563–1591, 2001. doi: 10.1016/S0048-7333(01)00168-8

work page doi:10.1016/s0048-7333(01)00168-8 2001

[22] [22]

Curie: Evaluating llms on multitask scientific long-context understanding and reasoning

Hao Cui, Zahra Shamsi, Gowoon Cheon, Xuejian Ma, Shutong Li, Maria Tikhanovskaya, Peter Christian Norgaard, Nayantara Mudur, Martyna Beata Plomecka, Paul Raccuglia, et al. Curie: Evaluating llms on multitask scientific long-context understanding and reasoning. In The Thirteenth International Conference on Learning Representations, 2025

work page 2025

[23] [23]

Structured information extraction from scientific text with large language models

John Dagdelen, Alexander Dunn, Sanghoon Lee, Nicholas Walker, Andrew S Rosen, Gerbrand Ceder, Kristin A Persson, and Anubhav Jain. Structured information extraction from scientific text with large language models. Nature Communications, 15(1):1418, 2024. doi: 10.1038/s41467-024-45563-x

work page doi:10.1038/s41467-024-45563-x 2024

[24] [24]

Marg: Multi-agent review generation for scientific papers

Mike D’Arcy, Tom Hope, Larry Birnbaum, and Doug Downey. Marg: Multi-agent review generation for scientific papers. ArXiv, abs/2401.04259, 2024

work page arXiv 2024

[25] [25]

Organa: a robotic assistant for automated chemistry experimentation and characterization

Kourosh Darvish, Marta Skreta, Yuchi Zhao, Naruki Yoshikawa, Sagnik Som, Miroslav Bogdanovic, Yang Cao, Han Hao, Haoping Xu, Alán Aspuru-Guzik, et al. Organa: a robotic assistant for automated chemistry experimentation and characterization. Matter, 8(2), 2025. doi: 10.1016/j.matt.2024.10.015

work page doi:10.1016/j.matt.2024.10.015 2025

[26] [26]

Empowering ai as autonomous researchers: Evaluating llms in generating novel research ideas through automated metrics

Debajyoti Dasgupta, Arijit Mondal, and Partha Pratim Chakrabarti. Empowering ai as autonomous researchers: Evaluating llms in generating novel research ideas through automated metrics. In 2nd AI4Research Workshop: Towards a Knowledge-grounded Scientific Research Lifecycle, 2025

work page 2025

[27] [27]

Matexpert: Decomposing materials discovery by mimicking human experts

Qianggang Ding, Santiago Miret, and Bang Liu. Matexpert: Decomposing materials discovery by mimicking human experts. In The Thirteenth International Conference on Learning Representations, 2024

work page 2024

[28] [28]

Llms assist nlp researchers: Critique paper (meta-) reviewing

Jiangshu Du, Yibo Wang, Wenting Zhao, Zhongfen Deng, Shuaiqi Liu, Renze Lou, Henry Zou, Pranav Narayanan Venkit, Nan Zhang, Mukund Srinath, et al. Llms assist nlp researchers: Critique paper (meta-) reviewing. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 5081–5099, 2024

work page 2024

[29] [29]

Llm4ed: Large language models for automatic equation discovery

Mengge Du, Yuntian Chen, Zhongzheng Wang, Longfeng Nie, and Dongxiao Zhang. Llm4ed: Large language models for automatic equation discovery. CoRR, 2024

work page 2024

[30] [30]

The path to superintelligence: A critical analysis of openai’s five levels of ai progression

Tom Duenas and Diana Ruiz. The path to superintelligence: A critical analysis of openai’s five levels of ai progression. ResearchGate, 2024b. doi, 10, 2024. doi: http://dx.doi.org/10.13140/RG.2.2.33794.70085

work page doi:10.13140/rg.2.2.33794.70085 2024

[31] [31]

Agent ai: Surveying the horizons of multimodal interaction

Zane Durante, Qiuyuan Huang, Naoki Wake, Ran Gong, Jae Sung Park, Bidipta Sarkar, Rohan Taori, Yusuke Noda, Demetri Terzopoulos, Yejin Choi, et al. Agent ai: Surveying the horizons of multimodal interaction. CoRR, 2024

work page 2024

[32] [32]

Nlpeer: A unified resource for the computational study of peer review

Nils Dycke, Ilia Kuznetsov, and Iryna Gurevych. Nlpeer: A unified resource for the computational study of peer review. In Annual Meeting of the Association for Computational Linguistics, 2022

work page 2022

[33] [33]

mclm: A function-infused and synthesis-friendly modular chemical language model

Carl Edwards, Chi Han, Gawon Lee, Thao Nguyen, Bowen Jin, Chetan Kumar Prasad, Sara Szymku´c, Bartosz A Grzybowski, Ying Diao, Jiawei Han, et al. mclm: A function-infused and synthesis-friendly modular chemical language model. arXiv preprint arXiv:2505.12565, 2025

work page arXiv 2025

[34] [34]

Steffen Eger, Yong Cao, Jennifer D’Souza, Andreas Geiger, Christian Greisinger, Stephanie Gross, Yufang Hou, Brigitte Krenn, Anne Lauscher, Yizhi Li, et al. Transforming science with large language models: A survey on ai-assisted scientific discovery, experimentation, content generation, and evaluation.arXiv preprint arXiv:2502.05151, 2025

work page arXiv 2025

[35] [35]

Science of science

Santo Fortunato, Carl T Bergstrom, Katy Börner, James A Evans, Dirk Helbing, Staša Milojevi´c, Alexander M Petersen, Filippo Radicchi, Roberta Sinatra, Brian Uzzi, et al. Science of science. Science, 359(6379):eaao0185,

work page

[36] [36]

doi: 10.1126/science.aao0185. 51

work page doi:10.1126/science.aao0185

[37] [37]

Tradition and innovation in scientists’ research strategies

Jacob G Foster, Andrey Rzhetsky, and James A Evans. Tradition and innovation in scientists’ research strategies. American sociological review, 80(5):875–908, 2015. doi: 10.1177/0003122415601618

work page doi:10.1177/0003122415601618 2015

[38] [38]

Boxinggym: Benchmarking progress in automated experimental design and model discovery

Kanishk Gandhi, Michael Y Li, Lyle Goodyear, Louise Li, Aditi Bhaskar, Mohammed Zaman, and Noah D Goodman. Boxinggym: Benchmarking progress in automated experimental design and model discovery. arXiv preprint arXiv:2501.01540, 2025

work page arXiv 2025

[39] [39]

Empowering biomedical discovery with ai agents

Shanghua Gao, Ada Fang, Yepeng Huang, Valentina Giunchiglia, Ayush Noori, Jonathan Richard Schwarz, Yasha Ektefaie, Jovana Kondic, and Marinka Zitnik. Empowering biomedical discovery with ai agents. Cell, 187 (22):6125–6151, 2024. doi: https://doi.org/10.1016/j.cell.2024.09.022

work page doi:10.1016/j.cell.2024.09.022 2024

[40] [40]

Reviewagents: Bridging the gap between human and ai-generated paper reviews

Xian Gao, Jiacheng Ruan, Jingsheng Gao, Ting Liu, and Yuzhuo Fu. Reviewagents: Bridging the gap between human and ai-generated paper reviews. CoRR, 2025

work page 2025

[41] [41]

Reviewer2: Optimizing review generation through prompt generation

Zhaolin Gao, Kianté Brantley, and Thorsten Joachims. Reviewer2: Optimizing review generation through prompt generation. arXiv preprint arXiv:2402.10886, 2024

work page arXiv 2024

[42] [42]

Atomagents: Alloy design and discovery through physics-aware multi-modal multi-agent artificial intelligence

Alireza Ghafarollahi and Markus J Buehler. Atomagents: Alloy design and discovery through physics-aware multi-modal multi-agent artificial intelligence. arXiv preprint arXiv:2407.10022, 2024

work page arXiv 2024

[43] [43]

Sciagents: Automating scientific discovery through bioinspired multi- agent intelligent graph reasoning

Alireza Ghafarollahi and Markus J Buehler. Sciagents: Automating scientific discovery through bioinspired multi- agent intelligent graph reasoning. Advanced Materials, page 2413523, 2024. doi: 10.1002/adma.202413523

work page doi:10.1002/adma.202413523 2024

[44] [44]

Automating alloy design and discovery with physics-aware multimodal multiagent ai

Alireza Ghafarollahi and Markus J Buehler. Automating alloy design and discovery with physics-aware multimodal multiagent ai. Proceedings of the National Academy of Sciences, 122(4):e2414074122, 2025. doi: 10.1073/pnas.2414074122

work page doi:10.1073/pnas.2414074122 2025

[45] [45]

Towards an AI co-scientist

Juraj Gottweis, Wei-Hung Weng, Alexander Daryin, Tao Tu, Anil Palepu, Petar Sirkovic, Artiom Myaskovsky, Fe- lix Weissenberger, Keran Rong, Ryutaro Tanno, et al. Towards an ai co-scientist.arXiv preprint arXiv:2502.18864, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[46] [46]

The concept of entropy in scientometrics and innovation research: An indicator for institutional involvement in scientific and technological developments

Hariolf Grupp. The concept of entropy in scientometrics and innovation research: An indicator for institutional involvement in scientific and technological developments. Scientometrics, 18(3-4):219–239, 1990

work page 1990

[47] [47]

Llms can realize combinatorial creativity: generating creative ideas via llms for scientific research

Tianyang Gu, Jingjin Wang, Zhihao Zhang, and HaoHong Li. Llms can realize combinatorial creativity: generating creative ideas via llms for scientific research. arXiv preprint arXiv:2412.14141, 2024

work page arXiv 2024

[48] [48]

Interesting scientific idea generation using knowledge graphs and llms: Evaluations with 100 research group leaders

Xuemei Gu and Mario Krenn. Interesting scientific idea generation using knowledge graphs and llms: Evaluations with 100 research group leaders. arXiv preprint arXiv:2405.17044, 2024

work page arXiv 2024

[49] [49]

Ideabench: Benchmarking large language models for research idea generation

Sikun Guo, Amir Hassan Shariatmadari, Guangzhi Xiong, Albert Huang, Eric Xie, Stefan Bekiranov, and Aidong Zhang. Ideabench: Benchmarking large language models for research idea generation. arXiv preprint arXiv:2411.02429, 2024

work page arXiv 2024

[50] [50]

De novo generation of sars-cov-2 antibody cdrh3 with a pre-trained generative large language model

Haohuai He, Bing He, Lei Guan, Yu Zhao, Feng Jiang, Guanxing Chen, Qingge Zhu, Calvin Yu-Chian Chen, Ting Li, and Jianhua Yao. De novo generation of sars-cov-2 antibody cdrh3 with a pre-trained generative large language model. Nature Communications, 15(1):6867, 2024. doi: 10.1038/s41467-024-50903-y

work page doi:10.1038/s41467-024-50903-y 2024

[51] [51]

Scisight: Com- bining faceted navigation and research group detection for covid-19 exploratory scientific search

Tom Hope, J Portenoy, K Vasan, J Borchardt, Eric Horvitz, DS Weld, MA Hearst, and Jevin West. Scisight: Com- bining faceted navigation and research group detection for covid-19 exploratory scientific search. Proceedings of the 2020 EMNLP (Systems Demonstrations), Association for Computational Linguistics, 2020

work page 2020

[52] [52]

A computational inflection for scientific discovery

Tom Hope, Doug Downey, Daniel S Weld, Oren Etzioni, and Eric Horvitz. A computational inflection for scientific discovery. Communications of the ACM, 66(8):62–73, 2023. doi: 10.1145/3576896

work page doi:10.1145/3576896 2023

[53] [53]

A new method for measuring the originality of academic articles based on knowledge units in semantic networks

Jianhua Hou, Dongyi Wang, and Jing Li. A new method for measuring the originality of academic articles based on knowledge units in semantic networks. Journal of Informetrics, 16(3):101306, 2022. doi: 10.1016/j.joi.2022. 101306

work page doi:10.1016/j.joi.2022.101306 2022

[54] [54]

Chime: Llm-assisted hierarchical organization of scientific studies for literature review support

Chao-Chun Hsu, Erin Bransom, Jenna Sparks, Bailey Kuehl, Chenhao Tan, David Wadden, Lucy Lu Wang, and Aakanksha Naik. Chime: Llm-assisted hierarchical organization of scientific studies for literature review support. In Findings of the Association for Computational Linguistics ACL 2024, pages 118–132, 2024

work page 2024

[55] [55]

A multi-agent framework for materials laws discovery

Bo Hu, Siyu Liu, Beilin Ye, Yun Hao, and Tongqi Wen. A multi-agent framework for materials laws discovery. arXiv preprint arXiv:2411.16416, 2024

work page arXiv 2024

[56] [56]

Nova: An iterative planning and search approach to enhance novelty and diversity of llm generated ideas

Xiang Hu, Hongyu Fu, Jinge Wang, Yifeng Wang, Zhikun Li, Renjun Xu, Yu Lu, Yaochu Jin, Lili Pan, and Zhenzhong Lan. Nova: An iterative planning and search approach to enhance novelty and diversity of llm generated ideas. arXiv preprint arXiv:2410.14255, 2024

work page arXiv 2024

[57] [57]

Hireview: Hierarchical taxonomy-driven automatic literature review generation

Yuntong Hu, Zhuofeng Li, Zheng Zhang, Chen Ling, Raasikh Kanjiani, Boxin Zhao, and Liang Zhao. Hireview: Hierarchical taxonomy-driven automatic literature review generation. arXiv preprint arXiv:2410.03761, 2024

work page arXiv 2024

[58] [58]

From detection to application: Recent advances in understanding scientific tables and figures

Jiani Huang, Haihua Chen, Fengchang Yu, and Wei Lu. From detection to application: Recent advances in understanding scientific tables and figures. ACM Computing Surveys, 56(10):1–39, 2024. doi: 10.1145/3657285

work page doi:10.1145/3657285 2024

[59] [59]

Crispr-gpt: An llm agent for automated design of gene-editing experiments

Kaixuan Huang, Yuanhao Qu, Henry Cousins, William A Johnson, Di Yin, Mihir Shah, Denny Zhou, Russ Altman, Mengdi Wang, and Le Cong. Crispr-gpt: An llm agent for automated design of gene-editing experiments. arXiv preprint arXiv:2404.18021, 2024

work page arXiv 2024

[60] [60]

Mlagentbench: Evaluating language agents on machine learning experimentation

Qian Huang, Jian Vora, Percy Liang, and Jure Leskovec. Mlagentbench: Evaluating language agents on machine learning experimentation. In Forty-first International Conference on Machine Learning, 2023

work page 2023

[61] [61]

Are large language models qualified reviewers in originality evaluation?

Shengzhi Huang, Yong Huang, Yinpeng Liu, Zhuoran Luo, and Wei Lu. Are large language models qualified reviewers in originality evaluation? Information Processing & Management, 62(3):103973, 2025. doi: 10.1016/j.ipm.2024.103973

work page doi:10.1016/j.ipm.2024.103973 2025

[62] [62]

Papereval: A universal, quantitative, and explainable paper evaluation method powered by a multi-agent system

Shengzhi Huang, Qicong Wang, Wei Lu, Lingyu Liu, Zhenzhen Xu, and Yong Huang. Papereval: A universal, quantitative, and explainable paper evaluation method powered by a multi-agent system. Information Processing & Management, 62(6):104225, 2025

work page 2025

[63] [63]

Olympicarena: Benchmarking multi-discipline cognitive reasoning for superintelligent ai

Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, et al. Olympicarena: Benchmarking multi-discipline cognitive reasoning for superintelligent ai. Advances in Neural Information Processing Systems, 37:19209–19253, 2024

work page 2024

[64] [64]

Openreviewer: A specialized large language model for generating critical scientific paper reviews

Maximilian Idahl and Zahra Ahmadi. Openreviewer: A specialized large language model for generating critical scientific paper reviews. arXiv preprint arXiv:2412.11948, 2024

work page arXiv 2024

[65] [65]

Autonomous llm-driven research—from data to human-verifiable research papers

Tal Ifargan, Lukas Hafner, Maor Kern, Ori Alcalay, and Roy Kishony. Autonomous llm-driven research—from data to human-verifiable research papers. NEJM AI, 2(1):AIoa2400555, 2025. doi: 10.1056/AIoa2400555

work page doi:10.1056/aioa2400555 2025

[66] [66]

Zochi technical report

Intology. Zochi technical report. arXiv, 2025

work page 2025

[67] [67]

Scirex: A challenge dataset for document-level information extraction

Sarthak Jain, Madeleine van Zuylen, Hannaneh Hajishirzi, and Iz Beltagy. Scirex: A challenge dataset for document-level information extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7506–7516, 2020

work page 2020

[68] [68]

Discoveryworld: A virtual environment for developing and evaluating automated scientific discovery agents

Peter Jansen, Marc-Alexandre Côté, Tushar Khot, Erin Bransom, Bhavana Dalvi Mishra, Bodhisattwa Prasad Majumder, Oyvind Tafjord, and Peter Clark. Discoveryworld: A virtual environment for developing and evaluating automated scientific discovery agents. Advances in Neural Information Processing Systems, 37:10088–10116, 2024

work page 2024

[69] [69]

Codescientist: End-to-end semi-automated scientific discovery with code-based experimentation

Peter Jansen, Oyvind Tafjord, Marissa Radensky, Pao Siangliulue, Tom Hope, Bhavana Dalvi Mishra, Bodhisattwa Prasad Majumder, Daniel S Weld, and Peter Clark. Codescientist: End-to-end semi-automated scientific discovery with code-based experimentation. arXiv preprint arXiv:2503.22708, 2025

work page arXiv 2025

[70] [70]

Llmatdesign: Autonomous materials discovery with large language models

Shuyi Jia, Chao Zhang, and Victor Fung. Llmatdesign: Autonomous materials discovery with large language models. arXiv preprint arXiv:2406.13163, 2024

work page arXiv 2024

[71] [71]

Hegta: Leveraging heterogeneous graph-enhanced large language models for few-shot complex table understanding

Rihui Jin, Yu Li, Guilin Qi, Nan Hu, Yuan-Fang Li, Jiaoyan Chen, Jianan Wang, Yongrui Chen, Dehai Min, and Sheng Bi. Hegta: Leveraging heterogeneous graph-enhanced large language models for few-shot complex table understanding. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 24294–24302, 2025

work page 2025

[72] [72]

Agentreview: Exploring peer review dynamics with llm agents

Yiqiao Jin, Qinlin Zhao, Yiyang Wang, Hao Chen, Kaijie Zhu, Yijia Xiao, and Jindong Wang. Agentreview: Exploring peer review dynamics with llm agents. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 1208–1226, 2024

work page 2024

[73] [73]

DSBench: How far are data science agents from becoming data science experts?

Liqiang Jing, Zhehui Huang, Xiaoyang Wang, Wenlin Yao, Wenhao Yu, Kaixin Ma, Hongming Zhang, Xinya Du, and Dong Yu. DSBench: How far are data science agents from becoming data science experts? In The Thirteenth International Conference on Learning Representations, 2025

work page 2025

[74] [74]

Researcharena: Benchmarking llms’ ability to collect and organize information as research agents

Hao Kang and Chenyan Xiong. Researcharena: Benchmarking llms’ ability to collect and organize information as research agents. arXiv preprint arXiv:2406.10291, 2024

work page arXiv 2024

[75] [75]

Chatmof: an artificial intelligence system for predicting and generating metal-organic frameworks using large language models

Yeonghun Kang and Jihan Kim. Chatmof: an artificial intelligence system for predicting and generating metal-organic frameworks using large language models. Nature communications, 15(1):4705, 2024. doi: 10.1038/s41467-024-48998-4

work page doi:10.1038/s41467-024-48998-4 2024

[76] [76]

Scireviewgen: A large-scale dataset for automatic literature review generation

Tetsu Kasanishi, Masaru Isonuma, Junichiro Mori, and Ichiro Sakata. Scireviewgen: A large-scale dataset for automatic literature review generation. In Findings of the Association for Computational Linguistics: ACL 2023, pages 6695–6715, 2023

work page 2023

[77] [77]

Sci-idea: Context-aware scientific ideation using token and sentence embeddings

Farhana Keya, Gollam Rabby, Prasenjit Mitra, Sahar Vahdati, Sören Auer, and Yaser Jaradeh. Sci-idea: Context-aware scientific ideation using token and sentence embeddings. arXiv preprint arXiv:2503.19257, 2025

work page arXiv 2025

[78] [78]

Curie: Toward rigorous and automated scientific experimentation with ai agents

Patrick Tser Jern Kon, Jiachen Liu, Qiuyi Ding, Yiming Qiu, Zhenning Yang, Yibo Huang, Jayanth Srinivasa, Myungjin Lee, Mosharaf Chowdhury, and Ang Chen. Curie: Toward rigorous and automated scientific experimentation with ai agents. CoRR, 2025

work page 2025

[79] [79]

Hypothesis generation for materials discovery and design using goal-driven and constraint-guided LLM agents

Shrinidhi Kumbhar, Venkatesh Mishra, Kevin Coutinho, Divij Handa, Ashif Iquebal, and Chitta Baral. Hypothesis generation for materials discovery and design using goal-driven and constraint-guided LLM agents. In Luis Chiruzzo, Alan Ritter, and Lu Wang, editors, Findings of the Association for Computational Linguistics: NAACL 2025, pages 7524–7555, Albuquer...

work page doi:10.18653/v1/2025.findings-naacl.420 2025

[80] [80]

Transformer-based highlights extraction from scientific papers

Moreno La Quatra and Luca Cagliero. Transformer-based highlights extraction from scientific papers. Knowledge-Based Systems, 252:109382, 2022. doi: 10.1016/j.knosys.2022.109382

work page doi:10.1016/j.knosys.2022.109382 2022