arxiv: 2412.14642 · v4 · pith:LWUXMX44new · submitted 2024-12-19 · 💻 cs.CL

Speak-to-Structure: Evaluating LLMs in Open-domain Natural Language-Driven Molecule Generation

Pith reviewed 2026-05-25 08:08 UTC · model grok-4.3

classification 💻 cs.CL
keywords molecule generation · large language models · benchmark · natural language · instruction tuning · molecular design · one-to-many mapping · open-ended generation

The pith

A new instruction dataset lets an 8B model surpass GPT-4o and Claude-3.5 on open-ended molecule generation from text.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to replace one-to-one molecule-text benchmarks that test recall of single answers with a new benchmark that requires models to produce multiple valid molecules from open natural language instructions. It defines three tasks that each test a different part of molecule design: editing existing structures, optimizing them toward desired properties, and generating custom molecules from scratch. The authors also release a large instruction-tuning collection that, when applied to Llama3.1-8B, produces higher scores than the strongest closed models across those tasks. If the results hold, the work shows that targeted data can move LLMs from pattern matching toward flexible molecular design.

Core claim

S^2-Bench is the first benchmark built explicitly for one-to-many natural language to molecule mappings through the tasks MolEdit, MolOpt, and MolCustom; when Llama3.1-8B is tuned on the accompanying OpenMolIns collection it exceeds GPT-4o and Claude-3.5 on the benchmark, demonstrating that open models can achieve stronger open-ended generation once the evaluation moves away from single-answer retrieval.

What carries the argument

S^2-Bench, a benchmark of three tasks (molecule editing, optimization, and customized generation) that each demand multiple chemically valid outputs from a single natural language prompt, together with the OpenMolIns instruction-tuning dataset that supplies the training signal.
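
To make the one-to-many framing concrete, the sketch below contrasts exact-match scoring against a single reference with constraint-based scoring in which any output satisfying the prompt counts. It is an illustration only: the atom-count constraint is loosely modeled on the AtomNum case study, and `count_atoms`, `exact_match_rate`, and `constraint_success_rate` are hypothetical helpers, not the benchmark's scoring code. The regex tokenizer is an assumption that covers only a small SMILES subset without bracket atoms.

```python
import re

# Toy tokenizer for a small SMILES subset (no bracket atoms). This is an
# assumption made for illustration, not the benchmark's parser.
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def count_atoms(smiles: str, element: str) -> int:
    """Count atoms of one element; aromatic lowercase symbols count as that element."""
    atoms = ATOM_PATTERN.findall(smiles)
    return sum(1 for atom in atoms if atom.capitalize() == element)


def exact_match_rate(outputs: list[str], reference: str) -> float:
    """One-to-one scoring: only the single reference string is accepted."""
    return sum(1 for s in outputs if s == reference) / len(outputs)


def constraint_success_rate(outputs: list[str], element: str, target: int) -> float:
    """One-to-many scoring: every output that meets the constraint is accepted."""
    hits = sum(1 for s in outputs if count_atoms(s, element) == target)
    return hits / len(outputs)


# Hypothetical prompt: "generate a molecule with exactly two oxygen atoms".
outputs = ["CC(=O)O", "OCCO", "CCO"]
one_to_one = exact_match_rate(outputs, reference="CC(=O)O")
one_to_many = constraint_success_rate(outputs, element="O", target=2)
print(f"exact match: {one_to_one:.2f}, constraint satisfied: {one_to_many:.2f}")
```

Under exact match, the second output is penalized even though it satisfies the prompt; under the constraint check, both two-oxygen molecules count as successes.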

If this is right

  • Evaluation of LLMs for molecule work must shift from single-answer retrieval to measuring the ability to produce diverse valid candidates.
  • Instruction tuning on one-to-many data can raise smaller open models above larger proprietary models on realistic design tasks.
  • The three tasks together provide separate probes for editing, property-directed change, and de-novo generation capabilities.
  • Results across 31 models establish a new baseline that future systems can be compared against on open-domain molecule generation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same one-to-many framing could be applied to other generative scientific tasks such as reaction planning or material property specification.
  • If the benchmark correlates with downstream usefulness, it could become a standard filter before any generated molecules are sent for laboratory synthesis.
  • Success with an 8B model suggests that deployment cost for language-driven molecule tools can be lowered without sacrificing output quality.
  • The gap between models may close further if future tuning data explicitly rewards chemical validity checks inside the generation loop.

Load-bearing premise

Performance on the three tasks actually reflects understanding of molecular structure and properties rather than skill at following instructions or matching surface patterns in text.

What would settle it

A controlled test in which the tuned 8B model and the larger models receive prompts that require molecules with entirely novel scaffolds or property combinations absent from the training data, followed by a check of whether the smaller model still leads and whether it does so while producing chemically invalid structures.

Figures

Figures reproduced from arXiv: 2412.14642 by Changmeng Zheng, Dongzhan Zhou, Jiatong Li, Junxian Li, Qing Li, Weida Wang, Xiao-Yong Wei, Yatao Bian, Yunqing Liu.

Figure 1: Task illustration of S^2-Bench for open-domain natural language-driven molecule generation. In contrast to text-based target molecule generation, multiple valid molecules may fulfill the textual requirements (right of the arrow).

Figure 2: The performance of LLMs benchmarked in S^2-Bench. LLMs fall into 4 categories: Proprietary Models, Open-source General LLMs, Open-source ChEBI-20 Fine-tuned LLMs, and OpenMolIns Fine-tuned LLMs. Models of unknown parameters are denoted as horizontal lines.

Figure 3: Task-specific performance scaling with increasing data in S…

Figure 4: Data construction workflow of S^2-Bench & OpenMolIns.

Figure 5: Performance comparison across several representative LLMs on S…

Figure 6: Case study of AddComponent. In this example, we require LLMs to add a hydroxyl to…

Figure 7: Case study of DelComponent. In this example, we require LLMs to remove an amine from…

Figure 8: Case study of SubComponent. In this example, we require LLMs to substitute the hydroxyl…

Figure 9: Case study of LogP. In this example, we require LLMs to modify the molecule to have a…

Figure 10: Case study of MR. In this example, we require LLMs to modify the molecule to have…

Figure 11: Case study of QED. In this example, we require LLMs to lower the QED value of the…

Figure 12: Case study of AtomNum. In this example, we require LLMs to generate a molecule with…

Figure 13: Case study of BondNum. In this example, we require LLMs to generate a molecule with 6…

Figure 14: Case study of FunctionalGroup. In this example, we require LLMs to generate a molecule…
Original abstract

Recently, Large Language Models (LLMs) have demonstrated great potential in natural language-driven molecule discovery. However, existing datasets and benchmarks for molecule-text alignment are predominantly built on one-to-one mappings, measuring LLMs' ability to retrieve a single, pre-defined answer, rather than their creative potential to generate diverse, yet equally valid, molecular candidates. To address this critical gap, we propose Speak-to-Structure (S^2-Bench), the first benchmark to evaluate LLMs in open-domain natural language-driven molecule generation. S^2-Bench is specifically designed for one-to-many relationships, challenging LLMs to exhibit genuine molecular understanding and open-ended generation capabilities. Our benchmark includes three key tasks: molecule editing (MolEdit), molecule optimization (MolOpt), and customized molecule generation (MolCustom), each probing a different aspect of molecule discovery. We also introduce OpenMolIns, a large-scale instruction tuning dataset that enables Llama3.1-8B to surpass the most powerful LLMs like GPT-4o and Claude-3.5 on S^2-Bench. Our comprehensive evaluation of 31 LLMs shifts the focus from simple pattern recall to realistic molecular design, paving the way for more capable LLMs in natural language-driven molecule discovery. Our codes and datasets are fully accessible through the Github Repository: https://github.com/phenixace/S2-TOMG-Bench and Huggingface Datasets: https://huggingface.co/datasets/phenixace/S2-TOMG-Bench.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces S^2-Bench, the first benchmark for open-domain natural language-driven molecule generation emphasizing one-to-many mappings via three tasks (MolEdit, MolOpt, MolCustom). It also releases OpenMolIns, a large-scale instruction-tuning dataset, and reports that fine-tuning Llama3.1-8B on it enables the model to outperform GPT-4o and Claude-3.5 on the benchmark. The work evaluates 31 LLMs and releases code and data to shift evaluation focus from pattern recall to realistic molecular design.

Significance. If the automated metrics and tasks demonstrably isolate chemical reasoning and validity rather than instruction following or SMILES fluency, the benchmark and dataset release would provide a valuable, reproducible resource for advancing LLM evaluation in chemistry applications. The reported result that a fine-tuned 8B model surpasses frontier models would be noteworthy for accessibility in molecular design if substantiated.

major comments (3)
  1. [§3, §4.2] §3 (Benchmark Construction) and §4.2 (Metric Definitions): The scoring functions for the one-to-many tasks are not specified in sufficient detail to confirm they enforce chemical validity (e.g., via RDKit sanitization or expert-validated substructure checks) or diversity independent of surface-level prompt matching; without this, the central claim that the benchmark probes 'genuine molecular understanding' cannot be verified.
  2. [§4.1] §4.1 (Evaluation Setup): No controls are described (non-chemical text baselines, random SMILES generators, or blind chemist ratings) to show that high scores on MolEdit/MolOpt/MolCustom reflect coherent molecular outputs rather than fluent instruction following; this directly undermines the reported superiority of Llama3.1-8B + OpenMolIns.
  3. [§5] §5 (Main Results): The performance tables claim Llama3.1-8B surpasses GPT-4o/Claude-3.5, but without reported validity rates, inter-metric correlations, or ablation on chemical vs. non-chemical prompts, the cross-model comparison is not load-bearing for the 'shift to realistic molecular design' conclusion.
minor comments (2)
  1. The GitHub and Hugging Face links are provided but the manuscript should include a brief reproducibility checklist (e.g., exact prompting templates and random seeds) to match the 'fully accessible' claim.
  2. [§3.3] Notation for task variants (e.g., exact input/output formats for MolCustom) could be clarified with an example table to aid readers unfamiliar with SMILES-based generation.
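
The validity concern raised in the major comments can be illustrated with a short sketch in which an output only counts as a success if it first passes a validity gate. Everything here is an assumption for illustration: `toy_is_valid` is a purely syntactic stand-in (balanced parentheses and paired single-digit ring closures), and `validity_gated_scores` is a hypothetical helper, not the paper's metric. A real pipeline would more likely parse each string with a cheminformatics toolkit such as RDKit, where `Chem.MolFromSmiles` returns `None` for strings it cannot parse.

```python
from typing import Callable, Iterable


def toy_is_valid(smiles: str) -> bool:
    """Toy syntactic check: non-empty, balanced parentheses, paired ring digits.

    Illustrative stand-in only. It ignores valence, aromaticity, bracket atoms,
    and two-digit ring closures, so it is NOT a chemical validity check.
    """
    if not smiles:
        return False
    depth = 0
    open_rings: set[str] = set()
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch.isdigit():
            # A ring digit opens a ring on first sight and closes it on the second.
            open_rings ^= {ch}
    return depth == 0 and not open_rings


def validity_gated_scores(
    outputs: Iterable[str],
    meets_constraint: Callable[[str], bool],
    is_valid: Callable[[str], bool] = toy_is_valid,
) -> dict[str, float]:
    """Report the validity rate and a success rate in which invalid outputs fail."""
    outputs = list(outputs)
    if not outputs:
        raise ValueError("outputs must not be empty")
    valid = [s for s in outputs if is_valid(s)]
    successes = [s for s in valid if meets_constraint(s)]
    return {
        "validity_rate": len(valid) / len(outputs),
        "success_rate": len(successes) / len(outputs),
    }


# Hypothetical constraint: the output must contain an oxygen atom.
outputs = ["C1CCCCC1O", "C1CCCCCO", "CC(O", "CCC"]
scores = validity_gated_scores(outputs, meets_constraint=lambda s: "O" in s)
print(scores)
```

Three of the four strings contain an oxygen symbol, but only one of them is also well formed, which is the kind of gap between surface matching and gated scoring that the referee asks the authors to report.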

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which have helped us identify areas for improvement in the manuscript. We address each of the major comments below, indicating the revisions we plan to make. We believe these changes will enhance the clarity and rigor of our work on S^2-Bench and OpenMolIns.

Point-by-point responses
  1. Referee: [§3, §4.2] §3 (Benchmark Construction) and §4.2 (Metric Definitions): The scoring functions for the one-to-many tasks are not specified in sufficient detail to confirm they enforce chemical validity (e.g., via RDKit sanitization or expert-validated substructure checks) or diversity independent of surface-level prompt matching; without this, the central claim that the benchmark probes 'genuine molecular understanding' cannot be verified.

    Authors: We thank the referee for this observation. The current manuscript provides an overview of the metrics in §4.2, but we agree that more granular details are needed to fully substantiate the claims. In the revised manuscript, we will elaborate on the scoring functions, explicitly describing the RDKit-based sanitization process for validity and the specific diversity metrics employed to ensure they are independent of prompt surface features. This will allow readers to verify the benchmark's emphasis on molecular understanding. revision: yes

  2. Referee: [§4.1] §4.1 (Evaluation Setup): No controls are described (non-chemical text baselines, random SMILES generators, or blind chemist ratings) to show that high scores on MolEdit/MolOpt/MolCustom reflect coherent molecular outputs rather than fluent instruction following; this directly undermines the reported superiority of Llama3.1-8B + OpenMolIns.

    Authors: This is a valid point regarding the need for controls to isolate chemical reasoning from general instruction-following capabilities. Although our evaluation of 31 LLMs shows performance variations that suggest more than fluency, we did not include explicit non-chemical baselines. We will add such controls in the revision, including results from random SMILES generation and a baseline using non-chemical prompts, to directly address this concern and strengthen the evidence for the superiority of the fine-tuned model. revision: yes

  3. Referee: [§5] §5 (Main Results): The performance tables claim Llama3.1-8B surpasses GPT-4o/Claude-3.5, but without reported validity rates, inter-metric correlations, or ablation on chemical vs. non-chemical prompts, the cross-model comparison is not load-bearing for the 'shift to realistic molecular design' conclusion.

    Authors: We concur that supplementary statistics would make the results more compelling. The revised manuscript will include validity rates for generated molecules across models, inter-metric correlation analyses, and an ablation study on chemical versus non-chemical prompts. These additions will provide a more robust foundation for our conclusions regarding the shift toward realistic molecular design. revision: yes
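
The diversity metrics promised in the first response are not specified in this review. One common family is mean pairwise Tanimoto distance over molecular fingerprints, and the sketch below shows only the arithmetic. As a loudly labeled assumption, character bigrams of the SMILES string stand in for a real structural fingerprint, and duplicates are removed by plain string comparison. Because different SMILES strings can denote the same molecule, an actual evaluation would canonicalize structures and use a chemistry-aware fingerprint. `bigram_set`, `tanimoto`, and `mean_pairwise_distance` are hypothetical names.

```python
from itertools import combinations


def bigram_set(smiles: str) -> set[str]:
    """Stand-in 'fingerprint': the set of adjacent character pairs in the string."""
    return {smiles[i:i + 2] for i in range(len(smiles) - 1)}


def tanimoto(a: set[str], b: set[str]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_distance(smiles_list: list[str]) -> float:
    """Average of (1 - Tanimoto similarity) over all pairs of distinct outputs."""
    unique = sorted(set(smiles_list))  # string-level dedup, an assumption
    if len(unique) < 2:
        return 0.0
    features = [bigram_set(s) for s in unique]
    distances = [1.0 - tanimoto(x, y) for x, y in combinations(features, 2)]
    return sum(distances) / len(distances)


print(mean_pairwise_distance(["CCO", "CCN"]))
print(mean_pairwise_distance(["CCO", "CCO"]))
```

A model that returns the same molecule for every sample scores 0.0 here, while a set of structurally different candidates scores closer to 1.0.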

Circularity Check

0 steps flagged

No circularity; benchmark and dataset are self-contained empirical contributions

Full rationale

The paper introduces S^2-Bench (with tasks MolEdit, MolOpt, MolCustom) and OpenMolIns dataset as new artifacts for LLM evaluation in molecule generation. No equations, derivations, fitted parameters, or predictions appear in the provided text. Central claims rest on empirical results from evaluating 31 LLMs rather than any reduction to self-defined inputs or self-citation chains. The benchmark's one-to-many framing and metrics are asserted as novel design choices without invoking prior self-work as a uniqueness theorem or ansatz. This is a standard non-circular case of new resource creation and benchmarking.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper is empirical and introduces no mathematical derivations, free parameters, or new physical entities; it relies on standard assumptions from cheminformatics and LLM evaluation.

axioms (1)
  • domain assumption Molecule validity and property changes can be assessed automatically from generated structures in the benchmark tasks
    Required for scoring the MolEdit, MolOpt, and MolCustom tasks but not detailed in the abstract.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. MoleCode unlocks structural intelligence in large language models

    q-bio.BM 2026-05 unverdicted novelty 7.0

    MoleCode is a training-free, LLM-native representation that makes molecular graphs with explicit atoms, bonds, and topology directly readable and editable in language models, improving structural tasks over implicit s...

  2. MolViBench: Evaluating LLMs on Molecular Vibe Coding

    cs.CL 2026-05 unverdicted novelty 7.0

    MolViBench is the first benchmark designed to evaluate LLMs on generating executable programs for molecular tasks in drug discovery.

  3. How Creative Are Large Language Models in Generating Molecules?

    cs.CL 2026-04 unverdicted novelty 7.0

    Large language models exhibit distinct creative patterns in molecule generation, including higher constraint satisfaction when more constraints are added, and this is the first work to reframe molecule generation abil...

  4. TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration

    cs.AI 2026-04 unverdicted novelty 6.0

    TREX automates the LLM training lifecycle via collaborative agents and tree-based exploration, delivering consistent performance gains across 10 real-world fine-tuning tasks in FT-Bench.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · cited by 4 Pith papers · 10 internal anchors

  1. [1]

    Claude-3, 2024a

    Anthropic. Claude-3, 2024a. URL https://www.anthropic.com/news/claude-3-family

  2. [2]

    Claude-3.5, 2024b

    Anthropic. Claude-3.5, 2024b. URL https://www.anthropic.com/news/claude-3-5-sonnet

  3. [3]

    Lead optimization in drug discovery

    Mariana Pegrucci Barcelos, Suzane Quintana Gomes, Leonardo Bruno Federico, Isaque Antonio Galindo Francischini, Lorane Izabel da Silva Hage-Melim, Guilherme Martins Silva, and Carlos Henrique Tomich de Paula da Silva. Lead optimization in drug discovery. In Research topics in bioactivity, environment and energy: experimental and theoretical tools, pp. 48...

  4. [4]

    Unsupervised data base clustering based on daylight's fingerprint and tanimoto similarity: A fast and automated way to cluster small and large data sets

    Darko Butina. Unsupervised data base clustering based on daylight's fingerprint and tanimoto similarity: A fast and automated way to cluster small and large data sets. Journal of Chemical Information and Computer Sciences, 39(4): 747-750, 1999

  5. [5]

    Microsoft COCO Captions: Data Collection and Evaluation Server

    Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325, 2015

  6. [6]

    Gemini-1.5-pro, 2024

    Google Deepmind. Gemini-1.5-pro, 2024. URL https://deepmind.google/technologies/gemini/pro/

  7. [7]

    Molgensurvey: A systematic survey in machine learning models for molecule design

    Yuanqi Du, Tianfan Fu, Jimeng Sun, and Shengchao Liu. Molgensurvey: A systematic survey in machine learning models for molecule design. arXiv preprint arXiv:2203.14500, 2022

  8. [8]

    The Llama 3 Herd of Models

    Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024

  9. [9]

    Text2mol: Cross-modal molecule retrieval with natural language queries

    Carl Edwards, ChengXiang Zhai, and Heng Ji. Text2mol: Cross-modal molecule retrieval with natural language queries. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 595-607, 2021

  10. [10]

    Translation between molecules and natural language

    Carl Edwards, Tuan Lai, Kevin Ros, Garrett Honke, Kyunghyun Cho, and Heng Ji. Translation between molecules and natural language. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp. 375-413, 2022

  11. [11]

    L+M-24: Building a dataset for Language+Molecules @ ACL 2024

    Carl Edwards, Qingyun Wang, Lawrence Zhao, and Heng Ji. L+M-24: Building a dataset for Language+Molecules @ ACL 2024. In The 1st Workshop on Language+ Molecules, pp. 1, 2024

  12. [12]

    The lab of the future: Self-driving labs for molecule discovery

    Sean Ekins. The lab of the future: Self-driving labs for molecule discovery. GEN Biotechnology, 3(2): 83-86, 2024

  13. [13]

    Mol-instructions: A large-scale biomolecular instruction dataset for large language models

    Yin Fang, Xiaozhuan Liang, Ningyu Zhang, Kangwei Liu, Rui Huang, Zhuo Chen, Xiaohui Fan, and Huajun Chen. Mol-instructions: A large-scale biomolecular instruction dataset for large language models. The Twelfth International Conference on Learning Representations, 2024

  14. [14]

    ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

    Team GLM, Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Dan Zhang, Diego Rojas, Guanyu Feng, Hanlin Zhao, et al. Chatglm: A family of large language models from glm-130b to glm-4 all tools. arXiv preprint arXiv:2406.12793, 2024

  15. [15]

    DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

    Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025

  16. [16]

    Material design for next-generation mrna vaccines using lipid nanoparticles

    Akon Higuchi, Tzu-Cheng Sung, Ting Wang, Qing-Dong Ling, S Suresh Kumar, Shih-Tien Hsu, and Akihiro Umezawa. Material design for next-generation mrna vaccines using lipid nanoparticles. Polymer Reviews, 63(2): 394-436, 2023

  17. [17]

    Mistral 7B

    Albert Q Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, et al. Mistral 7b. arXiv preprint arXiv:2310.06825, 2023

  18. [18]

    The chemical basis of pharmacology

    Michael J Keiser, John J Irwin, and Brian K Shoichet. The chemical basis of pharmacology. Biochemistry, 49(48): 10267-10276, 2010

  19. [19]

    Pubchem 2019 update: improved access to chemical data

    Sunghwan Kim, Jie Chen, Tiejun Cheng, Asta Gindulyte, Jia He, Siqian He, Qingliang Li, Benjamin A Shoemaker, Paul A Thiessen, Bo Yu, et al. Pubchem 2019 update: improved access to chemical data. Nucleic acids research, 47(D1): D1102-D1109, 2019

  20. [20]

    Selfies: a robust representation of semantically constrained graphs with an example application in chemistry

    Mario Krenn, Florian Häse, Pascal Friederich, and Alán Aspuru-Guzik. Selfies: a robust representation of semantically constrained graphs with an example application in chemistry. arXiv preprint arXiv:1905.13741, 1(3), 2019

  21. [21]

    Rdkit documentation

    Greg Landrum. Rdkit documentation. Release, 1(1-79): 4, 2013

  22. [22]

    Evaluating determinant priority of license fee in biotech industry

    Jeong Hee Lee, Tae-Eung Sung, Eungdo Kim, and Kwangsoo Shin. Evaluating determinant priority of license fee in biotech industry. Journal of Open Innovation: Technology, Market, and Complexity, 4(3): 30, 2018

  23. [23]

    Large language models are in-context molecule learners

    Jiatong Li, Wei Liu, Zhihao Ding, Wenqi Fan, Yuqiang Li, and Qing Li. Large language models are in-context molecule learners. arXiv preprint arXiv:2403.04197, 2024a

  24. [24]

    Empowering molecule discovery for molecule-caption translation with large language models: A chatgpt perspective

    Jiatong Li, Yunqing Liu, Wenqi Fan, Xiao-Yong Wei, Hui Liu, Jiliang Tang, and Qing Li. Empowering molecule discovery for molecule-caption translation with large language models: A chatgpt perspective. IEEE Transactions on Knowledge and Data Engineering, 2024b

  25. [25]

    MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts

    Jiatong Li, Yunqing Liu, Wei Liu, Jingdi Le, Di Zhang, Wenqi Fan, Dongzhan Zhou, Yuqiang Li, and Qing Li. Molreflect: Towards in-context fine-grained alignments between molecules and texts. arXiv preprint arXiv:2411.14721, 2024c

  26. [26]

    Mol-r1: Towards explicit long-cot reasoning in molecule discovery

    Jiatong Li, Weida Wang, Qinggang Zhang, Junxian Li, Di Zhang, Changmeng Zheng, Shufei Zhang, Xiaoyong Wei, and Qing Li. Mol-r1: Towards explicit long-cot reasoning in molecule discovery. arXiv preprint arXiv:2508.08401, 2025

  27. [27]

    Towards 3d molecule-text interpretation in language models

    Sihang Li, Zhiyuan Liu, Yanchen Luo, Xiang Wang, Xiangnan He, Kenji Kawaguchi, Tat-Seng Chua, and Qi Tian. Towards 3d molecule-text interpretation in language models. The Twelfth International Conference on Learning Representations, 2024d

  28. [28]

    Molca: Molecular graph-language modeling with cross-modal projector and uni-modal adapter

    Zhiyuan Liu, Sihang Li, Yanchen Luo, Hao Fei, Yixin Cao, Kenji Kawaguchi, Xiang Wang, and Tat-Seng Chua. Molca: Molecular graph-language modeling with cross-modal projector and uni-modal adapter. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp. 15623-15638, 2023

  29. [29]

    Automated end-to-end workflow for volumetric mass-transfer coefficient (kLa) characterization in small-molecule pharmaceutical development

    Keith Mattern and Shane T Grosser. Automated end-to-end workflow for volumetric mass-transfer coefficient (kLa) characterization in small-molecule pharmaceutical development. Organic Process Research & Development, 27(11): 1992-2009, 2023

  30. [30]

    Biot5: Enriching cross-modal integration in biology with chemical knowledge and natural language associations

    Qizhi Pei, Wei Zhang, Jinhua Zhu, Kehan Wu, Kaiyuan Gao, Lijun Wu, Yingce Xia, and Rui Yan. Biot5: Enriching cross-modal integration in biology with chemical knowledge and natural language associations. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp. 1102-1123, 2023

  31. [31]

    Rethinking molecular similarity: comparing compounds on the basis of biological activity

    Paula M Petrone, Benjamin Simms, Florian Nigsch, Eugen Lounkine, Peter Kutchukian, Allen Cornett, Zhan Deng, John W Davies, Jeremy L Jenkins, and Meir Glick. Rethinking molecular similarity: comparing compounds on the basis of biological activity. ACS chemical biology, 7(8): 1399-1409, 2012

  32. [32]

    Zinc 15--ligand discovery for everyone

    Teague Sterling and John J Irwin. Zinc 15--ligand discovery for everyone. Journal of chemical information and modeling, 55(11): 2324-2337, 2015

  33. [33]

    A molecular multimodal foundation model associating molecule graphs with natural language

    Bing Su, Dazhao Du, Zhao Yang, Yujie Zhou, Jiangmeng Li, Anyi Rao, Hao Sun, Zhiwu Lu, and Ji-Rong Wen. A molecular multimodal foundation model associating molecule graphs with natural language. arXiv preprint arXiv:2209.05481, 2022

  34. [34]

    Galactica: A Large Language Model for Science

    Ross Taylor, Marcin Kardas, Guillem Cucurull, Thomas Scialom, Anthony Hartshorn, Elvis Saravia, Andrew Poulton, Viktor Kerkez, and Robert Stojnic. Galactica: A large language model for science. arXiv preprint arXiv:2211.09085, 2022

  35. [35]

    Gemma 3 Technical Report

    Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, et al. Gemma 3 technical report. arXiv preprint arXiv:2503.19786, 2025

  36. [36]

    Smiles, a chemical language and information system

    David Weininger. Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences, 28(1): 31-36, 1988

  37. [37]

    Qwen2 Technical Report

    An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, et al. Qwen2 technical report. arXiv preprint arXiv:2407.10671, 2024

  38. [38]

    Yi: Open Foundation Models by 01.AI

    Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Heng Li, Jiangcheng Zhu, Jianqun Chen, Jing Chang, et al. Yi: Open foundation models by 01.AI. arXiv preprint arXiv:2403.04652, 2024

  39. [39]

    A deep-learning system bridging molecule structure and biomedical text with comprehension comparable to human professionals

    Zheni Zeng, Yuan Yao, Zhiyuan Liu, and Maosong Sun. A deep-learning system bridging molecule structure and biomedical text with comprehension comparable to human professionals. Nature communications, 13(1): 862, 2022

  40. [40]

    Chemllm: A chemical large language model

    Di Zhang, Wei Liu, Qian Tan, Jingdan Chen, Hang Yan, Yuliang Yan, Jiatong Li, Weiran Huang, Xiangyu Yue, Dongzhan Zhou, et al. Chemllm: A chemical large language model. arXiv preprint arXiv:2402.06852, 2024

  41. [41]

    Uni-mol: A universal 3d molecular representation learning framework

    Gengmo Zhou, Zhifeng Gao, Qiankun Ding, Hang Zheng, Hongteng Xu, Zhewei Wei, Linfeng Zhang, and Guolin Ke. Uni-mol: A universal 3d molecular representation learning framework. The Eleventh International Conference on Learning Representations, 2023
