BioMiner: A Multi-modal System for Automated Mining of Protein-Ligand Bioactivity Data from Literature
Pith reviewed 2026-05-09 21:28 UTC · model grok-4.3
The pith
BioMiner separates semantic reasoning from exact ligand structure reconstruction to extract protein-ligand bioactivity data from literature.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BioMiner is a multi-modal extraction framework that infers bioactivity semantics through direct reasoning while resolving chemical structures via a chemical-structure-grounded visual semantic reasoning paradigm, in which multi-modal large language models operate on chemically grounded visual representations to infer inter-structure relationships, and exact molecular construction is delegated to domain chemistry tools. On the BioVista benchmark it reaches an F1 score of 0.32 for bioactivity triplets and, when applied at scale, produces datasets that improve downstream performance and accelerate annotation workflows.
What carries the argument
The separation of bioactivity semantic interpretation from ligand structure construction, where multi-modal LLMs work on chemically grounded visual representations to infer relationships and domain chemistry tools perform exact molecular construction.
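The separation can be made concrete with a toy sketch. This is not BioMiner's actual API: the scaffold, labels, and helper names below are hypothetical, and real structure construction would be delegated to chemistry tools (e.g., an RDKit-style toolkit) rather than the naive string substitution used here. It only illustrates the two-stage interface the review describes: semantic records reference ligands by label, and a separate resolver turns a Markush scaffold plus R-group assignments into a concrete structure.

```python
# Illustrative sketch only (hypothetical names, not BioMiner's code):
# stage 1 yields (protein, ligand label, activity) triplets; stage 2
# resolves each ligand label to an exact structure. Exact construction
# is stood in for by plain R-group substitution on a Markush scaffold;
# a real pipeline would hand this to domain chemistry tools and
# validate the resulting molecule.

MARKUSH_SCAFFOLD = "c1ccc([R1])cc1C(=O)N[R2]"  # hypothetical scaffold

def resolve_structure(scaffold: str, r_groups: dict[str, str]) -> str:
    """Substitute R-group placeholders to get a concrete SMILES-like string."""
    smiles = scaffold
    for label, fragment in r_groups.items():
        smiles = smiles.replace(f"[{label}]", fragment)
    if "[R" in smiles:  # unresolved placeholder -> still a generic Markush form
        raise ValueError(f"unresolved R-group in {smiles!r}")
    return smiles

def extract_triplets(semantic_records, structure_table):
    """Join semantic triplets with resolved ligand structures."""
    out = []
    for protein, ligand_label, activity in semantic_records:
        smiles = resolve_structure(MARKUSH_SCAFFOLD, structure_table[ligand_label])
        out.append((protein, smiles, activity))
    return out

records = [("NLRP3", "compound 7a", "IC50 = 12 nM")]
table = {"compound 7a": {"R1": "Cl", "R2": "CC"}}
print(extract_triplets(records, table))
```

The failure mode the review's load-bearing premise worries about shows up directly: an incomplete R-group table leaves an unresolved placeholder, i.e., an ambiguous Markush form rather than an exact ligand.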
If this is right
- Extracting 82,262 bioactivity entries from 11,683 papers creates a pre-training database that raises downstream model performance by 3.9%.
- A human-in-the-loop workflow doubles the number of high-quality NLRP3 bioactivity data points, producing a 38.6% improvement over 28 QSAR models and identifying 16 hit candidates with novel scaffolds.
- Annotation of protein-ligand complexes on the PoseBusters dataset runs 5.59 times faster with a 5.75% accuracy gain compared with manual workflows.
Where Pith is reading between the lines
- The same separation of semantics from structure building could apply to other domains where literature mixes descriptive text with figures that encode precise entities.
- Large extracted bioactivity collections might shorten the delay between publication and usable training data for interaction-prediction models.
- Handling Markush structures at scale suggests the approach could manage other forms of incomplete or generic chemical information common in patents and papers.
Load-bearing premise
Multi-modal large language models can reliably reconstruct exact ligand structures, including Markush forms, from visual representations without producing chemically invalid or ambiguous outputs.
What would settle it
A high rate of chemically invalid or ambiguous ligand structures in the extracted outputs, or a large mismatch with manually verified annotations on a held-out set of papers, would show the reconstruction step does not work as claimed.
Original abstract
Protein-ligand bioactivity data published in the literature are essential for drug discovery, yet manual curation struggles to keep pace with rapidly growing literature. Automated bioactivity extraction remains challenging because it requires not only interpreting biochemical semantics distributed across text, tables, and figures, but also reconstructing chemically exact ligand structures (e.g., Markush structures). To address this bottleneck, we introduce BioMiner, a multi-modal extraction framework that explicitly separates bioactivity semantic interpretation from ligand structure construction. Within BioMiner, bioactivity semantics are inferred through direct reasoning, while chemical structures are resolved via a chemical-structure-grounded visual semantic reasoning paradigm, in which multi-modal large language models operate on chemically grounded visual representations to infer inter-structure relationships, and exact molecular construction is delegated to domain chemistry tools. For rigorous evaluation and method development, we further establish BioVista, a comprehensive benchmark comprising 16,457 bioactivity entries curated from 500 publications. BioMiner validates its extraction ability and provides a quantitative baseline, achieving an F1 score of 0.32 for bioactivity triplets. BioMiner's practical utility is demonstrated via three applications: (1) extracting 82,262 data from 11,683 papers to build a pre-training database that improves downstream models performance by 3.9%; (2) enabling a human-in-the-loop workflow that doubles the number of high-quality NLRP3 bioactivity data, helping 38.6% improvement over 28 QSAR models and identification of 16 hit candidates with novel scaffolds; and (3) accelerating protein-ligand complex bioactivity annotation, achieving a 5.59-fold speed increase and 5.75% accuracy improvement over manual workflows in PoseBusters dataset.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces BioMiner, a multi-modal extraction framework for protein-ligand bioactivity data from literature that separates semantic interpretation of bioactivity (via direct reasoning) from ligand structure construction (via chemically grounded visual semantic reasoning with multi-modal LLMs followed by domain chemistry tools for exact molecular construction, including Markush forms). It establishes the BioVista benchmark with 16,457 curated entries from 500 publications and reports an F1 score of 0.32 for bioactivity triplet extraction as a quantitative baseline. Practical utility is shown through three applications: extracting 82,262 entries from 11,683 papers to improve downstream QSAR models by 3.9%; a human-in-the-loop workflow that doubles high-quality NLRP3 data and yields 38.6% improvement over 28 QSAR models plus 16 novel-scaffold hits; and a 5.59-fold speedup with 5.75% accuracy gain over manual annotation on the PoseBusters dataset.
Significance. If the extraction reliability holds, BioMiner addresses a key bottleneck in scaling bioactivity data curation for drug discovery. The creation of the BioVista benchmark is a clear strength as a community resource for method development, and the three applications provide concrete, falsifiable demonstrations of downstream value with specific quantitative gains (3.9% model improvement, doubled NLRP3 data, 5.59x annotation speedup). The design choice to delegate exact structure construction to chemistry tools after visual reasoning is a positive step toward reducing invalid outputs.
Major comments (4)
- [§4] §4 (BioVista benchmark and evaluation): The reported F1 of 0.32 for bioactivity triplets is presented without any baseline comparisons to prior extraction methods, ablations of the multi-modal components, or error analysis; this makes it impossible to determine whether the score reflects meaningful progress or the inherent difficulty of the task.
- [§3] Abstract and §3 (chemical-structure-grounded visual semantic reasoning): No quantitative breakdown of structure-level errors (e.g., invalid SMILES, ambiguous Markush interpretations, or stereochemistry failures) is provided, nor is there a dedicated validation subset for these cases; this assumption is load-bearing for the claim that the separation of semantics from structure construction avoids propagating chemically invalid data into the extracted database and all downstream applications.
- [§4.1] BioVista curation description (likely §4.1): Inter-annotator agreement is not reported, and there are no details on how chemically invalid or ambiguous structures were detected and resolved during benchmark creation; without these, the reliability of the 16,457-entry ground truth cannot be assessed.
- [§5] §5 (applications): The reported gains (3.9% QSAR improvement, 38.6% over 28 models on NLRP3, 5.59x speedup on PoseBusters) lack any analysis of how potential structure reconstruction errors would propagate into the pre-training set or human-in-the-loop results; this is required to substantiate that the extracted data are sufficiently clean for the claimed benefits.
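For readers outside the subfield, the headline F1 of 0.32 is easiest to interpret by seeing how triplet-level F1 is computed. The sketch below assumes exact-match comparison of (protein, ligand, activity) triplets; the paper's actual matching criteria (value tolerances, SMILES canonicalization, unit normalization) may well differ, so this is illustrative only.

```python
# Minimal triplet-level F1, assuming exact set matching of
# (protein, ligand, activity) triplets. The paper's matching rules
# (e.g., tolerance on activity values, canonical SMILES) may differ.

def triplet_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    tp = len(predicted & gold)                      # exact-match true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {("P1", "CCO", "IC50=10nM"), ("P1", "CCN", "IC50=50nM"),
        ("P2", "c1ccccc1", "Ki=5nM")}
pred = {("P1", "CCO", "IC50=10nM"), ("P2", "c1ccccc1", "Ki=7nM")}
print(triplet_f1(pred, gold))  # one exact match out of 2 predicted, 3 gold
```

Under exact matching, a near-miss on the activity value (Ki=7nM vs. Ki=5nM) counts as both a false positive and a false negative, which is one reason absolute F1 on this task can look low without a baseline for comparison.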
Minor comments (2)
- [Figure 1] The pipeline diagram (Figure 1) would benefit from explicit arrows or labels distinguishing the semantic-reasoning path from the visual-structure path to clarify the core separation.
- [§2] Notation for bioactivity triplets (e.g., protein-ligand-activity) should be defined consistently in the first use in §2 or §3 to avoid ambiguity for readers outside the immediate subfield.
Simulated Author's Rebuttal
We thank the referee for their thorough and constructive review. We address each major comment point by point below and indicate where revisions will be made to strengthen the manuscript.
Point-by-point responses
- Referee: [§4] §4 (BioVista benchmark and evaluation): The reported F1 of 0.32 for bioactivity triplets is presented without any baseline comparisons to prior extraction methods, ablations of the multi-modal components, or error analysis; this makes it impossible to determine whether the score reflects meaningful progress or the inherent difficulty of the task.
Authors: We agree that baseline comparisons and ablations would better situate the F1 score. In the revised manuscript we will add comparisons against prior bioactivity extraction systems, ablations isolating the multi-modal visual reasoning and chemistry-tool components, and a detailed error analysis of triplet extraction failures to clarify the sources of difficulty. revision: yes
- Referee: [§3] Abstract and §3 (chemical-structure-grounded visual semantic reasoning): No quantitative breakdown of structure-level errors (e.g., invalid SMILES, ambiguous Markush interpretations, or stereochemistry failures) is provided, nor is there a dedicated validation subset for these cases; this assumption is load-bearing for the claim that the separation of semantics from structure construction avoids propagating chemically invalid data into the extracted database and all downstream applications.
Authors: We acknowledge the need for explicit quantification. We will add a dedicated validation subset analysis reporting rates of invalid SMILES, Markush ambiguity, and stereochemistry errors, together with evidence that delegating exact construction to chemistry tools limits propagation of these errors into the final database. revision: yes
- Referee: [§4.1] BioVista curation description (likely §4.1): Inter-annotator agreement is not reported, and there are no details on how chemically invalid or ambiguous structures were detected and resolved during benchmark creation; without these, the reliability of the 16,457-entry ground truth cannot be assessed.
Authors: We will expand §4.1 with a full description of the curation protocol, including how chemically invalid or ambiguous structures were identified and resolved by expert annotators. Inter-annotator agreement was not formally computed during the original curation; we will therefore describe the consensus process in detail rather than retroactively reporting agreement statistics. revision: partial
- Referee: [§5] §5 (applications): The reported gains (3.9% QSAR improvement, 38.6% over 28 models on NLRP3, 5.59x speedup on PoseBusters) lack any analysis of how potential structure reconstruction errors would propagate into the pre-training set or human-in-the-loop results; this is required to substantiate that the extracted data are sufficiently clean for the claimed benefits.
Authors: We agree that error-propagation analysis is necessary to support the downstream claims. In the revised §5 we will include a sensitivity analysis examining how plausible rates of structure reconstruction errors would affect the reported QSAR improvements, the NLRP3 human-in-the-loop results, and the PoseBusters annotation speedup. revision: yes
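The promised sensitivity analysis can be framed with a back-of-the-envelope sketch. Under the (assumed) simplification that structure, protein, and activity-value errors are independent, the expected fraction of fully clean triplets is the product of the per-field success rates. The error rates below are hypothetical placeholders, not figures from the paper.

```python
# Back-of-the-envelope error-propagation sketch. Assumes independent
# per-field error rates, which is a simplification; the rates here are
# hypothetical, not reported by the paper.

def clean_fraction(error_rates: dict[str, float]) -> float:
    """Expected fraction of triplets with no error in any field."""
    frac = 1.0
    for rate in error_rates.values():
        frac *= 1.0 - rate
    return frac

def expected_clean_entries(n_entries: int, error_rates: dict[str, float]) -> int:
    return round(n_entries * clean_fraction(error_rates))

rates = {"structure": 0.05, "protein": 0.02, "value": 0.03}  # assumed rates
print(expected_clean_entries(82_262, rates))
```

Even modest per-field error rates compound multiplicatively across fields, which is why the referee's request for an explicit propagation analysis over the 82,262-entry database is reasonable.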
Circularity Check
No circularity: empirical results on an independent benchmark and external downstream tasks.
Full rationale
The paper describes an empirical multi-modal extraction system evaluated via F1 on a newly curated BioVista benchmark (16,457 entries from 500 publications) and reports gains on separate external tasks (pre-training set improving QSAR models, NLRP3 human-in-loop, PoseBusters annotation speedup). No mathematical derivations, equations, or self-referential definitions appear; performance metrics are measured against held-out or external data rather than reducing to fitted inputs or self-citation chains by construction. The framework's separation of semantics and structure is a stated design choice, not a tautological redefinition.