
arxiv: 2606.02902 · v1 · pith:PCQV7MXDnew · submitted 2026-06-01 · 💻 cs.CY · cs.LG

Fairness Definitions and Metrics in Deep Reinforcement Learning for Drug Discovery in Healthcare: A Rapid Evidence Review

Pith reviewed 2026-06-28 12:04 UTC · model grok-4.3

classification 💻 cs.CY cs.LG
keywords: fairness definitions · deep reinforcement learning · drug discovery · molecule generation · distribution parity · outcome parity · reward design · cancer targets

The pith

A rapid evidence review assembles fairness definitions and metrics for deep reinforcement learning in de novo molecule generation, centered on distribution and outcome parity across cancer and non-cancer targets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how dataset composition, split strategies, and reward designs in deep reinforcement learning for drug candidate generation can produce uneven results across disease areas and chemical structures. It synthesizes fairness definitions and metrics from existing studies to track these imbalances, particularly parity in physicochemical properties, scaffold diversity, validity, toxicity, and synthetic accessibility. A sympathetic reader would care because such imbalances could lead to AI tools that underperform for certain cancer subtypes or chemotypes, affecting equitable healthcare applications. The review analyzes links between choices like scaffold versus random splits and rewards such as QED or docking scores and observed parity effects. It supplies practical guidance on reporting these parities while noting remaining gaps in trustworthy generation.

Core claim

The review establishes that fairness in DRL molecule generation is captured through metrics of distribution parity in key descriptors and chemotype diversity plus outcome parity in groupwise validity, toxicity, and synthetic accessibility, with emphasis on cancer versus non-cancer indications. Through PRISMA-style screening and content coding of literature from 2017 onward, the work links these parity outcomes directly to dataset split strategies and reward components, yielding a concise set of definitions and metrics along with guidance for their reporting in future evaluations.
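As a rough illustration of the two parity notions named above, the sketch below computes an outcome parity gap (the spread of a groupwise rate such as validity) and a distribution parity score for one descriptor. The field names, the grouping key, and the choice of the two-sample Kolmogorov-Smirnov statistic are assumptions made for this example; the review does not prescribe a specific distance.

```python
from collections import defaultdict


def groupwise_rate(records, group_key, outcome_key):
    """Return {group: fraction of records whose outcome flag is truthy}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        if rec[outcome_key]:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


def outcome_parity_gap(rates):
    """Largest difference between any two groupwise rates (0 means parity)."""
    values = list(rates.values())
    return max(values) - min(values)


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDFs of xs and ys (0 for identical samples, 1 when one sample
    lies entirely below the other)."""
    xs, ys = sorted(xs), sorted(ys)
    i = j = 0
    gap = 0.0
    while i < len(xs) and j < len(ys):
        v = min(xs[i], ys[j])
        # Step both CDFs past every observation equal to v.
        while i < len(xs) and xs[i] == v:
            i += 1
        while j < len(ys) and ys[j] == v:
            j += 1
        gap = max(gap, abs(i / len(xs) - j / len(ys)))
    return gap
```

In use, `groupwise_rate(generated, "indication", "valid")` would give one validity rate per indication group, and `ks_statistic` could compare a descriptor such as logP between molecules generated for cancer and non-cancer targets. Other distances (for example a Wasserstein distance) would serve the same reporting purpose.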

What carries the argument

Content coding of screened studies that maps reported parity outcomes to dataset composition, split strategies such as scaffold versus random, and reward designs including QED, docking, toxicity, and synthetic accessibility.
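The scaffold-versus-random distinction can be made concrete with a small sketch. It assumes each molecule already carries a scaffold key computed elsewhere (for instance a Bemis-Murcko framework string); the `scaffold_of` callback, the greedy largest-group-first assignment, and the default fractions are illustrative choices, not a procedure taken from the reviewed studies.

```python
import random
from collections import defaultdict


def random_split(items, test_fraction=0.2, seed=0):
    """Shuffle molecules and cut; a scaffold may land on both sides."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(test_fraction * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


def scaffold_split(items, scaffold_of, test_fraction=0.2):
    """Assign whole scaffold groups to one side so no scaffold is shared."""
    groups = defaultdict(list)
    for item in items:
        groups[scaffold_of(item)].append(item)
    # Largest groups fill the training set first, so rarer scaffolds
    # tend to end up in the test set.
    ordered = sorted(groups.values(), key=len, reverse=True)
    n_train_target = len(items) - round(test_fraction * len(items))
    train, test = [], []
    for group in ordered:
        if len(train) + len(group) <= n_train_target:
            train.extend(group)
        else:
            test.extend(group)
    return train, test


def shared_scaffolds(train, test, scaffold_of):
    """Scaffold keys that occur in both partitions (empty set means no leakage)."""
    return {scaffold_of(m) for m in train} & {scaffold_of(m) for m in test}
```

Because the scaffold split holds out entire chemotypes, evaluation on its test set probes distribution shift that a random split can hide; `shared_scaffolds` gives a quick check of how much overlap a given split leaves.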

If this is right

  • Researchers gain concrete guidance for reporting distribution parity and outcome parity when evaluating DRL molecule generators.
  • Dataset split strategies and reward designs can be directly related to observed parity effects in cancer-relevant generation.
  • Metrics should separately track parity across cancer versus non-cancer indications and within subtypes.
  • Open gaps remain in extending these fairness considerations to fully trustworthy DRL applications in drug discovery.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The metrics could be retroactively applied to existing published DRL models to uncover previously undetected biases in underrepresented targets.
  • The synthesis approach might generalize to fairness assessment in other machine learning methods used for molecular design.
  • Prospective experiments that optimize models under the proposed parity metrics could test whether they lead to more balanced performance on held-out chemotypes.

Load-bearing premise

The database searches from 2017 onward combined with PRISMA-style screening and content coding accurately capture the relevant studies and correctly connect parity outcomes to dataset and reward choices without selection bias.

What would settle it

The claimed utility of the metrics would be undermined by a controlled test in which dataset splits or rewards are varied across DRL models and the identified parity metrics fail to detect the resulting systematic differences in generated-molecule properties across cancer subtypes.

Figures

Figures reproduced from arXiv: 2606.02902 by Behrouz Far, Esmaeil Shakeri, Ronnie de Souza Santos.

Figure 1
Figure 1. Distribution of included studies by publication source. Adjacent text: … capture multidisciplinary and applied work in computational chemistry and healthcare AI. We also searched journal collections from the Nature portfolio, JMIR, and ACS to ensure coverage of high-impact interdisciplinary and translational research. Finally, arXiv was screened to identify recent preprints in RL and responsible AI; however, preprints wer…
Figure 3
Figure 3. Figure 3 presents the yearly distribution of the included studies over 2017–2025 and the corresponding fitted linear trend. Overall, the evidence base expands over time. The early period (2017–2019) shows limited and relatively stable output (1–2 studies per year), followed by a prolonged low-activity interval during 2020–2022 (approximately one study per year). In contrast, publication activity increases sharply f…
Figure 4
Figure 4. Distribution of the studies across the top 7 countries.
Figure 5
Figure 5. Figure 5 provides a global overview of the geographic coverage of the included studies, with countries contributing at least one study highlighted on the world map. As illustrated, the research footprint spans North America, Europe, and Asia, demonstrating that the topic has attracted international attention. Contributions are geographically dispersed across multiple regions rather than confined to a single contin…
Figure 6
Figure 6. Distribution of publication types.
Original abstract

Deep reinforcement learning (DRL) is increasingly applied to de novo molecular design, but choices in data, rewards, and evaluation can yield uneven performance across disease areas and chemotypes. Despite this, there is no concise synthesis of how fairness is defined, measured, and tested in DRL-based drug discovery. In this rapid evidence review, we synthesize fairness definitions and metrics for DRL-driven molecule generation in healthcare. We focus on three questions: (i) how dataset composition and split strategies, especially scaffold versus random splits, affect evaluation and distribution shift; (ii) how reward design (e.g., QED, docking, toxicity, synthetic accessibility) can create or mitigate bias, with emphasis on cancer targets; and (iii) which measurable metrics best capture fairness. This includes parity across cancer versus non-cancer indications and across cancer subtypes. It also includes distributional balance in key physicochemical descriptors, scaffold/chemotype diversity, groupwise validity, toxicity, and synthetic accessibility. From 2017 onward, we searched major biomedical, computer science, and engineering literature databases and used arXiv for horizon scanning. Records were screened using PRISMA-style procedures and analyzed via content coding to link reported parity outcomes to dataset and reward choices. Our review provides a concise set of fairness definitions and metrics for DRL molecule generation. It offers practical guidance for reporting distribution parity and outcome parity. It also summarizes how dataset and reward choices relate to observed parity effects and identifies open gaps relevant to trustworthy, cancer-relevant DRL generation.
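To make question (ii) concrete, the sketch below shows one common way multi-component rewards are combined, a weighted average, and how a change in weights alone can open a gap in mean reward between target groups. It assumes every component score (QED, docking, non-toxicity, synthetic accessibility) has already been rescaled to [0, 1] with higher meaning better; the component names, the `group` field, and the scalarization itself are assumptions for illustration, not the reward used by any particular reviewed study.

```python
def composite_reward(scores, weights):
    """Weighted average of component scores, each assumed to lie in [0, 1]
    with higher values more desirable. Only components named in `weights`
    contribute."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * scores[name] for name in weights) / total_weight


def mean_reward_by_group(molecules, weights):
    """Average composite reward per group, e.g. cancer vs non-cancer targets."""
    sums, counts = {}, {}
    for mol in molecules:
        group = mol["group"]
        sums[group] = sums.get(group, 0.0) + composite_reward(mol["scores"], weights)
        counts[group] = counts.get(group, 0) + 1
    return {group: sums[group] / counts[group] for group in sums}
```

Comparing `mean_reward_by_group` under two weightings of the same generated set is a cheap first check on whether a reward design favors one indication group before any retraining is attempted.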

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript is a rapid evidence review synthesizing fairness definitions and metrics in deep reinforcement learning (DRL) for de novo molecular design in drug discovery. It addresses three questions: effects of dataset composition and split strategies (scaffold vs. random) on distribution shift; how reward designs (QED, docking, toxicity, synthetic accessibility) create or mitigate bias especially for cancer targets; and which metrics best capture fairness including parity across cancer indications, physicochemical descriptors, scaffold diversity, validity, toxicity, and accessibility. The review uses PRISMA-style screening of literature from 2017 onward across major databases plus arXiv, followed by content coding to link reported parity outcomes to design choices, and concludes with concise definitions/metrics, reporting guidance for distribution and outcome parity, summaries of choice-outcome relations, and gap identification.

Significance. If the synthesis holds and accurately maps the (small) literature without coverage or coding bias, the work would supply usable practical guidance for reporting parity in DRL molecule generation, directly supporting more trustworthy AI applications in healthcare. The PRISMA-style process and explicit linkage of parity effects to concrete choices (splits, rewards) represent a strength for systematic, reproducible synthesis in this emerging niche.

major comments (1)
  1. [Abstract] Search and screening description: The PRISMA-style literature search (2017 onward) and content coding are presented at a high level, but the manuscript provides no quantitative details on records identified, screened, or included after eligibility assessment. In this narrow, rapidly growing field, the absence of these figures makes it hard to evaluate whether the synthesis captured essentially all relevant DRL drug-discovery papers and correctly associated parity outcomes with dataset and reward decisions without selection or interpretation bias. That coverage is load-bearing for the claimed guidance and gap identification.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive comments on our rapid evidence review. We address the single major comment below.

Point-by-point responses
  1. Referee: [Abstract] Search and screening description: The PRISMA-style literature search (2017 onward) and content coding are presented at a high level, but the manuscript provides no quantitative details on records identified, screened, or included after eligibility assessment. In this narrow, rapidly growing field, the absence of these figures makes it hard to evaluate whether the synthesis captured essentially all relevant DRL drug-discovery papers and correctly associated parity outcomes with dataset and reward decisions without selection or interpretation bias. That coverage is load-bearing for the claimed guidance and gap identification.

    Authors: We agree that the absence of quantitative search and screening figures in the abstract limits transparency and the ability to assess coverage in this emerging area. The revised manuscript will incorporate the specific numbers of records identified, screened, and included (along with a PRISMA-style flow summary) directly into the abstract and expand the methods section with the full flow diagram or table. This addresses the concern without altering the scope or conclusions. revision: yes
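    A PRISMA-style flow summary of the kind promised here reduces to a handful of counts that must be mutually consistent. The sketch below is one hypothetical way to hold and check them; the class name, field names, and stage labels are assumptions for illustration, and no counts from the paper are used or implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrismaFlow:
    """Counts entered by hand from a review's screening log."""
    identified: int
    duplicates_removed: int
    excluded_at_screening: int
    excluded_at_eligibility: int

    @property
    def screened(self):
        return self.identified - self.duplicates_removed

    @property
    def assessed(self):
        return self.screened - self.excluded_at_screening

    @property
    def included(self):
        return self.assessed - self.excluded_at_eligibility

    def check(self):
        """Raise ValueError if any raw or derived count is negative."""
        raw = (self.identified, self.duplicates_removed,
               self.excluded_at_screening, self.excluded_at_eligibility)
        derived = (self.screened, self.assessed, self.included)
        if min(raw + derived) < 0:
            raise ValueError("inconsistent PRISMA counts")
        return self

    def summary(self):
        """One-line flow summary suitable for an abstract or methods section."""
        self.check()
        return (f"identified={self.identified}, screened={self.screened}, "
                f"assessed={self.assessed}, included={self.included}")
```

    Calling `summary()` on an instance filled with the review's real counts would produce the one-line figure trail the referee asks for, and `check()` would flag a log in which more records were excluded than were available at that stage.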

Circularity Check

0 steps flagged

No circularity: pure literature synthesis with no derivations or predictions

Full rationale

This is a rapid evidence review paper that performs PRISMA-style literature search, screening, and content coding to synthesize fairness definitions and metrics from existing DRL drug-discovery studies. It contains no equations, no fitted parameters, no predictions, and no derivations. The central claim is a curated summary and practical guidance extracted from external papers; nothing reduces by construction to the authors' own inputs, self-citations, or ansatzes. The search methodology is presented as an independent process whose validity rests on external benchmarks (database coverage, PRISMA standards) rather than internal self-reference.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the assumption that the chosen search strategy and coding process yield an unbiased and representative sample of the literature; no free parameters or invented entities are introduced.

axioms (1)
  • Domain assumption: PRISMA-style screening procedures are appropriate and sufficient for identifying relevant records on fairness in DRL-based drug discovery.
    The abstract states that records were screened using PRISMA-style procedures.


