Fairness Definitions and Metrics in Deep Reinforcement Learning for Drug Discovery in Healthcare: A Rapid Evidence Review

Behrouz Far; Esmaeil Shakeri; Ronnie de Souza Santos

arxiv: 2606.02902 · v1 · pith:PCQV7MXDnew · submitted 2026-06-01 · 💻 cs.CY · cs.LG

Pith reviewed 2026-06-28 12:04 UTC · model grok-4.3

keywords fairness definitions · deep reinforcement learning · drug discovery · molecule generation · distribution parity · outcome parity · reward design · cancer targets

The pith

A rapid evidence review assembles fairness definitions and metrics for deep reinforcement learning in de novo molecule generation, centered on distribution and outcome parity across cancer and non-cancer targets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how dataset composition, split strategies, and reward designs in deep reinforcement learning for drug candidate generation can produce uneven results across disease areas and chemical structures. It synthesizes fairness definitions and metrics from existing studies to track these imbalances, particularly parity in physicochemical properties, scaffold diversity, validity, toxicity, and synthetic accessibility. A sympathetic reader would care because such imbalances could lead to AI tools that underperform for certain cancer subtypes or chemotypes, affecting equitable healthcare applications. The review analyzes links between choices like scaffold versus random splits and rewards such as QED or docking scores and observed parity effects. It supplies practical guidance on reporting these parities while noting remaining gaps in trustworthy generation.

Core claim

The review establishes that fairness in DRL molecule generation is captured through metrics of distribution parity in key descriptors and chemotype diversity plus outcome parity in groupwise validity, toxicity, and synthetic accessibility, with emphasis on cancer versus non-cancer indications. Through PRISMA-style screening and content coding of literature from 2017 onward, the work links these parity outcomes directly to dataset split strategies and reward components, yielding a concise set of definitions and metrics along with guidance for their reporting in future evaluations.
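
The two parity notions can be made concrete with a small sketch. This is an illustration under stated assumptions, not the review's own definition: the function names, the max-minus-min gap used for outcome parity, the 1-D Wasserstein distance used for distribution parity, and the toy molecules are all hypothetical choices made for simplicity.

```python
from collections import defaultdict


def groupwise_rate(records, flag):
    """Return {group: fraction of records whose `flag` field is truthy}."""
    hits, totals = defaultdict(int), defaultdict(int)
    for rec in records:
        totals[rec["group"]] += 1
        hits[rec["group"]] += 1 if rec[flag] else 0
    return {group: hits[group] / totals[group] for group in totals}


def outcome_parity_gap(records, flag):
    """Max minus min groupwise rate; 0.0 would mean identical rates."""
    rates = groupwise_rate(records, flag)
    return max(rates.values()) - min(rates.values())


def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("expects two non-empty samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Made-up toy molecules (hypothetical): group label, validity flag,
# and a logP-like physicochemical descriptor.
generated = [
    {"group": "cancer", "valid": True, "logp": 2.0},
    {"group": "cancer", "valid": True, "logp": 3.0},
    {"group": "cancer", "valid": True, "logp": 4.0},
    {"group": "cancer", "valid": False, "logp": 5.0},
    {"group": "non_cancer", "valid": True, "logp": 1.0},
    {"group": "non_cancer", "valid": True, "logp": 2.0},
    {"group": "non_cancer", "valid": False, "logp": 3.0},
    {"group": "non_cancer", "valid": False, "logp": 4.0},
]

validity_gap = outcome_parity_gap(generated, "valid")
logp_shift = wasserstein_1d(
    [m["logp"] for m in generated if m["group"] == "cancer"],
    [m["logp"] for m in generated if m["group"] == "non_cancer"],
)
print(f"validity gap = {validity_gap:.2f}, logP shift = {logp_shift:.2f}")
```

On this toy data the groupwise validity rates are 0.75 and 0.50. The same two functions could be reused for toxicity or synthetic-accessibility flags and for other descriptors, with the group key switched from indication to cancer subtype.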

What carries the argument

Content coding of screened studies that maps reported parity outcomes to dataset composition, split strategies such as scaffold versus random, and reward designs including QED, docking, toxicity, and synthetic accessibility.
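
The scaffold-versus-random distinction named above can be illustrated with a short sketch. It assumes each molecule already carries a precomputed scaffold key (in practice this might come from a cheminformatics toolkit, for example as a Bemis-Murcko scaffold); the keys, function names, and greedy largest-group-first assignment are hypothetical choices, not a procedure taken from the reviewed studies.

```python
import random
from collections import defaultdict


def random_split(items, test_frac=0.25, seed=0):
    """Shuffle molecules and cut; scaffolds may appear on both sides."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_frac))
    return shuffled[n_test:], shuffled[:n_test]


def scaffold_split(items, test_frac=0.25):
    """Assign whole scaffold groups to one side so no scaffold is shared."""
    groups = defaultdict(list)
    for item in items:
        groups[item["scaffold"]].append(item)
    # Largest groups are placed in train first; rarer scaffolds fall into test.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]["scaffold"]))
    n_train_target = len(items) - int(round(len(items) * test_frac))
    train, test = [], []
    for group in ordered:
        if len(train) + len(group) <= n_train_target:
            train.extend(group)
        else:
            test.extend(group)
    return train, test


def shared_scaffolds(train, test):
    """Scaffold keys that occur on both sides of a split."""
    return {m["scaffold"] for m in train} & {m["scaffold"] for m in test}


# Hypothetical molecules: ids and scaffold keys are placeholders.
molecules = [
    {"id": i, "scaffold": key}
    for i, key in enumerate(["A", "A", "A", "A", "B", "B", "C", "D"])
]

s_train, s_test = scaffold_split(molecules)
r_train, r_test = random_split(molecules)
print("scaffold split overlap:", shared_scaffolds(s_train, s_test))
print("random split overlap:  ", shared_scaffolds(r_train, r_test))
```

Because the scaffold split keeps each chemotype on one side, held-out performance reflects generalization to unseen scaffolds, whereas a random split may share scaffolds between train and test and so understate distribution shift.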

If this is right

Researchers gain concrete guidance for reporting distribution parity and outcome parity when evaluating DRL molecule generators.
Dataset split strategies and reward designs can be directly related to observed parity effects in cancer-relevant generation.
Metrics should separately track parity across cancer versus non-cancer indications and within subtypes.
Open gaps remain in extending these fairness considerations to fully trustworthy DRL applications in drug discovery.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The metrics could be retroactively applied to existing published DRL models to uncover previously undetected biases in underrepresented targets.
The synthesis approach might generalize to fairness assessment in other machine learning methods used for molecular design.
Prospective experiments that optimize models under the proposed parity metrics could test whether they lead to more balanced performance on held-out chemotypes.

Load-bearing premise

The database searches from 2017 onward combined with PRISMA-style screening and content coding accurately capture the relevant studies and correctly connect parity outcomes to dataset and reward choices without selection bias.

What would settle it

A controlled test of DRL models showing that the identified parity metrics fail to detect systematic differences in generated molecule properties across cancer subtypes when dataset splits or rewards are varied would undermine the claimed utility of the metrics.
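
One way such a controlled test might check for a systematic groupwise difference is a permutation test on a single property. The sketch below is an assumption about how that check could look, not a procedure from the paper: the function name, the difference-in-means statistic, and the toy scores are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the absolute difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[: len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Made-up property scores for molecules generated for two subtypes.
subtype_x = [0.80, 0.82, 0.85, 0.79, 0.81, 0.83]
subtype_y = [0.40, 0.42, 0.45, 0.39, 0.41, 0.43]

p_different = permutation_p_value(subtype_x, subtype_y)
p_identical = permutation_p_value(subtype_x, subtype_x)
print(f"separated groups: p = {p_different:.4f}")
print(f"identical groups: p = {p_identical:.4f}")
```

A parity metric that stays flat while a test like this reports a clear difference under varied splits or rewards would be evidence against the metric's usefulness.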

Figures

Figures reproduced from arXiv: 2606.02902 by Behrouz Far, Esmaeil Shakeri, Ronnie de Souza Santos.

**Figure 1.** Distribution of included studies by publication source. … capture multidisciplinary and applied work in computational chemistry and healthcare AI. We also searched journal collections from the Nature portfolio, JMIR, and ACS to ensure coverage of high-impact interdisciplinary and translational research. Finally, arXiv was screened to identify recent preprints in RL and responsible AI; however, preprints wer…

**Figure 3.** Figure 3 presents the yearly distribution of the included studies over 2017–2025 and the corresponding fitted linear trend. Overall, the evidence base expands over time. The early period (2017–2019) shows limited and relatively stable output (1–2 studies per year), followed by a prolonged low-activity interval during 2020–2022 (approximately one study per year). In contrast, publication activity increases sharply f…

**Figure 4.** Distribution of the studies by the top 7 countries.

**Figure 5.** Figure 5 provides a global overview of the geographic coverage of the included studies, with countries contributing at least one study highlighted on the world map. As illustrated, the research footprint spans North America, Europe, and Asia, demonstrating that the topic has attracted international attention. Contributions are geographically dispersed across multiple regions rather than confined to a single contin…

**Figure 6.** Distribution of type of publication.

Original abstract

Deep reinforcement learning (DRL) is increasingly applied to de novo molecular design, but choices in data, rewards, and evaluation can yield uneven performance across disease areas and chemotypes. Despite this, there is no concise synthesis of how fairness is defined, measured, and tested in DRL-based drug discovery. In this rapid evidence review, we synthesize fairness definitions and metrics for DRL-driven molecule generation in healthcare. We focus on three questions: (i) how dataset composition and split strategies, especially scaffold versus random splits, affect evaluation and distribution shift; (ii) how reward design (e.g., QED, docking, toxicity, synthetic accessibility) can create or mitigate bias, with emphasis on cancer targets; and (iii) which measurable metrics best capture fairness. This includes parity across cancer versus non-cancer indications and across cancer subtypes. It also includes distributional balance in key physicochemical descriptors, scaffold/chemotype diversity, groupwise validity, toxicity, and synthetic accessibility. From 2017 onward, we searched major biomedical, computer science, and engineering literature databases and used arXiv for horizon scanning. Records were screened using PRISMA-style procedures and analyzed via content coding to link reported parity outcomes to dataset and reward choices. Our review provides a concise set of fairness definitions and metrics for DRL molecule generation. It offers practical guidance for reporting distribution parity and outcome parity. It also summarizes how dataset and reward choices relate to observed parity effects and identifies open gaps relevant to trustworthy, cancer-relevant DRL generation.
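
To illustrate how reward design can feed into groupwise outcomes, the sketch below combines the four reward components named in the abstract into one scalar and reports the mean reward per indication group. Everything specific here is an assumption: the weights, the premise that each component is already scaled to [0, 1], the orientation of toxicity and synthetic accessibility as "lower is better", and the toy scores are hypothetical and do not come from the reviewed studies.

```python
from statistics import mean

# Hypothetical weights; real pipelines tune or schedule these.
DEFAULT_WEIGHTS = {"qed": 0.4, "docking": 0.3, "toxicity": 0.2, "sa": 0.1}


def composite_reward(scores, weights=DEFAULT_WEIGHTS):
    """Weighted sum of components assumed to be pre-scaled to [0, 1].

    qed and docking are treated as higher-is-better; toxicity and sa
    (synthetic accessibility) as lower-is-better, so they enter as 1 - x.
    """
    oriented = {
        "qed": scores["qed"],
        "docking": scores["docking"],
        "toxicity": 1.0 - scores["toxicity"],
        "sa": 1.0 - scores["sa"],
    }
    return sum(weights[name] * oriented[name] for name in weights)


def mean_reward_by_group(molecules, weights=DEFAULT_WEIGHTS):
    """Average composite reward for each indication group."""
    by_group = {}
    for mol in molecules:
        reward = composite_reward(mol["scores"], weights)
        by_group.setdefault(mol["group"], []).append(reward)
    return {group: mean(values) for group, values in by_group.items()}


# Made-up scored molecules for two indication groups.
scored = [
    {"group": "cancer",
     "scores": {"qed": 0.5, "docking": 0.5, "toxicity": 0.5, "sa": 0.5}},
    {"group": "non_cancer",
     "scores": {"qed": 1.0, "docking": 1.0, "toxicity": 0.0, "sa": 0.0}},
]

group_means = mean_reward_by_group(scored)
for group, value in group_means.items():
    print(f"{group}: mean reward = {value:.2f}")
```

Reporting the per-group mean alongside the weights makes it easier to see whether one component, such as a docking term tuned on a well-studied target, dominates the reward for one group more than another.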

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a standard rapid review that organizes fairness metrics for DRL in drug discovery but adds little beyond synthesis, and its value rests on unverified search completeness.

This paper is a rapid evidence review that collects existing work on fairness definitions and metrics when deep reinforcement learning is used for de novo molecule generation in healthcare, with special attention to cancer targets. It asks how dataset splits and reward functions shape parity across indications and chemotypes, then maps those choices to observable effects.

It does a reasonable job laying out the practical angles: scaffold versus random splits and their impact on distribution shift, rewards such as QED, docking, toxicity, and synthetic accessibility, and a short list of metrics for distribution parity and outcome parity. The PRISMA-style screening plus content coding that ties reported parity results back to those design decisions follows a defensible template for this kind of synthesis.

The soft spots are exactly where the stress test points. The field is small and moving fast, so any gaps in the 2017+ database and arXiv search or any drift in how coders linked outcomes to specific splits and rewards would directly weaken the claimed guidance. The abstract gives no counts of records screened or included, which leaves the reliability of the synthesis hard to judge from the outside. The cancer focus is a clear scoping choice but further limits how far the takeaways travel.

The work is mainly useful to researchers already running DRL pipelines for molecular design who want a compact reference on fairness reporting. It does not introduce new methods or resolve open questions, but it could nudge better documentation in that narrow corner of applied work. A referee could usefully check the search protocol and coding reliability.

I would send it to peer review rather than desk reject; the topic is timely enough and the method is standard enough that external eyes on the execution details make sense.
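
The coding-reliability check suggested in the note could, for instance, use Cohen's kappa on two coders' labels for the same set of studies. The sketch below is a generic implementation of that statistic; the label names and the two label lists are hypothetical, and nothing here indicates which reliability measure, if any, the authors used.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("expects two non-empty label lists of equal length")
    n = len(coder_a)
    # Observed agreement: share of items given the same label.
    observed = sum(x == y for x, y in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical label; kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes assigned to four studies by two coders.
coder_1 = ["scaffold", "scaffold", "random", "random"]
coder_2 = ["scaffold", "random", "random", "random"]

kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```

Here the coders agree on three of four items, chance agreement is 0.5, and kappa works out to 0.5.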

Referee Report

1 major / 0 minor

Summary. The manuscript is a rapid evidence review synthesizing fairness definitions and metrics in deep reinforcement learning (DRL) for de novo molecular design in drug discovery. It addresses three questions: effects of dataset composition and split strategies (scaffold vs. random) on distribution shift; how reward designs (QED, docking, toxicity, synthetic accessibility) create or mitigate bias especially for cancer targets; and which metrics best capture fairness including parity across cancer indications, physicochemical descriptors, scaffold diversity, validity, toxicity, and accessibility. The review uses PRISMA-style screening of literature from 2017 onward across major databases plus arXiv, followed by content coding to link reported parity outcomes to design choices, and concludes with concise definitions/metrics, reporting guidance for distribution and outcome parity, summaries of choice-outcome relations, and gap identification.

Significance. If the synthesis holds and accurately maps the (small) literature without coverage or coding bias, the work would supply usable practical guidance for reporting parity in DRL molecule generation, directly supporting more trustworthy AI applications in healthcare. The PRISMA-style process and explicit linkage of parity effects to concrete choices (splits, rewards) represent a strength for systematic, reproducible synthesis in this emerging niche.

major comments (1)

[Abstract] Abstract (search and screening description): The PRISMA-style literature search from 2017+ and content coding are presented at a high level, but the manuscript provides no quantitative details on records identified, screened, or included after eligibility assessment. In this narrow, rapidly growing field, absence of these figures directly undermines evaluation of whether the synthesis captured essentially all relevant DRL drug-discovery papers and correctly associated parity outcomes with dataset/reward decisions without selection or interpretation bias, which is load-bearing for the claimed guidance and gap identification.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive comments on our rapid evidence review. We address the single major comment below.

Referee: [Abstract] Abstract (search and screening description): The PRISMA-style literature search from 2017+ and content coding are presented at a high level, but the manuscript provides no quantitative details on records identified, screened, or included after eligibility assessment. In this narrow, rapidly growing field, absence of these figures directly undermines evaluation of whether the synthesis captured essentially all relevant DRL drug-discovery papers and correctly associated parity outcomes with dataset/reward decisions without selection or interpretation bias, which is load-bearing for the claimed guidance and gap identification.

Authors: We agree that the absence of quantitative search and screening figures in the abstract limits transparency and the ability to assess coverage in this emerging area. The revised manuscript will incorporate the specific numbers of records identified, screened, and included (along with a PRISMA-style flow summary) directly into the abstract and expand the methods section with the full flow diagram or table. This addresses the concern without altering the scope or conclusions.

Revision: yes

Circularity Check

0 steps flagged

No circularity: pure literature synthesis with no derivations or predictions

This is a rapid evidence review paper that performs PRISMA-style literature search, screening, and content coding to synthesize fairness definitions and metrics from existing DRL drug-discovery studies. It contains no equations, no fitted parameters, no predictions, and no derivations. The central claim is a curated summary and practical guidance extracted from external papers; nothing reduces by construction to the authors' own inputs, self-citations, or ansatzes. The search methodology is presented as an independent process whose validity rests on external benchmarks (database coverage, PRISMA standards) rather than internal self-reference.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the assumption that the chosen search strategy and coding process yield an unbiased and representative sample of the literature; no free parameters or invented entities are introduced.

axioms (1)

Domain assumption: PRISMA-style screening procedures are appropriate and sufficient for identifying relevant records on fairness in DRL-based drug discovery.
Basis: the abstract states that records were screened using PRISMA-style procedures.

[1] [1]

Deep reinforcement learning for de novo drug design,

M. Popova, O. Isayev, and A. Tropsha, “Deep reinforcement learning for de novo drug design,”Science advances, vol. 4, no. 7, eaap7885, 2018

2018

[2] [2]

Molecular de-novo design through deep reinforcement learning,

M. Olivecrona, T. Blaschke, O. Engkvist, and H. Chen, “Molecular de-novo design through deep reinforcement learning,”Journal of cheminformatics, vol. 9, pp. 1–14, 2017

2017

[3] [3]

Accelerating drug discovery with deep reinforcement learning: Molecular generation using deep q-network,

E. Shakeri and B. Far, “Accelerating drug discovery with deep reinforcement learning: Molecular generation using deep q-network,” in2025 IEEE Inter- national Conference on Information Reuse and Integration and Data Science (IRI), IEEE, 2025, pp. 91–96

2025

[4] [4]

Principles of early drug discovery,

J. P. Hughes, S. Rees, S. B. Kalindjian, and K. L. Philpott, “Principles of early drug discovery,”British journal of pharmacology, vol. 162, no. 6, pp. 1239– 1249, 2011

2011

[5] [5]

How to improve r&d productivity: The pharmaceutical industry’s grand challenge,

S. M. Paul et al., “How to improve r&d productivity: The pharmaceutical industry’s grand challenge,”Nature reviews Drug discovery, vol. 9, no. 3, pp. 203–214, 2010

2010

[6] [6]

Innovation in pharma: New r&d cost estimates,

J. A. DiMasi, H. G. Grabowski, and R. W. Hansen, “Innovation in pharma: New r&d cost estimates,”J. Health Econ., vol. 47, pp. 20–33, 2016

2016

[7] R. Sotolani, S. Freire, F. Fronchetti, R. de Souza Santos, and R. Spinola, “Exploring software fairness debt in gray literature,” in Euromicro Conference on Software Engineering and Advanced Applications, Springer, 2025, pp. 85–104

[8] H. Suresh and J. Guttag, “A framework for understanding sources of harm throughout the machine learning life cycle,” in Proceedings of the 1st ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization, 2021, pp. 1–9

[9] N. Mehrabi, F. Morstatter, N. Saxena, K. Lerman, and A. Galstyan, “A survey on bias and fairness in machine learning,” ACM Computing Surveys (CSUR), vol. 54, no. 6, pp. 1–35, 2021

[10] S. A. Friedler, C. Scheidegger, S. Venkatasubramanian, S. Choudhary, E. P. Hamilton, and D. Roth, “A comparative study of fairness-enhancing interventions in machine learning,” in Proceedings of the Conference on Fairness, Accountability, and Transparency, 2019, pp. 329–338

[11] S. Barocas, M. Hardt, and A. Narayanan, Fairness and machine learning: Limitations and opportunities. MIT Press, 2023

[12] C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel, “Fairness through awareness,” in Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, 2012, pp. 214–226

[13] J. Kleinberg, S. Mullainathan, and M. Raghavan, “Inherent trade-offs in the fair determination of risk scores,” arXiv preprint arXiv:1609.05807, 2016

[14] M. Mitchell et al., “Model cards for model reporting,” in Proceedings of the Conference on Fairness, Accountability, and Transparency, 2019, pp. 220–229

[15] R. J. Chen et al., “Algorithmic fairness in artificial intelligence for medicine and healthcare,” Nature Biomedical Engineering, vol. 7, no. 6, pp. 719–742, 2023

[16] D. Ueda et al., “Fairness of artificial intelligence in healthcare: Review and recommendations,” Japanese Journal of Radiology, vol. 42, no. 1, pp. 3–15, 2024

[17] G. W. Bemis and M. A. Murcko, “The properties of known drugs. 1. molecular frameworks,” Journal of Medicinal Chemistry, vol. 39, no. 15, pp. 2887–2893, 1996

[18] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, “On calibration of modern neural networks,” in International Conference on Machine Learning, PMLR, 2017, pp. 1321–1330

[19] J. Alvarsson, S. A. McShane, U. Norinder, and O. Spjuth, “Predicting with confidence: Using conformal prediction in drug discovery,” Journal of Pharmaceutical Sciences, vol. 110, no. 1, pp. 42–49, 2021

[20] D. Amodei, C. Olah, J. Steinhardt, P. Christiano, J. Schulman, and D. Mané, “Concrete problems in AI safety,” arXiv preprint arXiv:1606.06565, 2016

[21] N. Brown, M. Fiscato, M. H. Segler, and A. C. Vaucher, “GuacaMol: Benchmarking models for de novo molecular design,” Journal of Chemical Information and Modeling, vol. 59, no. 3, pp. 1096–1108, 2019

[22] T. Cieplinski, T. Danel, S. Podlewska, and S. Jastrzebski, “Generative models should at least be able to design molecules that dock well: A new benchmark,” Journal of Chemical Information and Modeling, vol. 63, no. 11, pp. 3238–3247, 2023

[23] P. Renz, S. Luukkonen, and G. Klambauer, “Diverse hits in de novo molecule design: Diversity-based comparison of goal-directed generators,” Journal of Chemical Information and Modeling, vol. 64, no. 15, pp. 5756–5761, 2024

[24] J. Yang, A. A. Soltan, D. W. Eyre, and D. A. Clifton, “Algorithmic fairness and bias mitigation for clinical machine learning with deep reinforcement learning,” Nature Machine Intelligence, vol. 5, no. 8, pp. 884–894, 2023

[25] B. Smith, A. Khojandi, and R. Vasudevan, “Bias in reinforcement learning: A review in healthcare applications,” ACM Computing Surveys, vol. 56, no. 2, pp. 1–17, 2023

[26] H. Mashayekhi, M. Nazari, F. Jafarinejad, and N. Meskin, “DRL-based control of chemo-drug dose in cancer treatment,” Comput. Methods Programs Biomed., vol. 243, p. 107884, 2024

[27] J. Vamathevan et al., “Applications of machine learning in drug discovery and development,” Nature Reviews Drug Discovery, vol. 18, no. 6, pp. 463–477, 2019

[28] W. P. Walters and M. Murcko, “Assessing the impact of generative AI on medicinal chemistry,” Nature Biotechnology, vol. 38, no. 2, pp. 143–145, 2020

[29] C. Garritty et al., “Cochrane Rapid Reviews Methods Group offers evidence-informed guidance to conduct rapid reviews,” Journal of Clinical Epidemiology, vol. 130, pp. 13–22, 2021

[30] A. C. Tricco, E. V. Langlois, S. E. Straus, et al., Rapid reviews to strengthen health policy and systems: A practical guide. World Health Organization, Geneva, 2017

[31] M. J. Page et al., “The PRISMA 2020 statement: An updated guideline for reporting systematic reviews,” BMJ, vol. 372, 2021

[32] A. C. Tricco et al., “A scoping review of rapid review methods,” BMC Medicine, vol. 13, no. 1, p. 224, 2015

[33] K. Krippendorff, Content analysis: An introduction to its methodology. Sage Publications, 2018

[34] C. B. Seaman, “Qualitative methods in empirical studies of software engineering,” IEEE Transactions on Software Engineering, vol. 25, no. 4, pp. 557–572, 1999

[35] B. Cartaxo, G. Pinto, and S. Soares, “Rapid reviews in software engineering,” in Contemporary Empirical Methods in Software Engineering, Springer, 2020, pp. 357–384

[36] M. L. McHugh, “Interrater reliability: The kappa statistic,” Biochemia Medica, vol. 22, no. 3, pp. 276–282, 2012

[37] C. Bilodeau, W. Jin, T. Jaakkola, R. Barzilay, and K. F. Jensen, “Generative models for molecular discovery: Recent advances and challenges,” Wiley Interdisciplinary Reviews: Computational Molecular Science, vol. 12, no. 5, e1608, 2022

[38] Z. Zhou, S. Kearnes, L. Li, R. N. Zare, and P. Riley, “Optimization of molecules via deep reinforcement learning,” Scientific Reports, vol. 9, no. 1, p. 10752, 2019

[39] D. Liu et al., “DiffMeta-RL: Reinforcement learning-guided graph diffusion for metabolically stable molecular generation,” Journal of Chemical Information and Modeling, 2025

[40] H. Gummesson Svensson, C. Tyrchan, O. Engkvist, and M. Haghir Chehreghani, “Utilizing reinforcement learning for de novo drug design,” Machine Learning, vol. 113, no. 7, pp. 4811–4843, 2024

[41] J. He et al., “Evaluation of reinforcement learning in transformer-based molecular design,” Journal of Cheminformatics, vol. 16, no. 1, p. 95, 2024

[42] Q. Guo, S. Hernandez-Hernandez, and P. J. Ballester, “UMAP-based clustering split for rigorous evaluation of AI models for virtual screening on cancer cell lines,” Journal of Cheminformatics, vol. 17, no. 1, p. 94, 2025

[43] J. You, B. Liu, Z. Ying, V. Pande, and J. Leskovec, “Graph convolutional policy network for goal-directed molecular graph generation,” Advances in Neural Information Processing Systems, vol. 31, 2018

[44] Y. Fang, X. Pan, and H.-B. Shen, “De novo drug design by iterative multiobjective deep reinforcement learning with graph-based molecular quality assessment,” Bioinformatics, vol. 39, no. 4, btad157, 2023

[45] N. Ståhl, G. Falkman, A. Karlsson, G. Mathiason, and J. Bostrom, “Deep reinforcement learning for multiparameter optimization in de novo drug design,” Journal of Chemical Information and Modeling, vol. 59, no. 7, pp. 3166–3176, 2019

[46] Q. Wang, Z. Wei, X. Hu, Z. Wang, Y. Dong, and H. Liu, “Molecular generation strategy and optimization based on A2C reinforcement learning in de novo drug design,” Bioinformatics, vol. 39, no. 11, btad693, 2023

[47] S. R. Atance, J. V. Diez, O. Engkvist, S. Olsson, and R. Mercado, “De novo drug design using reinforcement learning with graph-based deep generative models,” Journal of Chemical Information and Modeling, vol. 62, no. 20, pp. 4863–4872, 2022

[48] E. Mazuz, G. Shtar, B. Shapira, and L. Rokach, “Molecule generation using transformers and policy gradient reinforcement learning,” Scientific Reports, vol. 13, no. 1, p. 8799, 2023

[49] J. Born, M. Manica, A. Oskooei, J. Cadow, G. Markert, and M. R. Martínez, “PaccMannRL: De novo generation of hit-like anticancer molecules from transcriptomic data via reinforcement learning,” iScience, vol. 24, no. 4, 2021

[50] Y. Yang et al., “Enabling target-aware molecule generation to follow multi objectives with Pareto MCTS,” Communications Biology, vol. 7, no. 1, p. 1074, 2024

[51] X. Hu, G. Liu, Y. Zhao, and H. Zhang, “Activity cliff-aware reinforcement learning for de novo drug design,” Journal of Cheminformatics, vol. 17, no. 1, p. 54, 2025

[52] A. Bou et al., “ACEGEN: Reinforcement learning of generative chemical agents for drug discovery,” Journal of Chemical Information and Modeling, vol. 64, no. 15, pp. 5900–5911, 2024

[53] J. Park, J. Ahn, J. Choi, and J. Kim, “Mol-AIR: Molecular reinforcement learning with adaptive intrinsic rewards for goal-directed molecular generation,” Journal of Chemical Information and Modeling, vol. 65, no. 5, pp. 2283–2296, 2025

[54] H. H. Loeffler et al., “Reinvent 4: Modern AI–driven generative molecule design,” Journal of Cheminformatics, vol. 16, no. 1, p. 20, 2024

[55] R. Mercado et al., “Practical notes on building molecular graph generative models,” Applied AI Letters, vol. 1, no. 2, 2020

[56] M. Koziarski et al., “RGFN: Synthesizable molecular generation using GFlowNets,” Advances in Neural Information Processing Systems, vol. 37, pp. 46908–46955, 2024

[57] D. R. Koes, M. P. Baumgartner, and C. J. Camacho, “Lessons learned in empirical scoring with smina from the CSAR 2011 benchmarking exercise,” Journal of Chemical Information and Modeling, vol. 53, no. 8, pp. 1893–1904, 2013