pith. machine review for the scientific record.

arxiv: 2605.06552 · v1 · submitted 2026-05-07 · 💻 cs.LG

Recognition: unknown

Sequential Design of Genetic Circuits Under Uncertainty With Reinforcement Learning

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 12:30 UTC · model grok-4.3

classification 💻 cs.LG
keywords genetic circuits · reinforcement learning · sequential design · uncertainty · stochasticity · synthetic biology · Markov jump processes · differential equations

The pith

A reinforcement learning policy trained upfront on simulators lets genetic circuit designs adapt immediately to lab variability and molecular noise without repeated parameter inference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out a sequential optimization method for genetic circuits that must work despite two sources of uncertainty: the random timing of individual molecular reactions and the fact that parameter values shift from one laboratory or experimental run to another. It trains a policy once, across a distribution of possible parameter values, using either differential-equation or Markov-jump-process simulators; after that training the policy can choose the next experimental action directly from fresh observations. This removes the need to stop after every round, infer parameters, and re-solve an optimization problem. A reader would care because the usual Bayesian loop becomes slow when inference and optimization are expensive; an amortized policy removes that pause and lets the design process keep moving. The approach is shown on a heterologous gene-expression model and on a repressilator, confirming that both forms of uncertainty can be handled inside the same policy.
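The contrast with the Bayesian loop can be made concrete with a minimal sketch of the amortized deployment loop. Everything below is illustrative, not the authors' implementation: the one-step simulator, the single unknown gain `theta`, and the hand-written heuristic standing in for the trained neural policy are all assumptions (in the paper, the policy is a neural network trained with RL across simulated episodes in which the uncertain parameters are resampled from a prior).

```python
import random

def simulate_expression(action, theta, noise=0.1, rng=random):
    """Toy one-step gene-expression simulator: response theta * action,
    corrupted by multiplicative noise (a stand-in for an ODE/MJP model)."""
    return theta * action * (1.0 + noise * (rng.random() - 0.5))

def amortized_design_run(policy, theta, n_rounds=5, rng=random):
    """Deployment loop: the pre-trained policy maps the observation history
    directly to the next action -- no inference or re-optimization per round."""
    history = []
    for _ in range(n_rounds):
        action = policy(history)          # immediate, observation-based choice
        obs = simulate_expression(action, theta, rng=rng)
        history.append((action, obs))
    return history

# Hypothetical stand-in "policy": a ratio-tracking heuristic mimicking what a
# trained network would learn implicitly; a real policy would be trained with
# RL (e.g. PPO) across episodes with theta drawn fresh from a prior each time.
def toy_policy(history, target=10.0):
    if not history:
        return 1.0                        # probe action
    action, obs = history[-1]
    gain = obs / action
    return target / gain

random.seed(0)
theta_true = 2.5                          # unknown lab-specific parameter
run = amortized_design_run(toy_policy, theta_true)
final_action, final_obs = run[-1]
```

The point is structural: each round calls only `policy(history)`, so adaptation is immediate, and no posterior over `theta` is computed between experiments.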

Core claim

The central claim is that a reinforcement-learning policy, trained in advance across a distribution of uncertain parameters using differential-equation or Markov-jump-process simulators, produces a sequential design strategy that responds directly to each new experimental observation, thereby incorporating both intrinsic reaction stochasticity and cross-laboratory variability without performing explicit parameter inference or re-optimization after every round.

What carries the argument

The RL policy trained across a distribution of simulator parameters, which maps current observations to the next experimental action without intermediate inference.

If this is right

  • Design cycles avoid the computational delay of inference and re-optimization after each experiment.
  • The same trained policy can handle both deterministic and stochastic simulator models within one framework.
  • Sequential suggestions remain effective even when laboratory conditions differ from the training distribution.
  • Immediate observation-based adaptation is possible for circuits such as repressilators and heterologous expression systems.
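The repressilator setting is easy to make concrete. The sketch below integrates the standard dimensionless repressilator ODEs (three mRNAs, three proteins, circular repression) with forward Euler; the two parameter draws are invented for illustration, not taken from the paper, and show how the same initial design responds very differently under parameter uncertainty.

```python
def repressilator(theta, x0, t_end=50.0, dt=0.01):
    """Euler integration of the dimensionless repressilator ODEs:
    dm_i/dt = alpha/(1 + p_{i-1}^n) + alpha0 - m_i,  dp_i/dt = beta*(m_i - p_i),
    with gene i repressed by protein i-1 (circular, so gene 0 sees protein 2)."""
    alpha, alpha0, beta, n = theta
    m, p = list(x0[:3]), list(x0[3:])
    traj = []                              # recorded trajectory of protein 0
    for _ in range(int(t_end / dt)):
        dm = [alpha / (1.0 + p[i - 1] ** n) + alpha0 - m[i] for i in range(3)]
        dp = [beta * (m[i] - p[i]) for i in range(3)]
        m = [m[i] + dt * dm[i] for i in range(3)]
        p = [p[i] + dt * dp[i] for i in range(3)]
        traj.append(p[0])
    return traj

# Same initial design, two illustrative draws of the uncertain parameters.
x0 = [1.0, 0.0, 0.0, 2.0, 1.0, 0.0]
traj_a = repressilator((216.0, 0.2, 5.0, 2.0), x0)
traj_b = repressilator((50.0, 0.2, 1.0, 2.0), x0)
```

The diverging trajectories are the epistemic-uncertainty picture in the paper's Figure 7C: a policy trained over a distribution of such draws must perform well across all of them.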

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be paired with automated liquid-handling platforms to close the loop from observation to next design step without human intervention.
  • Because the policy is trained once, it might be reused for families of related circuits that share the same simulator structure.
  • If the policy generalizes across parameter distributions, it could reduce the total number of physical experiments needed to reach a working circuit compared with non-adaptive methods.

Load-bearing premise

The simulator models capture the relevant uncertainties well enough that a policy trained on them will produce useful adaptations when applied to real laboratory conditions.

What would settle it

Run the trained policy in a real wet-lab setting on the repressilator or gene-expression circuit and measure whether the resulting designs achieve the target behavior faster or more reliably than designs produced by repeated Bayesian inference-plus-optimization cycles under the same variability.

Figures

Figures reproduced from arXiv: 2605.06552 by Diego A. Oyarzún, Michael U. Gutmann, Michal Kobiela.

Figure 1: Overview of the approach using the example task of maximising gene expression in a biological system.
Figure 2: Timelines comparing inference-optimization based approach …
Figure 3: Overview of the methodology. We train a neural network policy that takes all previously tried actions and corresponding …
Figure 4: The training loop. In each episode, different values of the simulator parameters …
Figure 5: Design of heterologous gene expression system. A: The host organism …
Figure 6: Host-aware heterologous gene expression with minimal impact.
Figure 7: A: Schematic representation of the oscillator circuit – repressilator. Three genes repress each other in a circular fashion. B: The goal is to design an oscillator with a specific frequency. C: The same action applied to different uncertain parameters can result in very diverse responses (epistemic uncertainty). D: Even if the uncertain parameters are fixed to specific values, the same action can still res…
Figure 8: RL-based optimization of the repressilator under biomolecular noise. Top left: Training curve showing improvement of the …
Figure 9: We present the differences between two random seeds used to initialize the policy and environment. Although the results are …
Figure 10: We compare the performance of the adaptive policy to four “oracle” policies, i.e., policies trained using the ground-truth …
Figure 11: Simplified repressilator study. The policy adapts to unknown components of the system to achieve seven oscillations. Top …
read the original abstract

The design of biological systems is hindered by uncertainty arising from both intrinsic stochasticity of biomolecular reactions and variability across laboratory or experimental conditions. In this work, we present a sequential framework to optimize genetic circuits under both forms of uncertainty. By employing simulator models based on differential equations or Markov jump processes alongside a reinforcement learning (RL) policy-based approach, our method suggests experiments that adapt to unknown laboratory conditions while accounting for inherent stochasticity. While previous Bayesian methods address uncertainty through iterative experiment-inference-optimization cycles, they typically require computationally expensive inference and optimization steps after each experimental round, leading to delays. To overcome this bottleneck, we propose an amortized approach trained up-front across a distribution of possible uncertain parameters. This strategy sidesteps the need for explicit parameter inference during the design cycle, enabling immediate, observation-based adaptation. We demonstrate our framework on models for heterologous gene expression and a repressilator circuit, showing that it efficiently handles both molecular noise and cross-laboratory variability.
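For the Markov-jump-process side of the framework, the standard simulator is Gillespie's stochastic simulation algorithm (Gillespie 1977, cited in the reference list). Below is a minimal, self-contained sketch for a birth-death gene-expression model; the rate constants are illustrative, not the paper's models.

```python
import random

def gillespie_birth_death(k, gamma, x0=0, t_end=50.0, rng=random):
    """Gillespie SSA for a birth-death process:
    0 --k--> X (production), X --gamma--> 0 (degradation).
    This is the Markov-jump-process analogue of dx/dt = k - gamma * x."""
    t, x = 0.0, x0
    times, counts = [t], [x]
    while t < t_end:
        a1 = k                              # production propensity
        a2 = gamma * x                      # degradation propensity
        a0 = a1 + a2
        t += rng.expovariate(a0)            # exponential waiting time
        x += 1 if rng.random() < a1 / a0 else -1
        times.append(t)
        counts.append(x)
    return times, counts

random.seed(1)
times, counts = gillespie_birth_death(k=20.0, gamma=1.0)
# Copy numbers fluctuate around the deterministic steady state k / gamma = 20.
```

Each trajectory is one realization of the intrinsic molecular noise; an RL environment in this framework would additionally draw the rate constants from a prior at the start of each episode and return such trajectories as observations.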

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes an amortized reinforcement learning framework for the sequential design of genetic circuits under uncertainty arising from both intrinsic molecular stochasticity and cross-laboratory parameter variability. Simulator models (ODEs or Markov jump processes) are used to train RL policies across a distribution of uncertain parameters upfront, enabling immediate observation-based adaptation without repeated inference or optimization steps after each experiment. The approach is illustrated on two in-silico models: heterologous gene expression and the repressilator circuit, with the claim that it efficiently handles noise and variability while sidestepping the computational bottlenecks of traditional Bayesian design loops.

Significance. If the central claims hold, the work could meaningfully accelerate synthetic biology workflows by amortizing uncertainty handling into a pre-trained policy, reducing delays from per-round Bayesian inference. This addresses a practical challenge in circuit design where both noise and parameter variability hinder optimization. The RL-based amortized strategy offers a distinct alternative to existing methods, and successful simulator-to-reality transfer would constitute a useful contribution to automated design under uncertainty.

major comments (2)
  1. [Abstract] The abstract states that the method 'efficiently handles both molecular noise and cross-laboratory variability' and 'enabling immediate, observation-based adaptation,' yet provides no quantitative performance metrics, baselines, error analysis, or details on adaptation success rates. This absence limits evaluation of whether the in-silico demonstrations support the central claims about practical advantages over iterative Bayesian methods.
  2. [Demonstration] The framework is demonstrated only on in-silico trajectories of the heterologous expression and repressilator models. No physical laboratory experiments, model-mismatch ablations, or out-of-distribution parameter tests are reported, leaving unverified the key assumption that a policy trained on simulator ensembles will generalize to produce effective adaptations under real laboratory conditions without retraining or inference.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive comments. We respond point by point to the major comments below, noting planned revisions where appropriate.

read point-by-point responses
  1. Referee: [Abstract] The abstract states that the method 'efficiently handles both molecular noise and cross-laboratory variability' and 'enabling immediate, observation-based adaptation,' yet provides no quantitative performance metrics, baselines, error analysis, or details on adaptation success rates. This absence limits evaluation of whether the in-silico demonstrations support the central claims about practical advantages over iterative Bayesian methods.

    Authors: We agree that the abstract would be strengthened by including quantitative metrics. In the revised version we will add concise results from the in-silico experiments, such as average optimization success rates under different noise levels, comparison to non-amortized baselines, and summary error statistics, to better substantiate the claims. revision: yes

  2. Referee: [Demonstration] The framework is demonstrated only on in-silico trajectories of the heterologous expression and repressilator models. No physical laboratory experiments, model-mismatch ablations, or out-of-distribution parameter tests are reported, leaving unverified the key assumption that a policy trained on simulator ensembles will generalize to produce effective adaptations under real laboratory conditions without retraining or inference.

    Authors: We acknowledge that all reported results are in-silico. This choice permits controlled evaluation with known ground-truth parameters and stochasticity, allowing direct measurement of adaptation performance. We agree that physical experiments would provide stronger evidence of real-world generalization. As the contribution is methodological, we focus on simulator-based validation; wet-lab work lies outside the present scope. In revision we will add a discussion of model-mismatch and out-of-distribution simulator tests together with an outline of sim-to-real considerations. revision: partial

standing simulated objections not resolved
  • Physical laboratory experiments to verify generalization under actual experimental conditions

Circularity Check

0 steps flagged

No circularity in the amortized RL policy training for sequential genetic circuit design

full rationale

The paper describes a standard reinforcement learning setup in which a policy is trained offline on an ensemble of simulator trajectories (DE or MJP models) drawn from a prior over uncertain parameters; the trained policy is then deployed to map observations directly to experiment suggestions. No equation or claim reduces a derived quantity to a fitted input by construction, no load-bearing premise rests on a self-citation whose content is itself unverified, and the central amortization step is an independent computational procedure whose correctness is evaluated on held-out simulation trajectories rather than on quantities defined from the policy outputs themselves. The method is therefore self-contained against external simulation benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no explicit free parameters, axioms, or invented entities are detailed. The framework assumes standard RL training and simulation models are sufficient without introducing new postulated entities.

pith-pipeline@v0.9.0 · 5469 in / 1218 out tokens · 37290 ms · 2026-05-08T12:30:00.412067+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

46 extracted references · 10 canonical work pages · 2 internal anchors

  1. [1]

     Joshua Achiam. 2018. Spinning Up in Deep Reinforcement Learning. (2018)

  2. [2]

     Jeff Bezanson, Alan Edelman, Stefan Karpinski, and Viral B Shah. 2017. Julia: A fresh approach to numerical computing. SIAM Review 59, 1 (2017), 65–98. doi:10.1137/141000671

  3. [3]

     Tom Blau, Edwin V Bonilla, Iadine Chades, and Amir Dezfouli. 2022. Optimizing sequential experimental design with deep reinforcement learning. In International Conference on Machine Learning. PMLR, 2107–2128

  4. [4]

     Jennifer AN Brophy and Christopher A Voigt. 2014. Principles of genetic circuit design. Nature Methods 11, 5 (2014), 508–520

  5. [5]

     Madhukar S Dasika and Costas D Maranas. 2008. OptCircuit: an optimization based method for computational design of genetic circuits. BMC Systems Biology 2, 1 (2008), 1–19

  6. [6]

     Michael B Elowitz and Stanislas Leibler. 2000. A synthetic oscillatory network of transcriptional regulators. Nature 403, 6767 (2000), 335–338

  7. [7]

     Michael B Elowitz, Arnold J Levine, Eric D Siggia, and Peter S Swain. 2002. Stochastic gene expression in a single cell. Science 297, 5584 (2002), 1183–1186

  8. [8]

     Maurice Filo, Nicolò Rossi, Zhou Fang, and Mustafa Khammash. 2026. GenAI-Net: A Generative AI Framework for Automated Biomolecular Network Design. arXiv preprint arXiv:2601.17582 (2026)

  9. [9]

     Adam Foster, Desi R Ivanova, and Ilyas Malik. [n. d.]. Deep Adaptive Design: Amortizing Bayesian Experimental Design. Variational, Monte Carlo and Policy-Based Approaches to Bayesian Experimental Design ([n. d.]), 139

  10. [10]

     Roman Garnett. 2023. Bayesian Optimization. Cambridge University Press

  11. [11]

     Adam R Gerlach, Andrew Leonard, Jonathan Rogers, and Chris Rackauckas. 2020. The Koopman expectation: An operator theoretic method for efficient analysis and optimization of uncertain hybrid dynamical systems. arXiv preprint arXiv:2008.08737 (2020)

  12. [12]

     Leonardo Giannantoni, Alessandro Savino, and Stefano Di Carlo. 2023. Optimization of synthetic oscillatory biological networks through Reinforcement Learning. In 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2624–2631

  13. [13]

     Daniel T Gillespie. 1977. Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry 81, 25 (1977), 2340–2361

  14. [14]

     Tom W Hiscock. 2019. Adapting machine-learning algorithms to design gene circuits. BMC Bioinformatics 20 (2019), 1–13

  15. [15]

     J. D. Hunter. 2007. Matplotlib: A 2D graphics environment. Computing in Science & Engineering 9, 3 (2007), 90–95. doi:10.1109/MCSE.2007.55

  16. [16]

     Desi R Ivanova, Adam Foster, Steven Kleinegesse, Michael U Gutmann, and Thomas Rainforth. 2021. Implicit deep adaptive design: Policy-based experimental design without likelihoods. Advances in Neural Information Processing Systems 34 (2021), 25785–25798

  17. [17]

     Desi R Ivanova, Marcel Hedman, Cong Guan, and Tom Rainforth. 2024. Step-DAD: Semi-amortized policy-based Bayesian experimental design. In ICLR 2024 Workshop on Data-centric Machine Learning Research (DMLR), Vol. 2, 21

  18. [18]

     Tian Jiang, Veronica A Montgomery, Karuna Jetty, Vijaydev Ganesan, Matthew R Incha, John M Gladden, Nathan J Hillson, and Di Liu. 2025. Metabolic engineering and synthetic biology for the environment: from perspectives of biodetection, bioremediation, and biomanufacturing. Biotechnology for the Environment 2, 1 (2025), 14

  19. [19]

     Linda M Keefer, Marie-Agnès Piron, and Pierre De Meyts. 1981. Human insulin prepared by recombinant DNA techniques and native human insulin interact identically with insulin receptors. Proceedings of the National Academy of Sciences 78, 3 (1981), 1391–1395

  20. [20]

     Michal Kobiela, Diego A Oyarzún, and Michael U Gutmann. 2026. Risk-averse optimization of genetic circuits under uncertainty. Cell Systems 17, 1 (2026)

  21. [21]

     Sang Yup Lee, Hyun Uk Kim, Tong Un Chae, Jae Sung Cho, Je Woong Kim, Jae Ho Shin, Dong In Kim, Yoo-Sung Ko, Woo Dae Jang, and Yu-Sin Jang. 2019. A comprehensive metabolic map for production of bio-based chemicals. Nature Catalysis 2, 1 (Jan. 2019), 18–33. doi:10.1038/s41929-018-0212-4

  23. [23]

     Zhengda Li, Shixuan Liu, and Qiong Yang. 2017. Incoherent inputs enhance the robustness of biological oscillators. Cell Systems 5, 1 (2017), 72–81

  24. [24]

     Torkel E. Loman, Yingbo Ma, Vasily Ilin, Shashi Gowda, Niklas Korsbo, Nikhil Yewale, Chris Rackauckas, and Samuel A. Isaacson. 2023. Catalyst: Fast and flexible modeling of reaction networks. PLOS Computational Biology 19, 10 (10 2023), 1–19. doi:10.1371/journal.pcbi.1011530

  25. [25]

     Wenzhe Ma, Ala Trusina, Hana El-Samad, Wendell A Lim, and Chao Tang. 2009. Defining network topologies that can achieve biochemical adaptation. Cell 138, 4 (2009), 760–773

  26. [26]

     Charlotte Merzbacher, Oisin Mac Aodha, and Diego A Oyarzún. 2023. Bayesian Optimization for Design of Multiscale Biological Circuits. ACS Synthetic Biology (2023)

  27. [27]

     Evangelos-Marios Nikolados, Andrea Y Weiße, and Diego A Oyarzún. 2021. Prediction of Cellular Burden with Host–Circuit Models. In Synthetic Gene Circuits. Springer, 267–291

  28. [28]

     Irene Otero-Muras and Julio R Banga. 2017. Automated design framework for synthetic biology exploiting pareto optimality. ACS Synthetic Biology 6, 7 (2017), 1180–1193

  29. [29]

     Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems 32 (2019)

  30. [30]

     F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830

  31. [31]

     Lingxia Qiao, Wei Zhao, Chao Tang, Qing Nie, and Lei Zhang. 2019. Network topologies that can achieve dual function of adaptation and noise attenuation. Cell Systems 9, 3 (2019), 271–285

  32. [32]

     Christopher Rackauckas and Qing Nie. 2017. DifferentialEquations.jl – a performant and feature-rich ecosystem for solving differential equations in Julia. Journal of Open Research Software 5, 1 (2017), 15

  33. [33]

     Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. 2021. Stable-Baselines3: Reliable Reinforcement Learning Implementations. Journal of Machine Learning Research 22, 268 (2021), 1–8. http://jmlr.org/papers/v22/20-1364.html

  34. [34]

     Tobias Schladt, Nicolai Engelmann, Erik Kubaczka, Christian Hochberger, and Heinz Koeppl. 2021. Automated Design of Robust Genetic Circuits: Structural Variants and Parameter Uncertainty. ACS Synthetic Biology 10, 12 (Dec. 2021), 3316–3329. doi:10.1021/acssynbio.1c00193

  35. [35]

     John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. 2015. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438 (2015)

  36. [36]

     John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)

  37. [37]

     Carlos Sequeiros, Carlos Vázquez, Julio R Banga, and Irene Otero-Muras. 2023. Automated Design of Synthetic Gene Circuits in the Presence of Molecular Noise. ACS Synthetic Biology 12, 10 (2023), 2865–2876

  38. [38]

     David Silver and Joel Veness. 2010. Monte-Carlo Planning in Large POMDPs. In Advances in Neural Information Processing Systems, J. Lafferty, C. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta (Eds.), Vol. 23. Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2010/file/edfbe1afcf9246bb0d40eb4d8027d90f-Paper.pdf

  39. [39]

     Neythen J Treloar, Nathan Braniff, Brian Ingalls, and Chris P Barnes. 2022. Deep reinforcement learning for optimal experimental design in biology. PLOS Computational Biology 18, 11 (2022), e1010695

  40. [40]

     Neythen J Treloar, Alex JH Fedorec, Brian Ingalls, and Chris P Barnes. 2020. Deep reinforcement learning for the control of microbial co-cultures in bioreactors. PLoS Computational Biology 16, 4 (2020), e1007783

  41. [41]

     Jean-Yves Trosset and Pablo Carbonell. 2015. Synthetic biology for pharmaceutical drug discovery. Drug Design, Development and Therapy (2015), 6285–6302

  42. [42]

     Babita K Verma, Ahmad A Mannan, Fuzhong Zhang, and Diego A Oyarzún. 2021. Trade-offs in biosensor optimization for dynamic pathway engineering. ACS Synthetic Biology 11, 1 (2021), 228–240

  43. [43]

     Michael L. Waskom. 2021. seaborn: statistical data visualization. Journal of Open Source Software 6, 60 (2021), 3021. doi:10.21105/joss.03021

  44. [44]

     Andrea Y Weiße, Diego A Oyarzún, Vincent Danos, and Peter S Swain. 2015. Mechanistic links between cellular trade-offs, gene expression, and growth. Proceedings of the National Academy of Sciences 112, 9 (2015), E1038–E1047

  45. [45]

     Mae L Woods, Miriam Leon, Ruben Perez-Carrasco, and Chris P Barnes. 2016. A statistical approach reveals designs for the most robust stochastic gene oscillators. ACS Synthetic Biology 5, 6 (2016), 459–470