pith. machine review for the scientific record.

arxiv: 2604.19718 · v1 · submitted 2026-04-21 · 🧬 q-bio.QM

Recognition: unknown

Direct RNA sequence design under codon constraints using expressive tensor-based secondary structure models

Christina Wuyan Wang, Mark Fornace, Michael Lindsey

Pith reviewed 2026-05-10 00:24 UTC · model grok-4.3

classification 🧬 q-bio.QM
keywords: RNA sequence design · codon optimization · secondary structure · Boltzmann sampling · tensor decomposition · synthetic biology · mRNA design

The pith

An algorithm directly samples codon sequences for a target protein from a Boltzmann distribution based on detailed RNA secondary structure energies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a direct method for designing RNA sequences that encode a specific protein while accounting for the full thermodynamics of secondary structure formation. Instead of optimizing simple proxies for codon usage and folding, it samples sequences from a probability distribution defined by the codon constraints and a complete free energy model. This approach matters for applications like mRNA vaccine design because it allows navigation of the huge space of possible codon choices in a principled way that respects accurate folding predictions. The method also provides exact ways to calculate quantities like the overall free energy and probabilities of specific base pairs under the model.

Core claim

We demonstrate a direct and efficient algorithm to sample sequences from a suitable Boltzmann distribution defined in terms of the codon sequence and a fully detailed secondary structure free energy model, as well as related algorithms for exact computation of statistical quantities such as free energies, base pairing probabilities, and base and codon marginals. These draw upon a tensor-based formulation of secondary structure thermodynamics and show that global sequence design can be accomplished with respect to a highly accurate free energy model while leveraging CPU and GPU resources in parallel.
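To make the phrase "sample sequences from a Boltzmann distribution" concrete, here is a minimal brute-force sketch for a four-codon toy peptide. It is not the paper's algorithm: the function `toy_energy` is a hypothetical stand-in for the secondary structure free energy, the codon table is only a small excerpt of the standard genetic code, and exhaustive enumeration is feasible only because the design space here has twelve sequences.

```python
import itertools
import math
import random

# Excerpt of the standard genetic code (RNA alphabet), enough for the toy peptide.
CODONS = {
    "M": ["AUG"],
    "K": ["AAA", "AAG"],
    "F": ["UUU", "UUC"],
    "*": ["UAA", "UAG", "UGA"],
}


def toy_energy(seq):
    """Hypothetical unitless energy: each G or C lowers it by 0.5 (assumption)."""
    return -0.5 * sum(base in "GC" for base in seq)


def boltzmann_table(protein):
    """Enumerate every codon sequence for `protein` with its Boltzmann probability."""
    seqs = ["".join(c) for c in itertools.product(*(CODONS[a] for a in protein))]
    weights = [math.exp(-toy_energy(s)) for s in seqs]
    z = sum(weights)  # normalizing constant over the whole design space
    return seqs, [w / z for w in weights]


def sample(protein, k, rng):
    """Draw k codon sequences with probability proportional to exp(-energy)."""
    seqs, probs = boltzmann_table(protein)
    return rng.choices(seqs, weights=probs, k=k)


seqs, probs = boltzmann_table("MKF*")
draws = sample("MKF*", 5, random.Random(0))
```

Every draw encodes the same peptide by construction, and sequences with lower toy energy are drawn more often. The point of the paper, as summarized above, is to obtain such samples without enumerating the design space.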

What carries the argument

The tensor-based formulation of secondary structure thermodynamics, extended to codon-constrained sequences to allow efficient sampling and exact marginal computations.
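As an illustration of how codon constraints alone can be written as a chain of small tensors, the sketch below builds three cores per codon whose shared bond index labels the chosen synonymous codon. This is a simplified construction written for this page, not the paper's codon pair formulation; the names `codon_cores` and `contract` are hypothetical, and secondary structure is ignored entirely.

```python
BASES = "ACGU"

# Synonymous codons for a toy peptide (excerpt of the standard genetic code).
CODONS = {
    "M": ["AUG"],
    "L": ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
    "*": ["UAA", "UAG", "UGA"],
}


def codon_cores(options):
    """Three cores indexed as core[left][base][right]; bonds carry the codon choice."""
    m = len(options)
    first = [[[1.0 if options[r][0] == b else 0.0 for r in range(m)] for b in BASES]]
    middle = [[[1.0 if l == r and options[l][1] == b else 0.0 for r in range(m)]
               for b in BASES] for l in range(m)]
    last = [[[1.0 if options[l][2] == b else 0.0] for b in BASES] for l in range(m)]
    return [first, middle, last]


def contract(cores, fixed=None):
    """Contract the chain left to right, summing over bases.

    `fixed` maps a nucleotide position to a base; that position is then
    restricted to the given base instead of being summed over.
    """
    fixed = fixed or {}
    vec = [1.0]  # left boundary vector (bond dimension 1)
    for pos, core in enumerate(cores):
        allowed = [BASES.index(fixed[pos])] if pos in fixed else range(len(BASES))
        n_right = len(core[0][0])
        new = [0.0] * n_right
        for l, v in enumerate(vec):
            for b in allowed:
                for r in range(n_right):
                    new[r] += v * core[l][b][r]
        vec = new
    return vec[0]


train = [core for aa in "ML*" for core in codon_cores(CODONS[aa])]
total = contract(train)                      # number of valid codon sequences
p_c_at_3 = contract(train, {3: "C"}) / total  # marginal of base C at position 3
```

For the peptide M-L-stop the contraction counts 18 valid sequences, and the first base of the leucine codon is C in 12 of them. Replacing the 0/1 entries with positive weights would turn the same chain into a soft constraint, which is the role the Figure 3 caption assigns to the tensor train.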

If this is right

  • Sequence design can now use the full accuracy of secondary structure free energy models instead of simplified objectives.
  • Exact values for free energies, base-pairing probabilities, and codon marginals become computable for realistic design problems.
  • Parallel hardware can be used to achieve large speedups in the design process.
  • Applications such as vaccine and therapeutic mRNA design gain a more principled way to optimize sequences.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could be combined with experimental feedback to iteratively improve sequence designs for cellular stability.
  • It may extend to other constraints like translation speed or immune evasion in mRNA constructs.
  • Scalability tests on longer proteins would reveal if the tensor approach remains practical for genome-scale applications.

Load-bearing premise

The tensor-based secondary structure model remains accurate and tractable when applied to the space of sequences that only use valid codons for each amino acid.

What would settle it

The claim would be undermined if the base-pairing probabilities computed by the algorithm for a test codon sequence deviated significantly from those measured by structure-probing experiments on the corresponding RNA.

Figures

Figures reproduced from arXiv: 2604.19718 by Christina Wuyan Wang, Mark Fornace, Michael Lindsey.

Figure 1
Figure 1: a. An example mRNA strand colored by codon (word of three consecutive bases), with an arrowhead marking the 3′ end. The first and last codons (black) are the start and stop codons, while each interior codon codes for a specific amino acid. b. Base pair diagram for an example secondary structure with the same sequence as (a), in which the strand is laid out horizontally from 5′ to 3′ and base pairs appear a…
Figure 2
Figure 2: a. Conceptual idea of partition function dynamic programs, in which an additional 3′-most base j is added and incorporated. That base may remain unpaired (orange), or it may close a base pair i · j originating from any earlier base i < j (blue). Considering all possibilities incurs an O(n) summation for the recursion. Recursing through all O(n^2) subsequences gives an overall O(n^3) cost. b. Block matri…
Figure 3
Figure 3: Tensor train formulation of codon constraints. a. Without any consideration of secondary structure, hard and soft sequence constraints can be implemented using a tensor train. b. Depiction of an individual tensor core T_i in the codon pair formulation. The tensor is indexed by the base ϕ_i, codon index σ_i of the base ϕ_{i−1}, and codon index σ_{i+1} of the base ϕ_{i+1}, yielding a nonzero element only if the combinat…
Figure 4
Figure 4: Combination of structure and sequence tensor diagrams. a. Combination of diagrams for an unpaired secondary structure. In this case, the structure free energy is simply a matrix product (left, green). Combining it with an RNA sequence tensor train (left, purple) yields a compound diagram (middle). Then pre-contraction over vertical edges yields another matrix product of expanded dimension (right). b. Combi…
Figure 5
Figure 5: Schematics for the folding functions χ and χ̂. Depiction of the χ operation and equivalent tensor network contractions for unpaired base (top) and base pair (bottom) incorporation into dynamic programming recursions, in multiple settings of increasing complexity. a. Depiction of the considered operations, in which a 3′-most unpaired or paired base is considered. b. Existing tensor network formulation fo…
Figure 6
Figure 6: Results for example systems listed in Section 5a (1-4) using different RNA sequence generation methods. ϕ ∼ U(ψ) was generated by sampling 2500 codon sequences independently (uniformly subject to codon constraints, with no consideration of secondary structure). ϕ ∼ D(ψ) was generated by sampling 2500 codon sequences proportional to e^{−q_ψ(ϕ)}, while ϕ_min is the optimum yielded by solving (2.8). D0(ψ) and ϕ…
Figure 7
Figure 7: Algorithm performance and observed complexities in computing q_ψ as a function of sequence length (in nucleotides (nt) on bottom and in amino acids (aa) on top), illustrating the effects of GPU acceleration and parallelization. a. Wall clock time to compute q_ψ. Inset: same, with linear axes. b. Cost of computing q_ψ divided by the cost of computing q_ψ(ϕ) for an equal length sequence, showing that design is w…
Figure 8
Figure 8: Demonstration of codon frequency matching via fixed point iteration with Anderson acceleration (with no history size limit). Single codon frequencies were fit for each of benchmark systems 1-4 (Section S5a), with the target frequencies matching those used to construct CAI bonuses. Calculations were run in single precision using up to two GPUs. a. Maximum absolute logarithmic error (‖log(b/b*)‖_∞) as a funct…
Figure 9
Figure 9: Results for example systems listed in Section 5a (1-4) with CAI soft constraints applied. a: complex free energy q_ψ(ϕ) (unitless; multiply by 0.62 kcal/mol for results at 37°C). b: equilibrium fraction of unpaired bases f(ϕ). Each plot depicts the different probability density distributions which can be sampled from using our provided algorithms. ϕ ∼ U(ψ) was generated by sampling 2500 codon sequences inde…
Figure 10
Figure 10: Results for example systems listed in Section 5a (1-4) with CAI and CPB soft constraints applied. a: complex free energy q_ψ(ϕ) (unitless; multiply by 0.62 kcal/mol for results at 37°C). b: equilibrium fraction of unpaired bases f(ϕ). Each plot depicts the different probability density distributions which can be sampled from using our provided algorithms. ϕ ∼ U(ψ) was generated by sampling 2500 codon seque…
Figure 11
Figure 11: Marginal codon probabilities for System 1 in Section 5a. p^ψ_j(c) is depicted for each possible codon c (in arbitrary vertical order) and amino acid position j.
Figure 12
Figure 12: Equilibrium unpaired probabilities P^ψ_{i,i} for System 1 in Section 5a. (y-axes are normalized to span [0, 1].)
Figure 13
Figure 13: Marginal codon probabilities for System 2 in Section 5a. p^ψ_j(c) is depicted for each possible codon c (in arbitrary vertical order) and amino acid position j.
Figure 14
Figure 14: Equilibrium unpaired probabilities P^ψ_{i,i} for System 2 in Section 5a. (y-axes are normalized to span [0, 1].)
Figure 15
Figure 15: Marginal codon probabilities for System 3 in Section 5a. p^ψ_j(c) is depicted for each possible codon c (in arbitrary vertical order) and amino acid position j.
  Amino acid: Codon Frequency
  A: GCU 0.238, GCC 0.229, GCA 0.091, GCG 0.443
  R: CGU 0.145, CGC 0.208, CGA 0.087, CGG 0.347, AGA 0.028, AGG 0.184
  N: AAU 0.437, AAC 0.563
  D: GAU 0.450, GAC 0.550
  C: UGU 0.344, UGC 0.656
  Q: CAA 0.179, CAG 0.821
  E: GAA 0.149, GAG 0.851
  G: GGU…
Figure 16
Figure 16: Equilibrium unpaired probabilities P^ψ_{i,i} for System 3 in Section 5a. (y-axes are normalized to span [0, 1].)
Figure 17
Figure 17: Marginal codon probabilities for System 4 in Section 5a. p^ψ_j(c) is depicted for each possible codon c (in arbitrary vertical order) and amino acid position j.
Figure 18
Figure 18: Equilibrium unpaired probabilities P^ψ_{i,i} for System 4 in Section 5a. (y-axes are normalized to span [0, 1].)
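
The captions of Figures 9 and 10 quote a conversion factor of 0.62 kcal/mol at 37°C for the unitless free energy. That figure is consistent with the molar thermal energy RT, which the short sketch below reproduces. Reading the factor as RT is an assumption made here, and the function names are hypothetical.

```python
GAS_CONSTANT_KCAL = 1.987204e-3  # molar gas constant R in kcal / (mol K)


def thermal_energy_kcal_per_mol(celsius):
    """Return RT in kcal/mol at the given temperature in degrees Celsius."""
    return GAS_CONSTANT_KCAL * (celsius + 273.15)


def to_kcal_per_mol(unitless_free_energy, celsius=37.0):
    """Convert a unitless free energy (in units of RT) to kcal/mol."""
    return unitless_free_energy * thermal_energy_kcal_per_mol(celsius)


rt_37 = thermal_energy_kcal_per_mol(37.0)
```

At 37°C this gives RT of roughly 0.616 kcal/mol, which rounds to the quoted 0.62, so a unitless free energy of −100 corresponds to about −61.6 kcal/mol.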
Original abstract

Nucleic acid sequence design via codon optimization is a fundamental task with applications across synthetic biology, mRNA therapeutics, and vaccine design. Given a target protein, it is a major open challenge to navigate the combinatorially large design space of codon sequences mapping to its amino acid sequence. Computational approaches generally seek to optimize simple objectives based on the codon sequence, possibly together with more complicated contributions based on secondary structure analysis. In this work, we demonstrate a direct and efficient algorithm to sample sequences from a suitable Boltzmann distribution defined in terms of the codon sequence and a fully detailed secondary structure free energy model, as well as related algorithms for exact computation of statistical quantities such as free energies, base pairing probabilities, and base and codon marginals. These algorithms draw upon a recently developed tensor-based formulation of secondary structure thermodynamics and demonstrate, for the first time, that global sequence design can be accomplished with respect to a highly accurate free energy model. Moreover, the algorithms can leverage any available CPU and GPU resources in parallel for massive computational speedups.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents algorithms for direct RNA sequence design under codon constraints, using a tensor-based formulation of secondary structure thermodynamics. It claims to enable exact sampling of codon sequences from a Boltzmann distribution defined over a target amino acid sequence and a detailed free energy model, along with exact computation of partition functions, base-pairing probabilities, and base/codon marginals, with parallelization over CPU/GPU resources.

Significance. If the algorithmic extension is shown to preserve exactness and polynomial scaling, the work would advance codon optimization in synthetic biology and mRNA design by incorporating global secondary structure energetics directly, rather than relying on simplified or local objectives. The tensor approach and parallel scaling are noted strengths if substantiated.

major comments (2)
  1. [Abstract] The claim that the tensor formulation extends to codon-constrained sequence spaces while remaining 'direct and efficient' and 'exact' is not supported by any recurrence, state-space analysis, or complexity bound. The per-position restriction to synonymous codons changes the effective alphabet and may require additional auxiliary states or tensor ranks; without explicit demonstration this remains a load-bearing gap for the central claim.
  2. No section provides small-instance verification, benchmark timings, or comparison against existing codon-optimization methods (e.g., those using simpler secondary-structure heuristics). The absence of any numerical results or implementation details prevents assessment of whether the claimed polynomial scaling and exact marginals are realized in practice.
minor comments (1)
  1. [Abstract] The abstract refers to 'expressive tensor-based secondary structure models' without clarifying what additional expressivity is gained over prior tensor formulations or how it interacts with the codon constraints.
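
The size of the per-position codon alphabet raised in major comment 1 can be checked directly against the standard genetic code. The sketch below builds the 64-codon table from the conventional compact amino acid string (bases ordered U, C, A, G) and counts synonymous codons per amino acid; the helper `design_space_size` is a hypothetical name introduced here for illustration.

```python
import itertools
import math
from collections import Counter

BASES = "UCAG"
# Standard genetic code, codons enumerated with the first base varying slowest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of synonymous codons per amino acid ("*" denotes stop).
degeneracy = Counter(CODON_TABLE.values())


def design_space_size(protein):
    """Number of codon sequences that encode `protein` (one-letter codes)."""
    return math.prod(degeneracy[aa] for aa in protein)


largest = max(degeneracy.values())
six_fold = sorted(aa for aa, n in degeneracy.items() if n == largest)
```

Leucine, arginine, and serine each have six codons and no amino acid has more, so the per-position alphabet is small but the design space still grows multiplicatively with protein length.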

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for the constructive major comments. We address each point below and describe the revisions we will incorporate to address the concerns.

Point-by-point responses
  1. Referee: [Abstract] The claim that the tensor formulation extends to codon-constrained sequence spaces while remaining 'direct and efficient' and 'exact' is not supported by any recurrence, state-space analysis, or complexity bound. The per-position restriction to synonymous codons changes the effective alphabet and may require additional auxiliary states or tensor ranks; without explicit demonstration this remains a load-bearing gap for the central claim.

    Authors: We agree that the abstract is too concise to convey the technical details of the extension. The full manuscript presents the tensor-based dynamic programming recursions in the Methods section, where the standard RNA folding recursions are adapted by restricting the nucleotide choices at each codon position to the synonymous codons corresponding to the fixed amino acid. Because the number of synonymous codons is bounded by a small constant (at most six), the state space and tensor ranks are unchanged from the unconstrained case, preserving both exactness and the O(n^3) asymptotic complexity. In the revision we will add an explicit subsection that states the modified recurrence relations, provides the state-space analysis, and derives the complexity bound, thereby substantiating the claims made in the abstract. revision: yes

  2. Referee: [—] No section provides small-instance verification, benchmark timings, or comparison against existing codon-optimization methods (e.g., those using simpler secondary-structure heuristics). The absence of any numerical results or implementation details prevents assessment of whether the claimed polynomial scaling and exact marginals are realized in practice.

    Authors: The present manuscript is primarily algorithmic and focuses on establishing exactness and polynomial scaling through the tensor formulation. We acknowledge that the lack of empirical results makes it difficult to evaluate practical performance. In the revised version we will add a Results section containing (i) verification on small instances by comparing partition functions and marginals against exhaustive enumeration, (ii) wall-clock timings on CPU and GPU for sequences of increasing length to illustrate parallel scaling, and (iii) a comparison against a baseline codon-optimization method that uses a simpler minimum-free-energy heuristic. Pseudocode and implementation notes will also be included in the supplement. revision: yes
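
The small-instance verification described in response 2 can be sketched on a toy model: an O(n^3) dynamic program for the structure partition function of one sequence is compared against a sum over every explicitly enumerated structure. The pair energies, the minimum loop size, and all function names below are assumptions chosen for illustration; the paper's free energy model is far more detailed.

```python
import itertools
import math

# Toy unitless pairing energies (assumption, not the paper's parameters).
PAIR_ENERGY = {frozenset("AU"): -1.0, frozenset("GC"): -2.0, frozenset("GU"): -0.5}
MIN_LOOP = 3  # at least three unpaired bases inside any hairpin


def pair_weight(seq, i, j):
    """Boltzmann weight of pairing bases i < j, or 0 if the pair is not allowed."""
    if j - i <= MIN_LOOP:
        return 0.0
    energy = PAIR_ENERGY.get(frozenset((seq[i], seq[j])))
    return 0.0 if energy is None else math.exp(-energy)


def partition_dp(seq):
    """O(n^3) recursion: the 3'-most base is unpaired or pairs with an earlier base."""
    n = len(seq)
    # q[i][j] is the partition function of the half-open interval [i, j).
    q = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            total = q[i][j - 1]  # base j-1 left unpaired
            for k in range(i, j - 1):  # base j-1 paired with base k
                w = pair_weight(seq, k, j - 1)
                if w:
                    total += q[i][k] * w * q[k + 1][j - 1]
            q[i][j] = total
    return q[0][n]


def is_valid(pairs):
    """True if no base is used twice and no two pairs cross."""
    used = [b for p in pairs for b in p]
    if len(used) != len(set(used)):
        return False
    return not any(i < k < j < l for (i, j) in pairs for (k, l) in pairs)


def partition_brute_force(seq):
    """Sum Boltzmann weights over every valid set of base pairs."""
    n = len(seq)
    candidates = [(i, j) for i in range(n) for j in range(i + 1, n)
                  if pair_weight(seq, i, j) > 0]
    total = 0.0
    for r in range(len(candidates) + 1):
        for pairs in itertools.combinations(candidates, r):
            if is_valid(pairs):
                total += math.prod(pair_weight(seq, i, j) for i, j in pairs)
    return total


checks = {s: (partition_dp(s), partition_brute_force(s))
          for s in ["GAAAC", "GGGAAAUCC", "GCGCAAAAGCGC"]}
```

The brute-force sum grows exponentially with the number of candidate pairs, so it is only usable on short sequences, which is exactly the regime where such a check is meant to run. A design-level check would additionally sum over all codon sequences for a short peptide.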

Circularity Check

0 steps flagged

No circularity in claimed derivation; new algorithm extends prior tensor model independently

Full rationale

The paper's central contribution is the demonstration of a direct algorithm for sampling codon sequences from a Boltzmann distribution over secondary structure free energies, together with exact marginal and probability computations. This is explicitly positioned as an extension of a recently developed tensor-based dynamic programming formulation to the codon-constrained alphabet. The tensor model is treated as an external prior (cited as 'recently developed'), and the new work consists of algorithmic modifications to enforce per-position codon restrictions while preserving exactness and polynomial scaling. No equation or claim reduces the output distribution, free energies, or marginals to a redefinition or statistical fit of the inputs; the derivation chain is the construction of the constrained DP recurrences themselves. The abstract and described claims contain no self-definitional loops, fitted-input predictions, or load-bearing self-citations that collapse the result to prior outputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the validity of the tensor-based secondary structure model when applied to codon sequences and on the appropriateness of Boltzmann sampling for design objectives.

axioms (2)
  • domain assumption: The recently developed tensor formulation correctly computes the partition function and derived quantities for secondary structure thermodynamics.
    Invoked to justify the efficiency and accuracy of the sampling and marginal computations.
  • domain assumption: A Boltzmann distribution over codon sequences and structures is the right objective for sequence design.
    Standard statistical-mechanics framing, but treated as given for the design task.

pith-pipeline@v0.9.0 · 5480 in / 1382 out tokens · 93181 ms · 2026-05-10T00:24:11.367906+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. GoForth: Language Models for RNA Design under Structure, Sequence, and Coding Constraints

    q-bio.QM 2026-05 unverdicted novelty 7.0

    GoForth is a forward-trained encoder-decoder RNA language model that generates sequences under mixed constraints on fold, sequence, and coding by separating sequence prior, forward folding sampler, and reward oracle.
