Probabilistic RNA Designability via Interpretable Ensemble Approximation and Dynamic Decomposition
Motivation: RNA design aims to find RNA sequences that fold into a given target secondary structure, a problem also known as RNA inverse folding. However, not all target structures are designable. Recent advances in RNA designability have focused primarily on minimum free energy (MFE)-based criteria, while ensemble-based notions of designability remain largely underexplored. To address this gap, we introduce a theory of ensemble approximation and a probability decomposition framework for bounding the folding probabilities of RNA structures in an explainable way. We further develop a linear-time dynamic programming algorithm that efficiently searches over exponentially many decompositions and identifies the one that yields the tightest probabilistic bound for a given structure.

Results: Applying our methods to both native and artificial RNA structures in the ArchiveII and Eterna100 benchmarks, we obtained probability bounds that are much tighter than those of prior approaches. Our methods also provide anatomical tools for analyzing RNA structures and understanding the sources of design difficulty at the motif level.

Availability: Source code and data are available at https://github.com/shanry/RNA-Undesign.

Supplementary information: Supplementary text and data are available in a separate PDF.
Forward citations
Cited by 1 Pith paper
- GoForth: Language Models for RNA Design under Structure, Sequence, and Coding Constraints
GoForth is a forward-trained encoder-decoder RNA language model that generates sequences under mixed constraints on fold, sequence, and coding by separating sequence prior, forward folding sampler, and reward oracle.