STRIDE: Post-Training LLMs to Reason and Refine Bio-Sequences via Edit Trajectories

Daiheng Zhang; David van Dijk; Shiyang Zhang; Sizhuang He; Syed Asad Rizvi; Yangtian Zhang

Discrete biological sequence optimization often requires goal-directed, parser-valid edits to an existing protein or molecule. Diffusion models support iterative refinement but do not expose a controllable discrete-edit interface, while autoregressive LLMs can be myopic when planning constrained edits over multiple steps. We introduce STRIDE (Sequence Trajectory Refinement via Iterative Discrete Editing), a post-training framework that trains an LLM to emit executable INSERT/DELETE/REPLACE trajectories for variable-length refinement. STRIDE first learns Levenshtein-aligned shortest-edit demonstrations, then uses supervised fine-tuning and group-based policy optimization to align trajectories with task rewards while preserving coherent editing. On an oracle-based full-action protein stress test, STRIDE raises success over Vanilla SFT from 42% to 89% and novelty among unique improvements from 47% to 97%. On instruction-conditioned molecular editing, the GSPO-aligned variant improves strict success, controllability, and SMILES validity over the SFT-only STRIDE model (code: https://github.com/daiheng-zhang/STRIDE).
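
To make the idea of a Levenshtein-aligned shortest-edit trajectory concrete, the sketch below derives a minimal list of INSERT/DELETE/REPLACE operations that turns one sequence into another, then replays them. It is an illustration only and is not taken from the STRIDE codebase: the function names `shortest_edit_trajectory` and `apply_trajectory`, the `(op, position, residue)` tuple format, and the right-to-left ordering of edits are all assumptions made for this example.

```python
def shortest_edit_trajectory(src, tgt):
    """Return a minimal list of (op, position, residue) edits turning src into tgt.

    Hypothetical format: edits are emitted right to left, so each position
    indexes the still-unmodified prefix of src and stays valid when replayed.
    """
    n, m = len(src), len(tgt)
    # dist[i][j] = Levenshtein distance between src[:i] and tgt[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if src[i - 1] == tgt[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # delete src[i-1]
                dist[i][j - 1] + 1,        # insert tgt[j-1]
                dist[i - 1][j - 1] + sub,  # keep or replace
            )

    # Backtrace from the bottom-right corner to recover one optimal alignment.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and src[i - 1] == tgt[j - 1] and dist[i][j] == dist[i - 1][j - 1]:
            i, j = i - 1, j - 1  # match: no edit needed
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            ops.append(("REPLACE", i - 1, tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("DELETE", i - 1, None))
            i -= 1
        else:
            ops.append(("INSERT", i, tgt[j - 1]))
            j -= 1
    return ops


def apply_trajectory(src, ops):
    """Replay the edits in order on a copy of src and return the result."""
    seq = list(src)
    for op, pos, residue in ops:
        if op == "REPLACE":
            seq[pos] = residue
        elif op == "DELETE":
            del seq[pos]
        elif op == "INSERT":
            seq.insert(pos, residue)
        else:
            raise ValueError(f"unknown edit operation: {op}")
    return "".join(seq)


if __name__ == "__main__":
    source, target = "MKTAYIAK", "MKTVYIK"
    trajectory = shortest_edit_trajectory(source, target)
    for step in trajectory:
        print(step)
    print(apply_trajectory(source, trajectory))
```

Because the edits are ordered right to left, every position refers to a part of the source that earlier edits have not shifted, which keeps the trajectory executable without re-indexing. A left-to-right ordering would work equally well but would need offsets adjusted after each INSERT or DELETE.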
