Aligning protein generative models with experimental fitness via direct preference optimization. bioRxiv, 2024.
2 Pith papers cite this work.
Citing papers
- ProteinOPD: Towards Effective and Efficient Preference Alignment for Protein Design. ProteinOPD uses token-level on-policy distillation from multiple preference-specific teacher models into a shared student to balance competing objectives in protein design, delivering gains on target objectives without losing designability and an 8x speedup over RL baselines.
- Pushing Biomolecular Utility-Diversity Frontiers with Supergroup Relative Policy Optimization. SGRPO expands the utility-diversity Pareto frontier in biomolecular design by using supergroup sampling and leave-one-out diversity rewards combined with utility signals.