arxiv: 2605.10189 · v1 · submitted 2026-05-11 · 💻 cs.LG · cs.AI

Recognition: no theorem link

ProteinOPD: Towards Effective and Efficient Preference Alignment for Protein Design

Yulin Zhang , He Cao , Zihao Jiang , Chenyi Zi , Zhipeng Zhou , Zijing Liu , Yu Li , Jia Li

show 1 more author

Ziqi Gao

Authors on Pith no claims yet

Pith reviewed 2026-05-12 03:31 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords protein designpreference alignmenton-policy distillationprotein language modelsmulti-objective optimizationsynthetic biology

0 comments

The pith

ProteinOPD aligns protein language models to multiple preference objectives while preserving their designability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework called ProteinOPD to steer pretrained protein language models toward specific functional goals such as stability or binding affinity. It creates separate teacher models for each goal and then uses on-policy distillation to transfer a balanced combination of those goals into a single student model on the student's own generated sequences. This approach is presented as a solution to the forgetting of basic sequence generation skills that occurs with other alignment techniques and to the high computational cost of reinforcement learning methods. A sympathetic reader would care because effective protein design supports applications in synthetic biology and drug discovery where both custom properties and reliable foldability matter.

Core claim

ProteinOPD adapts a pretrained protein language model into preference-specific teachers and distills their knowledge into a shared student via token-level on-policy distillation on the student's own trajectories. The student aligns to a unique normalized geometric consensus of weighted teachers while ensuring bounded optimization under conflicts. This enables multi-objective preference alignment without catastrophic forgetting of the model's original designability.

What carries the argument

Token-level on-policy distillation to a normalized geometric consensus of weighted preference-specific teachers.

If this is right

Generated proteins show substantial gains on the chosen preference objectives.
Designability of the sequences remains comparable to the unaligned model.
Training completes approximately eight times faster than reinforcement-learning alignment baselines.
Multiple competing objectives can be balanced through a single normalized consensus without separate retraining.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same distillation structure could be tested on other biological sequence tasks where multiple constraints must be satisfied simultaneously.
The reported speedup indicates that replacing policy-gradient steps with on-policy distillation may lower the barrier to aligning larger generative models in biology.
Direct validation of the resulting proteins through structure prediction or experimental assays would be a natural next measurement to confirm the maintained designability.

Load-bearing premise

The mode-seeking behavior of on-policy distillation will reliably keep the model from losing its pretrained ability to generate designable protein sequences when it is aligned to multiple conflicting preferences at once.

What would settle it

A side-by-side measurement of designability scores (such as predicted fold quality or energy) on proteins generated before and after ProteinOPD training, together with scores on the target preference objectives, to check whether designability holds steady while preferences improve.

read the original abstract

Designing proteins with desired functions or properties represents a core goal in synthetic biology and drug discovery. Recent advances in protein language models (PLMs) have enabled the generation of highly designable protein sequences, while preference alignment provides a promising way to steer designs toward desired functions and properties. Nevertheless, they often trigger catastrophic forgetting of pretrained knowledge, degrading basic designability and failing to balance multiple competing objectives. To address these issues, we draw inspiration from On-Policy Distillation (OPD), an advanced post-training method renowned for mitigating catastrophic forgetting through its mode-seeking nature. In this work, we propose ProteinOPD, a multi-objective preference alignment framework that can effectively balance multiple preference objectives while maintaining the inherent designability of PLMs. ProteinOPD adapts a pretrained PLM into preference-specific teachers and distills their knowledge into a shared student via token-level OPD on the student's own trajectories. During this process, the student is aligned to a unique normalized geometric consensus of weighted teachers while ensuring bounded optimization under conflicts. This bridges the gap for OPD in multi-objective/teacher alignment. Extensive experiments show that ProteinOPD achieves substantial gains on target preference objectives without compromising the designability, with an 8x training speedup over RL-based alignment competitors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ProteinOPD adapts on-policy distillation to multi-objective alignment for protein LMs and reports gains plus 8x speedup, but the no-drift claim under conflicting teachers rests on an assumption that needs tighter checks.

read the letter

ProteinOPD adapts on-policy distillation for aligning protein language models to multiple preferences at once. The core idea is to create separate teachers for each objective, then distill their knowledge into a student model using token-level on-policy distillation on trajectories sampled from the student itself, guided by a normalized geometric consensus of the weighted teachers. This adaptation to the multi-teacher case is what is new. It directly addresses the gap in applying OPD to multi-objective settings for proteins, where balancing competing goals like function and stability often leads to forgetting the base model's design capabilities. The paper does well in the experimental section by demonstrating substantial improvements on the target preferences while keeping designability intact. The 8x training speedup compared to RL-based methods is a notable practical benefit, making it more feasible for iterative design workflows. The soft spot is the handling of objective conflicts. The mode-seeking nature and geometric mean are intended to prevent drift, but sampling from the student can still reinforce small shifts away from the original distribution when teachers disagree. The abstract claims bounded optimization under conflicts, yet without detailed ablations showing that generated sequences remain close to the pretrained mode in terms of likelihood or structural metrics across different weight settings, the prevention of catastrophic forgetting is not fully secured. This assumption needs stronger empirical backing. This work is for researchers in machine learning for biology, particularly those developing alignment techniques for generative protein models. Readers who value efficient post-training methods over full reinforcement learning will find the framework and results relevant. It has a clear method, grounded claims, and testable experiments, so it deserves a serious referee. I recommend sending the paper for peer review, focusing review on the drift and designability preservation aspects.

Referee Report

2 major / 1 minor

Summary. The paper proposes ProteinOPD, a multi-objective preference alignment framework for protein language models (PLMs) that adapts a pretrained PLM into preference-specific teachers and distills their outputs into a shared student model using token-level on-policy distillation (OPD) on the student's own trajectories. The student is aligned to a normalized geometric consensus of the weighted teachers, with the method claimed to balance competing objectives while preserving designability due to OPD's mode-seeking property and bounded optimization under conflicts, yielding substantial preference gains and an 8x training speedup over RL-based competitors.

Significance. If the empirical claims hold, ProteinOPD would offer a practical and efficient alternative to RL for steering PLM-based protein generators toward multiple functional objectives without degrading core designability. This addresses a central limitation in current preference alignment for proteins and could accelerate applications in synthetic biology and drug discovery by reducing training costs while maintaining the generative quality of the base model.

major comments (2)

[Method description (abstract and §3)] The central claim that mode-seeking OPD on student trajectories preserves designability under multi-teacher conflicts relies on the normalized geometric consensus preventing drift from the pretrained distribution. However, on-policy sampling from the student can reinforce deviations once the consensus tilts, and the geometric mean alone does not explicitly bound the student to the high-density region of the original PLM; this assumption is load-bearing for the no-forgetting guarantee and requires a formal argument or ablation in the methods.
[Experiments (abstract)] The 8x speedup claim over RL-based alignment competitors is central to the efficiency contribution but cannot be assessed without details on the exact RL baselines, training configurations, hardware, and wall-clock measurements; the abstract states clear performance claims including this factor, yet the provided text lacks the experimental setup needed to verify it.

minor comments (1)

[Abstract] The abstract mentions 'bounded optimization under conflicts' but does not define the normalization procedure for the geometric consensus or how weights are set; this notation should be clarified with an equation for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and describe the revisions we will incorporate to strengthen the presentation of our method and experimental claims.

read point-by-point responses

Referee: [Method description (abstract and §3)] The central claim that mode-seeking OPD on student trajectories preserves designability under multi-teacher conflicts relies on the normalized geometric consensus preventing drift from the pretrained distribution. However, on-policy sampling from the student can reinforce deviations once the consensus tilts, and the geometric mean alone does not explicitly bound the student to the high-density region of the original PLM; this assumption is load-bearing for the no-forgetting guarantee and requires a formal argument or ablation in the methods.

Authors: We appreciate the referee highlighting the need for clearer justification of the designability preservation claim. The manuscript argues that the normalized geometric consensus of the weighted teachers, together with OPD's mode-seeking property and the explicit bounded optimization under conflicts, keeps the student from drifting outside the high-density region of the pretrained PLM. We acknowledge that the current text does not provide a fully formal divergence bound or dedicated ablation isolating the consensus effect. We will revise §3 to include a short theoretical sketch showing that the geometric mean induces a bounded KL divergence from the original distribution and add an ablation comparing designability metrics (e.g., pLDDT, scRMSD) with and without the normalized consensus to empirically confirm the no-forgetting behavior. revision: partial
Referee: [Experiments (abstract)] The 8x speedup claim over RL-based alignment competitors is central to the efficiency contribution but cannot be assessed without details on the exact RL baselines, training configurations, hardware, and wall-clock measurements; the abstract states clear performance claims including this factor, yet the provided text lacks the experimental setup needed to verify it.

Authors: We agree that the 8x speedup claim requires explicit experimental details for verification. The full manuscript reports the speedup based on direct wall-clock comparisons against RL baselines (PPO and adapted DPO variants) in the experiments section, but these details are not summarized in the abstract. We will revise the abstract to briefly reference the setup and add a new paragraph in the experimental details subsection (and an appendix table) specifying the RL baselines, training hyperparameters, hardware (NVIDIA A100 GPUs), batch sizes, and measured wall-clock times for both ProteinOPD and the RL competitors. This will make the efficiency claim fully reproducible and verifiable. revision: yes

Circularity Check

0 steps flagged

No circularity; adaptation of external OPD with independent empirical validation

full rationale

The paper adapts the existing On-Policy Distillation (OPD) method to protein PLMs for multi-objective alignment, citing its mode-seeking property to mitigate forgetting. Claims rest on experimental results (gains on preferences, preserved designability, 8x speedup) rather than any derivation that reduces to fitted inputs, self-definitions, or load-bearing self-citations. No equations or steps equate outputs to inputs by construction; the framework is presented as a practical extension with external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Only abstract available, so ledger is necessarily incomplete; the central claim rests on the unverified assumption that OPD's mode-seeking property transfers to this domain.

free parameters (1)

preference weights
Multi-objective setup requires weighting different teachers or objectives; exact values not specified in abstract.

axioms (1)

domain assumption On-policy distillation mitigates catastrophic forgetting due to its mode-seeking nature
Explicitly stated as the inspiration for the method in the abstract.

pith-pipeline@v0.9.0 · 5543 in / 1285 out tokens · 56860 ms · 2026-05-12T03:31:14.840745+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages · 3 internal anchors

[1]

Uniprot: the universal protein knowledgebase in 2023.Nucleic acids research, 51(D1):D523–D531, 2023

work page 2023
[2]

From mechanistic interpretability to mechanistic biology: Training, evaluating, and interpreting sparse autoencoders on protein language models

Etowah Adams, Liam Bai, Minji Lee, Yiyang Yu, and Mohammed AlQuraishi. From mechanistic interpretability to mechanistic biology: Training, evaluating, and interpreting sparse autoencoders on protein language models. bioRxiv, 2025

work page 2025
[3]

On-policy distillation of language models: Learning from self-generated mistakes

Rishabh Agarwal, Nino Vieillard, Yongchao Zhou, Piotr Stanczyk, Sabela Ramos Garea, Matthieu Geist, and Olivier Bachem. On-policy distillation of language models: Learning from self-generated mistakes. InThe twelfth international conference on learning representations, 2024

work page 2024
[4]

Engineering living therapeutics with synthetic biology.Nature Reviews Drug Discovery, 20(12):941–960, 2021

Andres Cubillos-Ruiz, Tingxi Guo, Anna Sokolovska, Paul F Miller, James J Collins, Timothy K Lu, and Jose M Lora. Engineering living therapeutics with synthetic biology.Nature Reviews Drug Discovery, 20(12):941–960, 2021

work page 2021
[5]

Toward de novo protein design from natural language.BioRxiv, pages 2024–08, 2024

Fengyuan Dai, Shiyang You, Yudian Zhu, Yuan Gao, Lihao Fu, Xibin Zhou, Jin Su, Chentong Wang, Yuliang Fan, Xiaoxiao Ma, et al. Toward de novo protein design from natural language.BioRxiv, pages 2024–08, 2024

work page 2024
[6]

Engineering protein-based therapeutics through structural and chemical design

Sasha B Ebrahimi and Devleena Samanta. Engineering protein-based therapeutics through structural and chemical design. Nature communications, 14(1):2411, 2023

work page 2023
[7]

Protgpt2 is a deep unsupervised language model for protein design

Noelia Ferruz, Steffen Schmidt, and Birte Höcker. Protgpt2 is a deep unsupervised language model for protein design. Nature communications, 13(1):4348, 2022

work page 2022
[8]

Simulating 500 million years of evolution with a language model

Thomas Hayes, Roshan Rao, Halil Akin, Nicholas J Sofroniew, Deniz Oktay, Zeming Lin, Robert Verkuil, Vincent Q Tran, Jonathan Deaton, Marius Wiggert, et al. Simulating 500 million years of evolution with a language model. Science, 387(6736):850–858, 2025

work page 2025
[9]

Protein–sol: a web tool for predicting protein solubility from sequence.Bioinformatics, 33(19):3098–3100, 2017

Max Hebditch, M Alejandro Carballo-Amador, Spyros Charonis, Robin Curtis, and Jim Warwicker. Protein–sol: a web tool for predicting protein solubility from sequence.Bioinformatics, 33(19):3098–3100, 2017

work page 2017
[10]

Rita: a study on scaling up generative protein sequence models.arXiv preprint arXiv:2205.05789, 2022

Daniel Hesslow, Niccoló Zanichelli, Pascal Notin, Iacopo Poli, and Debora Marks. Rita: a study on scaling up generative protein sequence models.arXiv preprint arXiv:2205.05789, 2022

work page arXiv 2022
[11]

Efficient evolution of human antibodies from general protein language models

Brian L Hie, Varun R Shanker, Duo Xu, Theodora UJ Bruun, Payton A Weidenbacher, Shaogeng Tang, Wesley Wu, John E Pak, and Peter S Kim. Efficient evolution of human antibodies from general protein language models. Nature biotechnology, 42(2):275–283, 2024

work page 2024
[12]

Property-driven protein inverse folding with multi-objective preference alignment.arXiv preprint arXiv:2603.06748, 2026

Xiaoyang Hou, Junqi Liu, Chence Shi, Xin Liu, Zhi Yang, and Jian Tang. Property-driven protein inverse folding with multi-objective preference alignment.arXiv preprint arXiv:2603.06748, 2026

work page arXiv 2026
[13]

Steering protein language models.arXiv preprint arXiv:2509.07983, 2025

Long-Kai Huang, Rongyi Zhu, Bing He, and Jianhua Yao. Steering protein language models.arXiv preprint arXiv:2509.07983, 2025

work page arXiv 2025
[14]

Highly accurate protein structure prediction with alphafold

John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, et al. Highly accurate protein structure prediction with alphafold. nature, 596(7873):583–589, 2021

work page 2021
[15]

Synthetic biology: applications come of age.Nature Reviews Genetics, 11 (5):367–379, 2010

Ahmad S Khalil and James J Collins. Synthetic biology: applications come of age.Nature Reviews Genetics, 11 (5):367–379, 2010

work page 2010
[16]

Pdfbench: A benchmark for de novo protein design from function.arXiv preprint arXiv:2505.20346, 2025

Jiahao Kuang, Nuowei Liu, Jie Wang, Changzhi Sun, Tao Ji, and Yuanbin Wu. Pdfbench: A benchmark for de novo protein design from function.arXiv preprint arXiv:2505.20346, 2025

work page arXiv 2025
[17]

Language models of protein sequences at the scale of evolution enable accurate structure prediction.BioRxiv, 2022:500902, 2022

Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction.BioRxiv, 2022:500902, 2022

work page 2022
[18]

Protein design with dynamic protein vocabulary.arXiv preprint arXiv:2505.18966, 2025

Nuowei Liu, Jiahao Kuang, Yanting Liu, Tao Ji, Changzhi Sun, Man Lan, and Yuanbin Wu. Protein design with dynamic protein vocabulary.arXiv preprint arXiv:2505.18966, 2025

work page arXiv 2025
[19]

A text-guided protein design framework.Nature Machine Intelligence, 7(4): 580–591, 2025

Shengchao Liu, Yanjing Li, Zhuoxinran Li, Anthony Gitter, Yutao Zhu, Jiarui Lu, Zhao Xu, Weili Nie, Arvind Ramanathan, Chaowei Xiao, et al. A text-guided protein design framework.Nature Machine Intelligence, 7(4): 580–591, 2025. 11

work page 2025
[20]

Controllable protein sequence generation with llm preference optimization

Xiangyu Liu, Yi Liu, Silei Chen, and Wei Hu. Controllable protein sequence generation with llm preference optimization. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 505–513, 2025

work page 2025
[21]

De novo design of drug-binding proteins with predictable binding energy and specificity.Science, 384(6691):106–112, 2024

Lei Lu, Xuxu Gou, Sophia K Tan, Samuel I Mann, Hyunjun Yang, Xiaofang Zhong, Dimitrios Gazgalis, Jesús Valdiviezo, Hyunil Jo, Yibing Wu, et al. De novo design of drug-binding proteins with predictable binding energy and specificity.Science, 384(6691):106–112, 2024

work page 2024
[22]

Flexible and controllable protein design by prefix-tuning large-scale protein language models.BioRxiv, pages 2023–12, 2023

Jiawei Luo, Xianliang Liu, Jiahao Li, Qingcai Chen, and Junjie Chen. Flexible and controllable protein design by prefix-tuning large-scale protein language models.BioRxiv, pages 2023–12, 2023

work page 2023
[23]

Prollama: A protein large language model for multi-task protein language processing.IEEE Transactions on Artificial Intelligence, 2025

Liuzhenghao Lv, Zongying Lin, Hao Li, Yuyang Liu, Jiaxi Cui, Calvin Yu-Chian Chen, Li Yuan, and Yonghong Tian. Prollama: A protein large language model for multi-task protein language processing.IEEE Transactions on Artificial Intelligence, 2025

work page 2025
[24]

Eguchi, Po - Ssu Huang, and Richard Socher

Ali Madani, Bryan McCann, Nikhil Naik, Nitish Shirish Keskar, Namrata Anand, Raphael R Eguchi, Po-Ssu Huang, and Richard Socher. Progen: Language modeling for protein generation.arXiv preprint arXiv:2004.03497, 2020

work page arXiv 2004
[25]

Large language models generate functional protein sequences across diverse families.Nature biotechnology, 41(8):1099–1106, 2023

Ali Madani, Ben Krause, Eric R Greene, Subu Subramanian, Benjamin P Mohr, James M Holton, Jose Luis Olmos Jr, Caiming Xiong, Zachary Z Sun, Richard Socher, et al. Large language models generate functional protein sequences across diverse families.Nature biotechnology, 41(8):1099–1106, 2023

work page 2023
[26]

Conditional language models enable the efficient design of proficient enzymes.bioRxiv, pages 2024–05, 2024

Geraldene Munsamy, Ramiro Illanes-Vicioso, Silvia Funcillo, Ioanna T Nakou, Sebastian Lindner, Gavin Ayres, Lesley S Sheehan, Steven Moss, Ulrich Eckhard, Philipp Lorenz, et al. Conditional language models enable the efficient design of proficient enzymes.bioRxiv, pages 2024–05, 2024

work page 2024
[27]

Progen2: exploring the boundaries of protein language models.Cell systems, 14(11):968–978, 2023

Erik Nijkamp, Jeffrey A Ruffolo, Eli N Weinstein, Nikhil Naik, and Ali Madani. Progen2: exploring the boundaries of protein language models.Cell systems, 14(11):968–978, 2023

work page 2023
[28]

Temberture: advancing protein thermostability prediction with deep learning and attention mechanisms.Bioinformatics Advances, 4(1):vbae103, 2024

Chiara Rodella, Symela Lazaridi, and Thomas Lemmin. Temberture: advancing protein thermostability prediction with deep learning and attention mechanisms.Bioinformatics Advances, 4(1):vbae103, 2024

work page 2024
[29]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[30]

Guiding generative protein language models with reinforcement learning.arXiv preprint arXiv:2412.12979, 2024

Filippo Stocco, Maria Artigues-Lleixa, Andrea Hunklinger, Talal Widatalla, Marc Guell, and Noelia Ferruz. Guiding generative protein language models with reinforcement learning.arXiv preprint arXiv:2412.12979, 2024

work page arXiv 2024
[31]

Steering generative models for protein design: Aligning and conditioning strategies

Filippo Stocco, Michele Garibbo, and Noelia Ferruz. Steering generative models for protein design: Aligning and conditioning strategies. Current Opinion in Structural Biology, 98:103250, 2026

work page 2026
[32]

Protrek: Navigating the protein universe through tri-modal contrastive learning.bioRxiv, pages 2024–05, 2024

Jin Su, Xibin Zhou, Xuting Zhang, and Fajie Yuan. Protrek: Navigating the protein universe through tri-modal contrastive learning.bioRxiv, pages 2024–05, 2024

work page 2024
[33]

Uniref clusters: a comprehensive and scalable alternative for improving sequence similarity searches.Bioinformatics, 31(6):926–932, 2015

Baris E Suzek, Yuqi Wang, Hongzhan Huang, Peter B McGarvey, Cathy H Wu, and the UniProt Consortium. Uniref clusters: a comprehensive and scalable alternative for improving sequence similarity searches.Bioinformatics, 31(6):926–932, 2015

work page 2015
[34]

Preference fine-tuning of LLMs should leverage suboptimal, on-policy data,

Fahim Tajwar, Anikait Singh, Archit Sharma, Rafael Rafailov, Jeff Schneider, Tengyang Xie, Stefano Ermon, Chelsea Finn, and Aviral Kumar. Preference fine-tuning of llms should leverage suboptimal, on-policy data.arXiv preprint arXiv:2404.14367, 2024

work page arXiv 2024
[35]

Pro- teinzero: Self-improving protein generation via online reinforcement learning.arXiv preprint arXiv:2506.07459, 2025

Ziwen Wang, Jiajun Fan, Ruihan Guo, Thao Nguyen, Heng Ji, and Ge Liu. Proteinzero: Self-improving protein generation via online reinforcement learning.arXiv preprint arXiv:2506.07459, 2025

work page arXiv 2025
[36]

Aligning protein generative models with experimental fitness via direct preference optimization.bioRxiv, pages 2024–05, 2024

Talal Widatalla, Rafael Rafailov, and Brian Hie. Aligning protein generative models with experimental fitness via direct preference optimization.bioRxiv, pages 2024–05, 2024

work page 2024
[37]

Applications of synthetic biology in medical and pharmaceutical fields.Signal transduction and targeted therapy, 8(1):199, 2023

Xu Yan, Xu Liu, Cuihuan Zhao, and Guo-Qiang Chen. Applications of synthetic biology in medical and pharmaceutical fields.Signal transduction and targeted therapy, 8(1):199, 2023

work page 2023
[38]

Annotation-guided protein design with multi-level domain alignment

Chaohao Yuan, Songyou Li, Geyan Ye, Yikun Zhang, Long-Kai Huang, Wenbing Huang, Wei Liu, Jianhua Yao, and Yu Rong. Annotation-guided protein design with multi-level domain alignment. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1, pages 1855–1866, 2025. 12

work page 2025
[39]

GLM-5: from Vibe Coding to Agentic Engineering

Aohan Zeng, Xin Lv, Zhenyu Hou, Zhengxiao Du, Qinkai Zheng, Bin Chen, Da Yin, Chendi Ge, Chenghua Huang, Chengxing Xie, et al. Glm-5: from vibe coding to agentic engineering.arXiv preprint arXiv:2602.15763, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[40]

Towards on-policy sft: Distribution discriminant theory and its applications in llm training

Miaosen Zhang, Yishan Liu, Shuxia Lin, Xu Yang, Qi Dai, Chong Luo, Weihao Jiang, Peng Hou, Anxiang Zeng, Xin Geng, et al. Towards on-policy sft: Distribution discriminant theory and its applications in llm training. arXiv preprint arXiv:2602.12222, 2026

work page arXiv 2026
[41]

Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models

Siyan Zhao, Zhihui Xie, Mengchen Liu, Jing Huang, Guan Pang, Feiyu Chen, and Aditya Grover. Self-distilled reasoner: On-policy self-distillation for large language models.arXiv preprint arXiv:2601.18734, 2026. 13 Appendix A Details of Metrics This section provides detailed definitions of the evaluation metrics used in our experiments. Perplexity.Perplexit...

work page internal anchor Pith review arXiv 2026