A-CODE presents a fully atomic one-stage multimodal diffusion model for protein co-design that claims superior unconditional generation performance over prior one- and two-stage models plus a tenfold success-rate gain on hard binder-design tasks.
Diffusion language models are versatile protein learners
Canonical reference. 100% of citing Pith papers cite this work as background.
Verdicts: UNVERDICTED 14
Roles: background 5
Polarities: background 5
Representative citing papers
LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs such as LLaMA3 8B on standard benchmarks and surpasses GPT-4o on reversal poem completion.
Dual Triangle Attention achieves effective bidirectional attention with built-in positional inductive bias via dual triangular masks, outperforming standard bidirectional attention on position-sensitive tasks and showing strong masked language modeling results with or without positional embeddings.
Enhances Discrete Flow Matching with domain-specific couplings, latent edit-based rates, latent classifier-free guidance, and temperature scaling to reach SOTA on DNA and peptide sequence tasks.
AMix-2 unifies protein sequences and text in one LLM via shared tokens and block-wise diffusion modeling, introduces the ProteinArena benchmark, and reports competitive performance against task-specific protein models and frontier LLMs.
SurfDesign introduces surface-conditioned protein design via manifold modeling and equivariant message passing on surfaces integrated with pretrained language models, outperforming prior methods on binder and enzyme design benchmarks.
Yeti is a compact tokenizer for protein structures that delivers strong codebook use, token diversity, and reconstruction while enabling from-scratch multimodal generation of plausible sequences and structures with 10x fewer parameters than ESM3.
Primal-dual guided decoding casts constrained discrete diffusion as a KL-regularized optimization solved online with adaptive Lagrangian multipliers to satisfy constraints while staying close to the unconstrained model distribution.
Coupling Models enable single-step discrete sequence generation via learned couplings to Gaussian latents and outperform prior one-step baselines on text perplexity, biological FBD, and image FID metrics.
MP2D is a framework that guides discrete diffusion denoising with constrained MCTS and Pareto rewards to optimize protein sequences for four to five simultaneous objectives, outperforming baselines on antimicrobial peptide and binder design tasks.
DPLM-Evo introduces an evolutionary discrete diffusion framework with explicit edit prediction and contextual noising that claims SOTA single-sequence mutation effect prediction on ProteinGym while supporting variable-length evolution simulation.
MIMIC is a split-track encoder-decoder foundation model that unifies sequence reconstruction, prediction, and constrained design across nucleic acids, proteins, and regulatory context using partially observed multimodal inputs.
Discrete, Gaussian, and simplicial diffusion models for sequences are unified as parameterizations of the Wright-Fisher population genetics model, allowing multi-domain training and stable simplicial diffusion.
CodeFP jointly generates protein sequences and structures using functional local structures and auxiliary supervision, yielding 6.1% better functional consistency and 3.2% better foldability than prior baselines.
citing papers explorer
- Flexible Flows for Biological Sequence Design: Enhances Discrete Flow Matching with domain-specific couplings, latent edit-based rates, latent classifier-free guidance, and temperature scaling to reach SOTA on DNA and peptide sequence tasks.
- Coupling Models for One-Step Discrete Generation: Coupling Models enable single-step discrete sequence generation via learned couplings to Gaussian latents and outperform prior one-step baselines on text perplexity, biological FBD, and image FID metrics.
- Towards A Generative Protein Evolution Machine with DPLM-Evo: DPLM-Evo introduces an evolutionary discrete diffusion framework with explicit edit prediction and contextual noising that claims SOTA single-sequence mutation effect prediction on ProteinGym while supporting variable-length evolution simulation.
- A Unification of Discrete, Gaussian, and Simplicial Diffusion: Discrete, Gaussian, and simplicial diffusion models for sequences are unified as parameterizations of the Wright-Fisher population genetics model, allowing multi-domain training and stable simplicial diffusion.