
arxiv: 2503.21450 · v3 · pith:44A5J3EUnew · submitted 2025-03-27 · 💻 cs.CE · q-bio.BM

CMADiff: Cross-Modal Aligned Diffusion for Controllable Protein Generation

classification 💻 cs.CE q-bio.BM
keywords protein, generation, cmadiff, conditional, diffusion, latent, physicochemical, sequence

AI-assisted protein design has emerged as a critical tool for advancing biotechnology, as deep generative models have demonstrated their reliability in this domain. However, most existing models are trained primarily on protein sequence or structural data and neglect the physicochemical properties of proteins. Moreover, they offer little control over protein generation through intuitive conditions. To address these limitations, we propose CMADiff, a novel framework that enables controllable protein generation by aligning the physicochemical properties of protein sequences with text-based descriptions through a latent diffusion process. Specifically, CMADiff employs a Conditional Variational Autoencoder (CVAE) to integrate physicochemical features as conditional input, forming a robust latent space that captures biological traits. In this latent space, we apply a conditional diffusion process guided by BioAligner, a contrastive learning-based module that aligns text descriptions with protein features, enabling text-driven control over protein sequence generation. In a series of evaluations, including evaluation with AlphaFold3, the experimental results indicate that CMADiff outperforms protein sequence generation benchmarks and holds strong potential for future applications. The implementation and code are available at https://github.com/HPC-NEAU/PhysChemDiff.
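
The abstract describes BioAligner only as a contrastive learning-based module that aligns text descriptions with protein features; it does not state the loss. As a rough illustration of that general idea, the sketch below computes a symmetric, CLIP-style contrastive loss over paired text and protein embeddings. The function names, the temperature value, and the toy embeddings are all assumptions made for illustration and are not taken from the CMADiff code.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax_at(scores, target):
    """Log-probability of index `target` under a softmax over `scores`."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[target] - log_z


def symmetric_contrastive_loss(text_emb, protein_emb, temperature=0.07):
    """Hypothetical alignment loss: row i of each input is assumed to be a matched pair."""
    t = [normalize(v) for v in text_emb]
    p = [normalize(v) for v in protein_emb]
    n = len(t)
    # Pairwise cosine similarities, sharpened by the temperature.
    logits = [
        [sum(a * b for a, b in zip(t[i], p[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Text -> protein: each text should pick out its own protein (softmax over rows).
    text_to_protein = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Protein -> text: each protein should pick out its own text (softmax over columns).
    protein_to_text = -sum(
        log_softmax_at([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (text_to_protein + protein_to_text)


# Toy data (assumption): text embeddings are noisy copies of the protein embeddings.
random.seed(0)
protein = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
aligned_text = [[x + 0.01 * random.gauss(0, 1) for x in row] for row in protein]
shuffled_text = aligned_text[1:] + aligned_text[:1]  # deliberately break the pairing

aligned_loss = symmetric_contrastive_loss(aligned_text, protein)
shuffled_loss = symmetric_contrastive_loss(shuffled_text, protein)
print(aligned_loss, shuffled_loss)
```

With correctly paired rows the loss is small, and it grows when the pairing is shuffled, which is the signal such a module would use to pull matching text and protein representations together.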


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Proteo-R1: Reasoning Foundation Models for De Novo Protein Design

    cs.LG 2026-05 unverdicted novelty 6.0

    Proteo-R1 decouples an MLLM-based understanding expert that selects functional residues from a diffusion-based generation expert that builds protein structures under those explicit constraints.