Calibrating Generative Models to Feature Distributions with MMD Finetuning
Pith reviewed 2026-06-26 21:09 UTC · model grok-4.3
The pith
kCGM calibrates generative models to target feature distributions by minimizing MMD while preserving validity through KL regularization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
kCGM minimizes a maximum mean discrepancy (MMD) between the feature distributions of generated samples and a target set. It uses an unbiased score-function estimator for the MMD, together with KL regularization that keeps the model close to its pretrained state. This calibration corrects distributional shifts using only feature-level supervision from the target set. On a target set of 174 antibiotics, kCGM improves both feature alignment and chemical validity, whereas direct finetuning improves matching at the cost of validity. The approach extends to autoregressive models and to both continuous and discrete diffusion models in protein and DNA generation tasks.
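As a concrete reference point, the MMD between two finite feature samples is commonly estimated with the unbiased U-statistic from the kernel two-sample test literature. The sketch below assumes a Gaussian kernel and plain Python lists of feature vectors; the function names are hypothetical and are not taken from the kCGM code.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def gaussian_kernel(a: Vector, b: Vector, bandwidth: float) -> float:
    """Gaussian (RBF) kernel exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_unbiased(gen: Sequence[Vector], target: Sequence[Vector], bandwidth: float) -> float:
    """Unbiased (U-statistic) estimate of squared MMD between two feature samples.

    Self-pairs k(x_i, x_i) and k(y_j, y_j) are excluded from the
    within-sample averages; including them would bias the estimate upward.
    """
    n, m = len(gen), len(target)
    if n < 2 or m < 2:
        raise ValueError("need at least two samples on each side")
    # Average kernel value over distinct pairs of generated features.
    xx = sum(gaussian_kernel(gen[i], gen[j], bandwidth)
             for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    # Average kernel value over distinct pairs of target features.
    yy = sum(gaussian_kernel(target[i], target[j], bandwidth)
             for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    # Average cross-kernel value between generated and target features.
    xy = sum(gaussian_kernel(x, y, bandwidth) for x in gen for y in target) / (n * m)
    return xx + yy - 2.0 * xy
```

Because the self-pairs are dropped, the estimate can be slightly negative when both samples come from similar distributions; for example, `mmd2_unbiased([[0.0], [1.0]], [[0.0], [1.0]], 1.0)` equals `exp(-0.5) - 1`.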
What carries the argument
kCGM, which performs MMD minimization on feature distributions via an unbiased score-function estimator combined with KL regularization to the pretrained model.
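One standard way to write such an objective, and the score-function (REINFORCE) gradient estimate it leads to, is sketched below. The symbols are assumptions introduced for illustration and may not match the paper's notation or constants: φ is the feature map, k the kernel, p_0 the pretrained model, y_1 to y_m the target features, x_1 to x_n independent samples from p_θ, and λ the KL weight. The sketch also assumes per-sample log-likelihoods are available, which holds for autoregressive models but not directly for diffusion models.

```latex
\mathrm{MMD}_k^2(P, Q) = \mathbb{E}_{a,a' \sim P}\big[k(a,a')\big]
  - 2\,\mathbb{E}_{a \sim P,\, b \sim Q}\big[k(a,b)\big]
  + \mathbb{E}_{b,b' \sim Q}\big[k(b,b')\big]

\mathcal{L}(\theta) = \mathrm{MMD}_k^2\big(\phi_{\#} p_\theta,\ \hat q\big)
  + \lambda\,\mathrm{KL}\big(p_\theta \,\|\, p_0\big),
\qquad \hat q = \frac{1}{m}\sum_{l=1}^{m} \delta_{y_l}

\widehat{\nabla_\theta \mathcal{L}} = \sum_{i=1}^{n} c_i\, \nabla_\theta \log p_\theta(x_i),
\qquad
c_i = \frac{2}{n(n-1)} \sum_{j \neq i} k\big(\phi(x_i), \phi(x_j)\big)
  - \frac{2}{nm} \sum_{l=1}^{m} k\big(\phi(x_i), y_l\big)
  + \frac{\lambda}{n}\big(\log p_\theta(x_i) - \log p_0(x_i)\big)
```

Each c_i is held constant when differentiating. The first two terms follow from the identity ∇_θ E_{p_θ}[f] = E_{p_θ}[f ∇_θ log p_θ], with the i ≠ j pairing keeping x_i and x_j independent; the target-only MMD term does not depend on θ and drops out. The last term uses E_{p_θ}[∇_θ log p_θ] = 0.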
If this is right
- kCGM improves target feature matching while increasing validity on antibiotic generation tasks.
- Direct finetuning sacrifices validity for matching, but kCGM avoids this tradeoff.
- The method adapts autoregressive, continuous-space diffusion, and discrete diffusion models using only feature-level supervision.
- The same calibration applies to protein and DNA sequence generation tasks.
Where Pith is reading between the lines
- Feature-level supervision may allow effective calibration even when full target samples are scarce or expensive to obtain.
- The regularization approach could help preserve base model capabilities in other low-data generative settings beyond molecules.
- If the MMD estimator remains stable across kernels, the method might scale to larger or more diverse target feature sets without additional tuning.
Load-bearing premise
The unbiased score-function estimator for MMD can be stably optimized during finetuning without introducing optimization artifacts or unstated assumptions on the kernel or target set size.
What would settle it
An experiment on the 174-antibiotic target set in which kCGM either fails to improve feature matching over the pretrained model or reduces validity relative to direct finetuning.
Original abstract
Generative models can produce individually plausible samples while deviating substantially from a target set in the distribution of key features. For example, a model pretrained on broad drug-like chemical space may generate molecules whose molecular features differ from those of a therapeutic class of interest, such as known antibiotics. Correcting such distributional miscalibration is challenging: direct finetuning on the target set can overfit and does not control which features are matched. To fill this gap, we introduce kernel Calibrating Generative Models (kCGM). kCGM minimizes a maximum mean discrepancy (MMD) between generated and target feature distributions using an unbiased score-function estimator, with KL regularization to remain close to the pretrained model. On a target set of 174 antibiotics, direct finetuning sacrifices chemical validity for feature-distribution matching, whereas kCGM improves target feature matching while increasing validity. We further demonstrate kCGM in protein and DNA generation tasks, showing it can adapt autoregressive, continuous-space diffusion, and discrete diffusion models using only feature-level supervision. Code is available at https://github.com/smithhenryd/cgm.
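The abstract's objective, an MMD between generated and target features plus a KL penalty toward the pretrained model, cannot be backpropagated through discrete samples such as molecules or DNA sequences, which is where a score-function estimator comes in. The sketch below computes per-sample coefficients c_i so that the sum of c_i times the gradient of log p_θ(x_i) is an unbiased estimate of the gradient of squared MMD plus a weighted KL term. It assumes a Gaussian kernel and access to per-sample log-likelihoods under both the current and pretrained models; it is one standard construction, not necessarily the estimator used in the paper, and all names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_kernel(a: Vector, b: Vector, bandwidth: float) -> float:
    """Gaussian (RBF) kernel exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def score_function_coefficients(
    gen_feats: Sequence[Vector],
    target_feats: Sequence[Vector],
    logp_model: Sequence[float],
    logp_ref: Sequence[float],
    bandwidth: float,
    kl_weight: float,
) -> List[float]:
    """Coefficients c_i such that sum_i c_i * grad log p_theta(x_i) estimates
    grad [MMD^2(generated features, target features) + kl_weight * KL(p_theta || p_0)]."""
    n, m = len(gen_feats), len(target_feats)
    if n < 2 or m < 1:
        raise ValueError("need at least two generated samples and one target sample")
    coeffs = []
    for i in range(n):
        # Kernel mass against the other generated samples (j != i keeps pairs independent).
        within = sum(gaussian_kernel(gen_feats[i], gen_feats[j], bandwidth)
                     for j in range(n) if j != i)
        # Kernel mass against the target set.
        cross = sum(gaussian_kernel(gen_feats[i], y, bandwidth) for y in target_feats)
        # Single-sample log-ratio term from the KL gradient.
        kl_term = logp_model[i] - logp_ref[i]
        coeffs.append(2.0 * within / (n * (n - 1))
                      - 2.0 * cross / (n * m)
                      + kl_weight * kl_term / n)
    return coeffs
```

In an autodiff framework, one would treat the coefficients as constants and minimize the surrogate sum of c_i * log p_θ(x_i); its gradient equals the estimate above. Samples whose features sit close to the target receive more negative coefficients, so a gradient step raises their likelihood.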
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces kernel Calibrating Generative Models (kCGM), which finetunes generative models by minimizing an unbiased score-function estimator of the maximum mean discrepancy (MMD) between generated and target feature distributions, subject to KL regularization toward the pretrained model. On a target set of 174 antibiotics, it claims that kCGM improves target feature matching while increasing chemical validity, in contrast to direct finetuning, which sacrifices validity. The method is further shown to adapt autoregressive, continuous-space diffusion, and discrete diffusion models on protein and DNA generation tasks using only feature-level supervision.
Significance. If the empirical claims hold under rigorous verification, the work supplies a practical mechanism for feature-distribution calibration of pretrained generative models without requiring paired sample supervision or direct overfitting to small target sets. The explicit use of an unbiased MMD estimator, the cross-model-type demonstrations, and the public code release are concrete strengths that would support adoption in molecular and biological sequence design.
major comments (2)
- [Method (unbiased MMD estimator and § on antibiotics experiments)] The headline empirical claim (improved validity and feature matching on n=174 antibiotics) rests on stable minimization of the unbiased score-function MMD estimator under KL regularization. For a target set of this size the estimator variance is O(1/n); the manuscript does not state kernel bandwidth selection, batch-size scaling, or variance-reduction steps, leaving open the possibility that observed validity gains are optimization artifacts rather than distributional calibration.
- [Abstract and Experiments section] The abstract asserts quantitative improvements in validity and feature matching versus direct finetuning, yet supplies no numerical metrics, error bars, ablation tables, or derivation of the score-function estimator. Without these, the central claim that kCGM simultaneously improves both objectives cannot be evaluated.
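The variance concern in the first major comment can be illustrated with a toy simulation: the spread of the unbiased squared-MMD estimate shrinks as the sample size grows. The sketch below draws 1-D Gaussian samples with a mean shift (an arbitrary stand-in for real molecular features) and measures the variance of the estimate across repeated draws. It says nothing about the paper's actual features, kernel, or estimator; the function names and parameters are hypothetical.

```python
import math
import random
import statistics
from typing import Sequence


def mmd2_unbiased_1d(xs: Sequence[float], ys: Sequence[float], bandwidth: float) -> float:
    """Unbiased squared-MMD estimate for scalar features with a Gaussian kernel."""
    def k(a: float, b: float) -> float:
        return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

    n, m = len(xs), len(ys)
    xx = sum(k(xs[i], xs[j]) for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    yy = sum(k(ys[i], ys[j]) for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    xy = sum(k(x, y) for x in xs for y in ys) / (n * m)
    return xx + yy - 2.0 * xy


def estimator_spread(n: int, reps: int = 100, shift: float = 0.5, seed: int = 0) -> float:
    """Sample variance of the estimate over `reps` independent draws of
    n points from N(0, 1) versus n points from N(shift, 1)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    estimates = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(shift, 1.0) for _ in range(n)]
        estimates.append(mmd2_unbiased_1d(xs, ys, bandwidth=1.0))
    return statistics.variance(estimates)
```

Comparing `estimator_spread(10)` with `estimator_spread(60)` shows the decrease; whether the gains reported for 174 targets exceed this noise is exactly what the requested variance analysis has to establish.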
minor comments (2)
- [Method] Notation for the feature kernel and the precise form of the score-function estimator should be introduced with an equation number in the methods section for reproducibility.
- [Abstract] The extension to protein and DNA tasks is mentioned without any quantitative results or model-specific details in the abstract; a brief table summarizing performance across the three model classes would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. We address each major comment below. Where the manuscript was incomplete, we have revised it to add the requested details, derivations, and quantitative results.
Point-by-point responses
Referee: [Method (unbiased MMD estimator and § on antibiotics experiments)] The headline empirical claim (improved validity and feature matching on n=174 antibiotics) rests on stable minimization of the unbiased score-function MMD estimator under KL regularization. For a target set of this size the estimator variance is O(1/n); the manuscript does not state kernel bandwidth selection, batch-size scaling, or variance-reduction steps, leaving open the possibility that observed validity gains are optimization artifacts rather than distributional calibration.
Authors: We agree that implementation details were insufficient. The unbiased score-function estimator is derived in §3.2; we have now added an explicit statement of the derivation, the kernel (Gaussian with median-heuristic bandwidth computed on the target features), the batch size (32), and the number of independent runs (5) used to obtain means and standard deviations. A short variance analysis confirming that the observed gains exceed estimator variance has been inserted into the antibiotics experimental subsection. These changes remove the ambiguity about optimization artifacts. revision: yes
Referee: [Abstract and Experiments section] The abstract asserts quantitative improvements in validity and feature matching versus direct finetuning, yet supplies no numerical metrics, error bars, ablation tables, or derivation of the score-function estimator. Without these, the central claim that kCGM simultaneously improves both objectives cannot be evaluated.
Authors: We accept the criticism. The revised abstract now reports the key numerical results (validity and MMD values with error bars) for the antibiotics task. We have also added an ablation table comparing kCGM, direct finetuning, and the pretrained baseline, and moved the full derivation of the score-function estimator to a new appendix subsection. These additions make the central claim directly verifiable from the text. revision: yes
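The first response names a Gaussian kernel with a median-heuristic bandwidth computed on the target features, and reports means and standard deviations over 5 runs. Median-heuristic conventions vary (some use the median of squared distances, or rescale by a constant); this sketch assumes the bandwidth is the median pairwise Euclidean distance, used as the σ in exp(-d^2 / (2σ^2)). The helper names are hypothetical.

```python
import itertools
import math
import statistics
from typing import Sequence, Tuple


def median_heuristic_bandwidth(features: Sequence[Sequence[float]]) -> float:
    """Median of the pairwise Euclidean distances among the given feature vectors."""
    dists = [math.dist(a, b) for a, b in itertools.combinations(features, 2)]
    if not dists:
        raise ValueError("need at least two feature vectors")
    return statistics.median(dists)


def summarize_runs(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of a metric across independent runs."""
    return statistics.mean(values), statistics.stdev(values)
```

Computing the bandwidth once on the fixed target set, rather than on each generated batch, keeps the kernel constant during finetuning, so the objective does not shift as the model changes.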
Circularity Check
No circularity; kCGM defined via standard MMD + KL without reduction to inputs by construction
full rationale
The paper introduces kCGM as an objective that minimizes an unbiased score-function MMD estimator between generated and target feature distributions, regularized by KL to the pretrained model. The claimed improvements (better feature matching and validity on 174 antibiotics, applicability to multiple model types) are presented as empirical outcomes of optimizing this objective, not as quantities derived by construction from the inputs or from self-citations. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the abstract or description. The derivation chain remains self-contained against external benchmarks.