pith. machine review for the scientific record.

arxiv: 2605.09439 · v1 · submitted 2026-05-10 · 💻 cs.LG · stat.ML

Recognition: 2 theorem links

· Lean Theorem

Inverse Design for Conditional Distribution Matching

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:43 UTC · model grok-4.3

classification 💻 cs.LG · stat.ML
keywords conditional distribution matching · inverse design · diffusion models · generative modeling · inference-time optimization · score-based generative models · distributional targets

The pith

A method finds inputs whose outputs match any user-specified distribution using only pretrained models at inference time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines Conditional Distribution Matching as the task of recovering an input x* such that the conditional output distribution induced by a joint model exactly matches a chosen target distribution G. This generalizes standard inverse design, which only seeks pointwise outputs, to cases where the full shape and uncertainty of Y matter. MLGD-F solves the problem by guiding a pretrained score-based diffusion model with gradients from a fast conditional sampler, all without retraining or fine-tuning. The approach is shown to recover matching inputs for synthetic cases, image transformations, and editing tasks that include discrete mixtures and low-rank continuous supports.

Core claim

Conditional Distribution Matching is defined as the inverse-design task of finding x* given joint P(X, Y) and target G(Y) so that P(Y | X = x*) equals G. MLGD-F achieves this by combining a pretrained score-based diffusion model with a single-step fast conditional sampler to compute matching-loss gradients tractably at inference time, enabling both sampling and optimization variants of the problem.
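
Spelled out, the sampling variant (CDMS) and the optimization variant (CDMO) can be written against a generic matching loss. In the sketch below, D is a placeholder divergence (MMD or a sliced Wasserstein distance are plausible choices given the reference graph and the SWD loss mentioned in Figure 5), and the tempered form of Qβ is only a reading of the β-sweep shown in Figure 1, not a definition taken from the paper.

```latex
% CDMO: optimization over the input; D is a placeholder divergence,
% not necessarily the paper's exact matching loss.
\[
  x^{*} \;\in\; \arg\min_{x}\; D\!\big(\mathcal{P}(Y \mid X = x),\, \mathcal{G}(Y)\big)
\]
% CDMS: sampling inputs from a tempered distribution. The form below is a guess
% consistent with the Q_beta sweep in Figure 1, not a definition from the paper.
\[
  x \;\sim\; Q_{\beta}(x) \;\propto\; \mathcal{P}(x)\,
    \exp\!\Big\{-\beta\, D\!\big(\mathcal{P}(Y \mid X = x),\, \mathcal{G}(Y)\big)\Big\},
  \qquad \beta \ge 0 .
\]
```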

What carries the argument

MLGD-F (Matching-Loss Guided Diffusion with a Fast inner sampler), which uses single-step conditional sampling inside a diffusion process to estimate conditional distributions and their gradients memory-efficiently.
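
A minimal sketch of how a matching-loss gradient step through a differentiable one-step sampler could look, assuming a PyTorch-style `fast_sampler(x, eta)` that draws y ~ P(Y | X = x) in a single step and a sliced-Wasserstein matching loss (Figure 5 reports an SWD loss). The function names, loss choice, and optimizer settings are illustrative, and this shows only the CDMO-style gradient step, not the diffusion guidance over x that MLGD-F layers on top.

```python
import torch

def swd_loss(y, g, n_proj=64):
    """Sliced-Wasserstein distance between two equal-sized sample sets (illustrative loss)."""
    theta = torch.randn(n_proj, y.shape[1], device=y.device)
    theta = theta / theta.norm(dim=1, keepdim=True)
    proj_y = torch.sort(y @ theta.T, dim=0).values   # (n, n_proj), sorted per projection
    proj_g = torch.sort(g @ theta.T, dim=0).values
    return ((proj_y - proj_g) ** 2).mean()

def cdmo_sketch(x0, fast_sampler, g_samples, steps=200, lr=1e-2):
    """Gradient-based search for an input x whose one-step conditional samples match G.
    fast_sampler(x, eta) is assumed to be a pretrained, differentiable one-step sampler
    that broadcasts a single input x over a batch of noise draws eta (hypothetical interface)."""
    x = x0.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        eta = torch.randn_like(g_samples)      # assumes noise lives in the same space as Y
        y = fast_sampler(x, eta)               # single-step samples from P(Y | X = x)
        loss = swd_loss(y, g_samples)          # matching loss against the target G
        opt.zero_grad()
        loss.backward()                        # gradient flows through the fast sampler into x
        opt.step()
    return x.detach()
```

The single-step sampler is what keeps the backward pass cheap: activations are stored for one network call per conditional sample rather than for a full reverse-diffusion chain.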

If this is right

  • Inputs can be recovered whose conditionals match discrete mixture targets using only pretrained components.
  • Continuous low-rank support distributions become reachable in generative editing without retraining.
  • Structured image transformations can be performed by optimizing for full distributional targets rather than single points.
  • The method remains lightweight because single-step sampling replaces repeated full diffusion runs during gradient steps (see the cost sketch below).
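
A cost sketch restating the scaling fragments visible in reference entries [51] and [52]: reverse-mode differentiation through a K⋆-step unrolled conditional sampler stores activations for every step, while a Kₛ-step (here single-step) fast sampler cuts that by the step ratio.

```latex
% K* = steps of the full sampler, K_s = steps of the fast sampler,
% n_cond = conditional samples per gradient step (from reference entries [51]-[52]).
\[
  \underbrace{\Theta\!\big(K^{\star} \cdot n_{\mathrm{cond}}\big)}_{\text{unrolled multi-step sampler}}
  \quad\text{vs.}\quad
  \underbrace{\Theta\!\big(K_{s} \cdot n_{\mathrm{cond}}\big)}_{\text{few-/single-step fast sampler}},
  \qquad
  \text{memory ratio } \Theta\!\big(K^{\star}/K_{s}\big).
\]
```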

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same inference-time guidance pattern could be tested on other generative backbones that admit fast conditional sampling.
  • Distributional targets could be used to encode higher-level requirements such as diversity or safety constraints in downstream design pipelines.
  • The framework naturally extends to sequential or multi-step design tasks where each step must induce a controlled output distribution.

Load-bearing premise

That a pretrained score-based diffusion model and a pretrained fast conditional sampler can be combined at inference time to compute accurate gradients for matching arbitrary target distributions without any model updates.

What would settle it

An experiment in which the empirical distribution of outputs generated from the recovered x* fails to match the modes, variances, or support of a specified discrete mixture or low-rank continuous target G.
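
A minimal sketch of such a check, assuming multi-step conditional samples from the recovered x* are available and the target is a simple 1-D Gaussian mixture (as in the MNIST rotation targets). The helper name `multi_step_sampler` and the tolerances are hypothetical, not the paper's evaluation protocol.

```python
import numpy as np

def mixture_match_check(y, means, stds, weights, tol_w=0.1, tol_m=0.25, tol_v=0.5):
    """Crude check that 1-D samples y match a Gaussian-mixture target G.
    Tolerances are illustrative; a real test would use e.g. Wasserstein or MMD."""
    y = np.asarray(y, dtype=float)
    assign = np.argmin(np.abs(y[:, None] - np.asarray(means)[None, :]), axis=1)
    report = []
    for k, (m, s, w) in enumerate(zip(means, stds, weights)):
        yk = y[assign == k]
        w_hat = len(yk) / len(y)
        ok = (abs(w_hat - w) < tol_w
              and (len(yk) > 0 and abs(yk.mean() - m) < tol_m * s)
              and (len(yk) > 1 and abs(yk.std() - s) < tol_v * s))
        report.append((k, w_hat, ok))
    return report

# Example: bimodal target at 0 and 180 degrees (cf. the MNIST rotation experiment).
# y_star = multi_step_sampler(x_star, n=2000)   # hypothetical multi-step conditional sampler
# print(mixture_match_check(y_star, means=[0.0, 180.0], stds=[10.0, 10.0], weights=[0.5, 0.5]))
```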

Figures

Figures reproduced from arXiv: 2605.09439 by Ori Meidler, Or Zuk, Shaul Tolkovsky.

Figure 1
Figure 1. Synthetic 2D simulation. Top: joint distribution and conditional density comparison. MLGD-F (green) recovers an input whose induced conditional matches the target G(Y) with high fidelity (L2 = 0.0076). Bottom: approximate CDMS β-sweep: orange is analytical Qβ, blue is empirical samples. As β increases, samples transition from the prior P(X) to a distribution closely aligned with Qβ, eventually concent… view at source ↗
Figure 2
Figure 2. MNIST rotation optimization. Left: optimized digits x* for uniform (row 1), bimodal (row 2), and unimodal (row 3) targets. Without digit-identity supervision, MLGD-F recovers semantically meaningful classes: circular “0” (uniform), narrow “0”, “1”, “8”, “6/9” (bimodal, 0°/180°), and oriented “2”, “3”, “7” (unimodal, 0°). Right: polar histograms of P(Y | X = x*) (blue) vs. target G (orange), where each pa… view at source ↗
Figure 3
Figure 3. Top-left: source male portrait scribble. Top-middle: MLGD-F optimized scribble after distribution-guided editing. Top-right: CLIP PCA projection fitted on binary gender poles (male, woman). Background clusters represent reference distributions (Woman, Man, and mixed-feature sets) colored by their position along the primary semantic axis (PC1). The MLGD-F evaluation outputs (gold markers) cluster in the int… view at source ↗
Figure 4
Figure 4. ECDF of the L2 GMM distance across 25 independent optimization runs for LGD and MLGD-F in the 2D, 5D, and 10D settings. A curve shifted to the left indicates better (lower) L2 GMM. In 2D, MLGD-F’s distribution is shifted clearly to the left of LGD’s; in 5D the two largely overlap with LGD holding a slight edge; in 10D MLGD-F’s distribution is again shifted to the left, consistent with better average-case… view at source ↗
Figure 5
Figure 5. Bimodal target Gbimodal — all 15 seeds. Each panel shows the optimized digit x* with its seed index, final SWD loss, and classifier label (“None” indicates no digit exceeded the confidence threshold). Seeds 0–4 (top row), seeds 5–9 (middle row), and seeds 10–14 (bottom row). The optimizer consistently recovers rotationally symmetric digits such as “1”, “8”, “0”, “6”, and “9”, which remain visually valid un… view at source ↗
Figure 6
Figure 6. Uniform target Guniform — all 15 seeds. Seeds 0–4 (top row), seeds 5–9 (middle row), and seeds 10–14 (bottom row). The optimizer overwhelmingly recovers circular “0” digits across seeds, the only digit class that is plausibly valid at every rotation angle θ ∈ [0, 360). view at source ↗
Figure 7
Figure 7. Unimodal target Gunimodal — all 15 seeds. Seeds 0–4 (top row), seeds 5–9 (middle row), and seeds 10–14 (bottom row). The optimizer recovers canonical upright digits such as “2”, “3”, and “7”, consistent with the unimodal target placing mass near 0° (upright orientation). view at source ↗
Figure 8
Figure 8. Where MLGD-F edits the scribble (balanced target). Left: source scribble used to initialise the SDEdit trajectory. Middle: MLGD-F-optimized scribble targeting Gbal. Right: signed difference (green = added, red = removed). The edits are not diffuse noise but concentrate on the hairline, eye and eyebrow region, and hair outline. Quantitatively, 91.3% of pixels are unchanged, while 2.8% are added and 5.9% are … view at source ↗
Figure 9
Figure 9. Why those regions: gradient saliency and its alignment with MLGD-F edits. Left: binned mean MLGD-F scribble change versus binned mean saliency, on line pixels only (n = 33,828). The monotone relationship (Spearman ρ = 0.56) shows that MLGD-F preferentially edits the source-scribble strokes that are most gradient-sensitive for the downstream gender score. Right: gender saliency |∂s/∂x| on the source scribbl… view at source ↗
Figure 10
Figure 10. Balanced target Gbal = 0.5 δmale + 0.5 δfemale. Top row, left to right: source scribble, average scribble (Avg), best SDEdit scribble (SDEdit Best), and our method’s scribble (MLGD-F). Rows 2–3, top to bottom: five sample target portraits used as guidance — male targets and female targets, reflecting the balanced 50/50 distribution. view at source ↗
Figure 11
Figure 11. Balanced target — conditional samples. Five representative portrait images generated from each method’s output scribble via SDXL-Turbo + ControlNet-Scribble with the neutral prompt, for the balanced target Gbal = 0.5 δmale + 0.5 δfemale. Each column is generated with an identical seed; differences across rows reflect the scribble x* alone. MLGD-F produces a more balanced mix of male and female portraits … view at source ↗
Figure 12
Figure 12. Skewed target Gskew = 0.25 δmale + 0.75 δfemale. Left to right: source scribble, average scribble (Avg), best SDEdit scribble (SDEdit Best), and our method’s scribble (MLGD-F). The target portraits are the same as in … view at source ↗
Figure 13
Figure 13. Skewed target — conditional samples. Five representative portrait images generated from each method’s output scribble, for the skewed target Gskew = 0.25 δmale + 0.75 δfemale. Each column is generated with an identical seed; differences across rows reflect the scribble x* alone. MLGD-F yields a predominantly female output distribution consistent with the 75% female target proportion. view at source ↗
Figure 14
Figure 14. Gender interpolation target GinterpGender = (1/4) ∑_{k=1}^{4} δ_{c_k}, a four-anchor discretisation of the 1-D feminine-to-masculine continuum in CLIP space at equal weight (25% each). Top row, left to right: source scribble, average scribble (Avg), best SDEdit scribble (SDEdit Best), and our method’s scribble (MLGD-F). Rows 2–5, top to bottom: five sample target portraits for each anchor spanning the gender axis —… view at source ↗
Figure 15
Figure 15. Gender interpolation target — conditional samples. Five representative portrait images generated from each method’s output scribble for the gender interpolation target GinterpGender = (1/4) ∑_{k=1}^{4} δ_{c_k}, which spans the feminine-to-masculine continuum in CLIP space. Each column shares the same diffusion seed across all rows; thus, any visual difference between rows is attributable solely to the candidate scr… view at source ↗
Figure 16
Figure 16. Age interpolation target GinterpAge, supported on a one-dimensional age continuum over male portraits (ages 40–79, uniform distribution). Top row, left to right: source scribble, average scribble (Avg), best SDEdit scribble (SDEdit Best), and our method’s scribble (MLGD-F). Rows 2–3, top to bottom: five sample target portraits at the two extremes of the age continuum — 40 years old (youngest) and 79 years… view at source ↗
Figure 17
Figure 17. Age interpolation target — conditional samples. Five representative portrait images generated from each method’s output scribble, for the age interpolation target GinterpAge (uniform over male ages 40–79). Each column is generated with an identical seed; differences across rows reflect the scribble x* alone. Since ControlNet conditioning is applied at scale 0.5 and the neutral prompt contains no explicit… view at source ↗
Figure 18
Figure 18. Age interpolation. Sixteen representative portrait images generated by MLGD-F for a uniform age target over [40, 80], selected via projection onto the age axis (every ∼6th image out of 100 total). All images are generated with the same optimized scribble and a neutral prompt. view at source ↗
Figure 19
Figure 19. Empirical Jacobian fidelity of SDXL-Lightning relative to SDXL-Base. Top row: a well-aligned photo prompt (left, rˆg/rˆs = 1.55; “a young boy in a red jacket on an empty beach…”) and an outlier cartoon prompt (right, rˆg/rˆs = 198.4; “a cartoon child flying a kite on a hill, bold outlines...”), each showing SDXL-Base (teacher, left) and SDXL-Lightning (student, right) outputs from the same noise η. The… view at source ↗
read the original abstract

Generative models are powerful tools for sampling from a learned distribution $\mathcal{P}(Y \mid X)$, and inverse-design methods invert this map to find an input $x$ that produces a desired point output $y^*$. However, many design goals are naturally distributional rather than pointwise, incorporating the inherent uncertainty of $Y$ and targeting a specific form for it, a task not addressed by standard inverse design. To address this issue we introduce Conditional Distribution Matching (CDM), a new inverse-design problem class in generative modeling: given a joint distribution $\mathcal{P}(X, Y)$ and a target distribution $\mathcal{G}(Y)$, find an input $x^*$ whose induced conditional distribution $\mathcal{P}(Y \mid X = x^*)$ matches $\mathcal{G}$. We formally define two variants: Conditional Distribution Matching Sampling (CDMS) and Conditional Distribution Matching Optimization (CDMO). To solve these problems, we propose MLGD-F (Matching-Loss Guided Diffusion with a Fast inner sampler), a plug-and-play inference-time algorithm that combines a pretrained score-based diffusion model with a pretrained fast conditional sampler, requiring no additional training or fine-tuning. By leveraging single-step conditional sampling, MLGD-F enables tractable gradient computation, making the estimation of $\mathcal{P}(Y \mid X)$ both memory-efficient and computationally lightweight. We validate MLGD-F on synthetic benchmarks, structured image transformations, and generative editing optimization, demonstrating reliable recovery of inputs whose conditional distributions match diverse user-specified targets, including discrete mixtures and continuous low-rank supports.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Conditional Distribution Matching (CDM) as a new inverse-design problem class: given joint P(X,Y) and target G(Y), recover x* such that the induced conditional P(Y|X=x*) matches G. It defines CDMS and CDMO variants and proposes MLGD-F, a plug-and-play inference-time algorithm that pairs a pretrained score-based diffusion model with a pretrained fast conditional sampler. MLGD-F uses single-step sampling to enable tractable gradient estimation of a matching loss without any retraining or fine-tuning. Experiments on synthetic benchmarks, structured image transformations, and generative editing tasks are reported to demonstrate recovery of inputs whose conditionals match diverse targets, including discrete mixtures and continuous low-rank supports.

Significance. If the central claims hold, the work would be significant for extending inverse design beyond pointwise targets to distributional ones in a training-free manner. The plug-and-play use of independently pretrained models is a clear strength, as is the focus on memory-efficient gradient computation via single-step sampling. This could enable new applications in generative editing and optimization where the shape of the output distribution (rather than a single point) is the design goal.

major comments (2)
  1. [§3] §3 (MLGD-F algorithm description): the gradient of the matching loss is computed by replacing the inner conditional expectation E_{Y~P(Y|X)} with a single draw from the pretrained fast conditional sampler. No error bound, bias analysis, or convergence guarantee is supplied for this one-step surrogate. This is load-bearing for the central claim, because the fixed point of the outer optimization is asserted to recover an x* whose true multi-step conditional equals arbitrary G(Y), including discrete mixtures and low-rank supports where one-step outputs are known to deviate from the true conditional.
  2. [Experiments] Experiments section (synthetic and image-editing results): the reported successes on targets with discrete or low-rank structure are not accompanied by controls that compare single-step versus multi-step inner sampling, nor by quantitative metrics (e.g., Wasserstein distance or support overlap) between the induced P(Y|X=x*) and the target G after optimization. Without these, it is impossible to determine whether the method actually achieves distributional matching or merely pointwise proximity.
minor comments (2)
  1. [Abstract and §2] The abstract and §2 claim the method is 'parameter-free' at inference time, yet the choice of step size, number of outer optimization steps, and the specific fast sampler architecture are hyperparameters that affect results; a short discussion of their sensitivity would improve clarity.
  2. [§2 and Experiments] Notation for the two problem variants (CDMS vs. CDMO) is introduced but not used consistently in the experimental tables; labeling which tasks correspond to which variant would aid readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. The comments highlight important aspects of the theoretical justification and empirical validation of MLGD-F. Below we respond point-by-point to the major comments and describe the revisions we will make.

read point-by-point responses
  1. Referee: [§3] §3 (MLGD-F algorithm description): the gradient of the matching loss is computed by replacing the inner conditional expectation E_{Y~P(Y|X)} with a single draw from the pretrained fast conditional sampler. No error bound, bias analysis, or convergence guarantee is supplied for this one-step surrogate. This is load-bearing for the central claim, because the fixed point of the outer optimization is asserted to recover an x* whose true multi-step conditional equals arbitrary G(Y), including discrete mixtures and low-rank supports where one-step outputs are known to deviate from the true conditional.

    Authors: We agree that the single-step approximation for the inner expectation introduces bias whose magnitude is not bounded in the current manuscript, and that this approximation is central to the tractability claim. The manuscript relies on the empirical observation that the outer optimization still recovers x* values whose full multi-step conditionals align with G, even on targets with discrete or low-rank structure. In the revision we will add a dedicated paragraph in §3 discussing the nature of the bias (including why single-step sampling from a fast conditional model can still yield useful gradients for the outer objective) and will include a short empirical study of the approximation error on the synthetic benchmarks. We will not claim formal convergence guarantees, as deriving them would require additional assumptions on the fast sampler that go beyond the plug-and-play setting. revision: partial

  2. Referee: [Experiments] Experiments section (synthetic and image-editing results): the reported successes on targets with discrete or low-rank structure are not accompanied by controls that compare single-step versus multi-step inner sampling, nor by quantitative metrics (e.g., Wasserstein distance or support overlap) between the induced P(Y|X=x*) and the target G after optimization. Without these, it is impossible to determine whether the method actually achieves distributional matching or merely pointwise proximity.

    Authors: We accept that the current experimental section would be strengthened by quantitative metrics and explicit single-step versus multi-step controls. In the revised manuscript we will augment the synthetic benchmark results with (i) Wasserstein-2 distances and support-overlap statistics between the induced conditional (evaluated with multi-step sampling) and the target G, and (ii) side-by-side tables comparing optimization outcomes when the inner sampler is restricted to one step versus the full multi-step procedure. These additions will be placed in the main experiments section and will directly address whether distributional matching (rather than pointwise proximity) is achieved. revision: yes
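
For concreteness: the referee asks for quantitative distribution metrics, and the fragment in reference entry [52] indicates the matching losses are built from unbiased MMD U-statistics (the kernel two-sample test of [14]). A minimal numpy sketch of that estimator is below; it uses a fixed RBF bandwidth and is illustrative rather than the paper's or the promised revision's evaluation code.

```python
import numpy as np

def mmd2_unbiased(y, g, bandwidth=1.0):
    """Unbiased MMD^2 U-statistic with an RBF kernel (kernel two-sample test, ref. [14]).
    y: samples drawn (multi-step) from P(Y | X = x*); g: samples from the target G."""
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    g = np.asarray(g, dtype=float).reshape(len(g), -1)

    def rbf(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * bandwidth ** 2))

    k_yy, k_gg, k_yg = rbf(y, y), rbf(g, g), rbf(y, g)
    n, m = len(y), len(g)
    return ((k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
            + (k_gg.sum() - np.trace(k_gg)) / (m * (m - 1))
            - 2.0 * k_yg.mean())
```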

Circularity Check

0 steps flagged

No circularity in CDM definition or MLGD-F algorithm

full rationale

The paper defines a new inverse-design problem class (CDM) and presents MLGD-F as an inference-time procedure that combines two independently pretrained models with no additional training or fine-tuning. No equations reduce a claimed prediction or result to a fitted parameter defined by the same data, no self-citation chain is load-bearing for the central claim, and the derivation consists of problem formalization plus algorithmic description rather than tautological reduction. The method is explicitly positioned as operating on external pretrained components, making the overall chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The method rests on the assumption that two independently trained models (a score-based diffusion model and a fast conditional sampler) are already available and sufficiently accurate. No new entities are postulated and no parameters are fitted inside the paper itself.

axioms (2)
  • domain assumption A pretrained score-based diffusion model accurately captures the joint distribution P(X, Y) and its score function.
    Invoked to enable gradient-based search over inputs using the diffusion model.
  • domain assumption A pretrained fast conditional sampler can produce single-step samples from P(Y | X) that are accurate enough for gradient estimation.
    Required for the memory-efficient and tractable gradient computation claimed for MLGD-F.
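
Read as interfaces, the two domain assumptions above amount to two callable components the inference-time procedure must be able to query and, for the fast sampler, differentiate through. A hypothetical sketch; the names and signatures are illustrative, not the paper's code.

```python
from typing import Protocol
import torch

class ScoreModel(Protocol):
    """Assumed pretrained score-based diffusion model (axiom 1); hypothetical interface."""
    def score(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor: ...

class FastConditionalSampler(Protocol):
    """Assumed pretrained fast sampler (axiom 2): one differentiable step mapping an
    input x and noise eta to a sample y ~ P(Y | X = x); hypothetical interface."""
    def __call__(self, x: torch.Tensor, eta: torch.Tensor) -> torch.Tensor: ...
```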

pith-pipeline@v0.9.0 · 5575 in / 1614 out tokens · 55620 ms · 2026-05-12T04:43:09.127348+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · 1 internal anchor

  1. [1]

    Nearly $d$-linear convergence bounds for diffusion models via stochastic localization

    Joe Benton, Valentin De Bortoli, Arnaud Doucet, and George Deligiannidis. Nearly $d$-linear convergence bounds for diffusion models via stochastic localization. In International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=r5njV3BsuD

  2. [2]

    A practical ...

    Sebastian Bischoff, Alana Darcher, Michael Deistler, Richard Gao, Franziska Gerken, Manuel Gloeckler, Lisa Haxel, Jaivardhan Kapoor, Janne K. Lappalainen, Jakob H. Macke, Guy Moss, Matthijs Pals, Felix C. Pei, Rachel Rapp, A. Erdem Sağtekin, Cornelius Schröder, Auguste Schulz, Zinovia Stefanidi, Shoji Toyota, Linda Ulmer, and Julius Vetter. A practical ...

  3. [3]

    Convergence of denoising diffusion models under the manifold hypothesis. Transactions on Machine Learning Research, 2022

    Valentin De Bortoli. Convergence of denoising diffusion models under the manifold hypothesis. Transactions on Machine Learning Research, 2022. ISSN 2835-8856. URL https://openreview.net/forum?id=MhK5aXo3gB. Expert Certification

  4. [4]

    Monte Carlo guided denoising diffusion models for Bayesian linear inverse problems

    Gabriel Cardoso, Yazid Janati El Idrissi, Sylvain Le Corff, and Eric Moulines. Monte Carlo guided denoising diffusion models for Bayesian linear inverse problems. In International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=nHESwXvxWK

  5. [5]

    Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions

    Sitan Chen, Sinho Chewi, Jerry Li, Yuanzhi Li, Adil Salim, and Anru Zhang. Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions. In International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=zyLVMgsZ0U_

  6. [6]

    Diffusion posterior sampling for general noisy inverse problems

    Hyungjin Chung, Jeongsol Kim, Michael Thompson McCann, Marc Louis Klasky, and Jong Chul Ye. Diffusion posterior sampling for general noisy inverse problems. In International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=OnD9zGAGT0k

  7. [7]

    Sobolev training for neural networks

    Wojciech M Czarnecki, Simon Osindero, Max Jaderberg, Grzegorz Swirszcz, and Razvan Pascanu. Sobolev training for neural networks. In Advances in Neural Information Processing Systems, volume 30, 2017

  8. [8]

    Robust deep learning–based protein sequence design using ProteinMPNN. Science, 378(6615):49–56, 2022

    Justas Dauparas, Ivan Anishchenko, Nathaniel Bennett, Hua Bai, Robert J Ragotte, Lukas F Milles, Basile IM Wicky, Alexis Courbet, Rob J de Haas, Neville Bethel, et al. Robust deep learning–based protein sequence design using ProteinMPNN. Science, 378(6615):49–56, 2022

  9. [9]

    Diffusion models beat gans on image synthesis

    Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. In Advances in Neural Information Processing Systems, volume 34, pages 8780–8794, 2021

  10. [10]

    Diffusion posterior sampling is computationally unstable

    Zehao Dou and Yang Song. Diffusion posterior sampling is computationally unstable. In International Conference on Machine Learning, 2024

  11. [11]

    Distillation of discrete diffusion by exact conditional distribution matching. arXiv preprint arXiv:2512.12889, 2025

    Yansong Gao and Yu Sun. Distillation of discrete diffusion by exact conditional distribution matching. arXiv preprint arXiv:2512.12889, 2025

  12. [12]

    Learning generative models with sinkhorn divergences

    Aude Genevay, Gabriel Peyré, and Marco Cuturi. Learning generative models with sinkhorn divergences. In International Conference on Artificial Intelligence and Statistics, volume 84, pages 1608–1617. PMLR, 2018

  13. [13]

    Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 4(2):268–276, 2018

    Rafael Gómez-Bombarelli, Jennifer N Wei, David Duvenaud, José Miguel Hernández-Lobato, Benjamín Sánchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D Hirzel, Ryan P Adams, and Alán Aspuru-Guzik. Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 4(2):268–276, 2018

  14. [14]

    A kernel method for the two-sample-problem

    Arthur Gretton, Karsten Borgwardt, Malte Rasch, Bernhard Schölkopf, and Alex Smola. A kernel method for the two-sample-problem. In Advances in Neural Information Processing Systems, volume 19, 2006

  15. [15]

    Protein conformational switches: from nature to design

    Jeung-Hoi Ha and Stewart N Loh. Protein conformational switches: from nature to design. Chemistry–A European Journal, 18(26):7984–7999, 2012

  16. [16]

    Classifier-Free Diffusion Guidance

    Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022

  17. [17]

    Denoising diffusion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, volume 33, pages 6840–6851, 2020

  18. [18]

    A class of statistics with asymptotically normal distribution

    Wassily Hoeffding. A class of statistics with asymptotically normal distribution. In Breakthroughs in statistics: Foundations and basic theory, pages 308–334. Springer, 1992

  19. [19]

    Diffusion model for image generation-a survey

    Xinrong Hu, Yuxin Jin, Jinxing Liang, Junping Liu, Ruiqi Luo, Min Li, and Tao Peng. Diffusion model for image generation — a survey. In 2023 2nd International Conference on Artificial Intelligence, Human-Computer Interaction and Robotics (AIHCIR), pages 416–424. IEEE, 2023

  20. [20]

    Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998

    Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998

  21. [21]

    MMD GAN: Towards deeper understanding of moment matching network

    Chun-Liang Li, Wei-Cheng Chang, Yu Cheng, Yiming Yang, and Barnabás Póczos. MMD GAN: Towards deeper understanding of moment matching network. In Advances in Neural Information Processing Systems, volume 30, pages 2203–2213, 2017

  22. [22]

    Generative moment matching networks

    Yujia Li, Kevin Swersky, and Richard Zemel. Generative moment matching networks. In International Conference on Machine Learning, volume 37, pages 1718–1727. PMLR, 2015

  23. [23]

    SDXL-Lightning: Progressive adversarial diffusion distillation

    Shanchuan Lin, Anran Wang, and Xiao Yang. SDXL-Lightning: Progressive adversarial diffusion distillation, 2024. URL https://arxiv.org/abs/2402.13929

  24. [24]

    Decoupled weight decay regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=Bkg6RiCqY7

  25. [25]

    On the method of bounded differences

    Colin McDiarmid. On the method of bounded differences. In J. Siemons, editor, Surveys in Combinatorics, 1989, volume 141 of London Mathematical Society Lecture Note Series, pages 148–188. Cambridge University Press, 1989

  26. [26]

    SDEdit: Guided image synthesis and editing with stochastic differential equations

    Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. SDEdit: Guided image synthesis and editing with stochastic differential equations. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=aBsCjcPu_tE

  27. [27]

    K. B. Petersen and M. S. Pedersen. The Matrix Cookbook, Nov 2012. URL http://www2.compute.dtu.dk/pubdb/pubs/3274-full.html. Version 20121115

  28. [28]

    SDXL: Improving latent diffusion models for high-resolution image synthesis

    Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. SDXL: Improving latent diffusion models for high-resolution image synthesis. In International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=di52zR8xgf

  29. [29]

    Learning transferable visual models from natural language supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, volume 139, pages 8748–8763. PMLR, 2021

  30. [30]

    Progressive distillation for fast sampling of diffusion models

    Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=TIdIXIpzhoI

  31. [31]

    Inverse molecular design using machine learning: Generative models for matter engineering. Science, 361(6400):360–365, 2018

    Benjamin Sanchez-Lengeling and Alán Aspuru-Guzik. Inverse molecular design using machine learning: Generative models for matter engineering. Science, 361(6400):360–365, 2018

  32. [32]

    Adversarial diffusion distillation

    Axel Sauer, Dominik Lorenz, Andreas Blattmann, and Robin Rombach. Adversarial diffusion distillation. In European Conference on Computer Vision, volume 86, pages 87–103. Springer, 2024.

  33. [33]

    doi: 10.1007/978-3-031-73016-0_6

  34. [34]

    Deep unsupervised learning using nonequilibrium thermodynamics

    Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, volume 37, pages 2256–2265. PMLR, 2015

  35. [35]

    Denoising diffusion implicit models

    Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=St1giarCHLP

  36. [36]

    Loss-guided diffusion models for plug-and-play controllable generation

    Jiaming Song, Qinsheng Zhang, Hongxu Yin, Morteza Mardani, Ming-Yu Liu, Jan Kautz, Yongxin Chen, and Arash Vahdat. Loss-guided diffusion models for plug-and-play controllable generation. In International Conference on Machine Learning, volume 202, pages 32483–32498. PMLR, 2023

  37. [37]

    Improved techniques for training consistency models

    Yang Song and Prafulla Dhariwal. Improved techniques for training consistency models. In International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=WNzy9bRDvG

  38. [38]

    Score-based generative modeling through stochastic differential equations

    Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=PxTIG12RRHS

  39. [39]

    Consistency models

    Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In International Conference on Machine Learning, volume 202, pages 32211–32252. PMLR, 2023

  40. [40]

    Diffusion models for time series forecasting: A survey. arXiv preprint arXiv:2507.14507, 2025

    Chen Su, Zhengzhou Cai, Yuanhe Tian, Zhuochao Chang, Zihong Zheng, and Yan Song. Diffusion models for time series forecasting: A survey. arXiv preprint arXiv:2507.14507, 2025

  41. [41]

    H. J. Terry Suh, Max Simchowitz, Kaiqing Zhang, and Russ Tedrake. Do differentiable simulators give better policy gradients? In International Conference on Machine Learning, volume 162, pages 20668–20696. PMLR, 2022

  42. [42]

    Domain adaptation with conditional distribution matching and generalized label shift

    Remi Tachet des Combes, Han Zhao, Yu-Xiang Wang, and Geoff Gordon. Domain adaptation with conditional distribution matching and generalized label shift. In Advances in Neural Information Processing Systems, volume 33, 2020. URL https://arxiv.org/abs/2003.04475

  43. [43]

    Practical and asymptotically exact conditional sampling in diffusion models

    Luhuan Wu, Brian L. Trippe, Christian A. Naesseth, David M. Blei, and John P. Cunningham. Practical and asymptotically exact conditional sampling in diffusion models. In Advances in Neural Information Processing Systems, 2023

  44. [44]

    A survey on audio diffusion models: Text to speech synthesis and enhancement in generative AI

    Chenshuang Zhang, Chaoning Zhang, Sheng Zheng, Mengchun Zhang, Maryam Qamar, Sung-Ho Bae, and In So Kweon. A survey on audio diffusion models: Text to speech synthesis and enhancement in generative AI, 2023. URL https://arxiv.org/abs/2303.13336

  45. [45]

    Adding conditional control to text-to-image diffusion models

    Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3836–3847, 2023.

  46. [46]

    0”, “1”, “8

    with a dropout probability of 0.2. Sampling using DDIM [34] with η = 0. Conditional consistency model. An improved Consistency Training (iCT) model [36] is trained to approximate the conditional distribution P(Y | X = x). We utilize the iCT discretization curriculum N(k) = min(s_0 · 2^{⌊k/K′⌋}, s_1) + 1, (3) with s_0 = 10 and s_1 = 1280. The model is optimized using...

  47. [47]

    superrealistic portrait photograph of a woman, extremely feminine features, studio lighting

    Woman: "superrealistic portrait photograph of a woman, extremely feminine features, studio lighting"

  48. [48]

    a superrealistic portrait photograph of a woman with masculine features, heavy brow ridge, studio lighting

    Woman with masculine features: "a superrealistic portrait photograph of a woman with masculine features, heavy brow ridge, studio lighting"

  49. [49]

    a superrealistic portrait photograph of a man with extremely feminine features, soft delicate face, high cheekbones, studio lighting

    Man with feminine features: "a superrealistic portrait photograph of a man with extremely feminine features, soft delicate face, high cheekbones, studio lighting"

  50. [50]

    a superrealistic portrait photograph of a man, extremely masculine features, studio lighting

    Man: "a superrealistic portrait photograph of a man, extremely masculine features, studio lighting" This four-anchor discretisation approximates a continuous 1-D target on the gender-axis submanifold of CLIP space. Age interpolation target. The target is a uniform distribution over male portrait ages {40, 41, ..., 79}. For each integer age, images are gen...

  51. [51]

    Teacher (unrolled) gradient: reverse-mode evaluation of ∇_x L̂⋆(x) has per-sample chain depth Θ(K⋆) and total stored activations Θ(K⋆ · n_cond) (absent checkpointing)

  52. [52]

    subtract out

    Few-step student: reverse-mode evaluation of ∇_x L̂_ϕ(x) has per-sample chain depth Θ(K_s) and total stored activations Θ(K_s · n_cond). The memory ratio is therefore Θ(K⋆/K_s). Gradient discrepancy. Fix x ∈ X and assume n_cond, n_target ≥ 2, so that the unbiased MMD U-statistics defining L̂_ϕ and L̂⋆ are well-defined. In addition to Assumption 1, suppose the following...