Test-Time Compositional Generalization in Diffusion Models via Concept Discovery
Pith reviewed 2026-05-11 01:38 UTC · model grok-4.3
The pith
Pretrained diffusion models can discover reusable density modes from their time-indexed score functions and compose them at test time to generate novel concept combinations from a single out-of-distribution query.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the time-indexed score function s_θ(x_t, t) of a pretrained diffusion model contains local density modes corresponding to meaningful concepts. Gradient ascent on this score at multiple noise levels recovers those modes; they are then mapped to clean-space Gaussians, greedily selected with a submodular likelihood objective, and fused into a product-of-experts teacher whose closed-form score can be sampled directly or used to fine-tune a lightweight adapter, enabling compositional generation on held-out queries without any predefined concept library.
What carries the argument
Gradient ascent on the time-indexed score function s_θ(x_t, t) ≈ ∇_{x_t} log p_t(x_t) to recover local density modes at multiple timesteps, followed by Gaussian mapping, submodular prototype selection, and product-of-experts composition with an analytic score.
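A minimal sketch of the mode-seeking step, assuming a score-network interface `score_fn(x, t)` for $s_\theta(x_t, t)$ and a schedule lookup `alpha_bar(t)` (both hypothetical names, not the paper's API; step size, iteration count, and timestep set are illustrative):

```python
import torch

def ascend_to_modes(score_fn, x_query, timesteps, alpha_bar,
                    n_steps=200, step_size=0.05):
    """Seek local modes of the noisy marginals p_t by ascending the score.

    score_fn(x, t) approximates grad_x log p_t(x); alpha_bar(t) returns the
    cumulative noise-schedule coefficient. One candidate mode per timestep.
    """
    modes = []
    for t in timesteps:
        ab = alpha_bar(t)
        # Noise the query to level t, then hill-climb along the score field.
        x = ab ** 0.5 * x_query + (1.0 - ab) ** 0.5 * torch.randn_like(x_query)
        with torch.no_grad():
            for _ in range(n_steps):
                x = x + step_size * score_fn(x, t)
        modes.append((t, x))
    return modes
```

The multi-timestep loop reflects that the modes of $p_t$ coarsen as noise grows, so different levels can expose concepts at different granularities.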
If this is right
- The analytic product-of-experts score can be sampled directly via classifier-free guidance without further training (a closed-form sketch follows this list).
- The discovered modes can be distilled into a new class embedding plus low-rank adapter that improves performance on the target query.
- Performance exceeds both a query-only baseline and the nearest trained class on held-out ColorMNIST and CelebA composition tasks.
- No external concept library or pre-defined conditioning signals are required for the test-time process.
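To make the first bullet concrete: if each selected prototype is a clean-space Gaussian expert, the log-density of their weighted product is a sum, so the combined score has a closed form. A minimal sketch with isotropic experts follows; note the paper samples the teacher through classifier-free guidance, whereas plain Langevin dynamics is used here only to keep the example self-contained:

```python
import torch

def poe_score(x, mus, sigmas, weights):
    """Closed-form score of p(x) ∝ Π_k N(x; mu_k, sigma_k^2 I)^{w_k}.

    log p is a weighted sum of Gaussian log-densities plus a constant, so the
    score is the weighted sum of per-expert scores (mu_k - x) / sigma_k^2.
    """
    score = torch.zeros_like(x)
    for mu, sigma, w in zip(mus, sigmas, weights):
        score = score + w * (mu - x) / sigma ** 2
    return score

def langevin_sample(mus, sigmas, weights, n_iter=1000, eps=1e-2):
    """Unadjusted Langevin sampling from the PoE teacher (step size illustrative)."""
    x = torch.randn_like(mus[0])
    for _ in range(n_iter):
        x = x + eps * poe_score(x, mus, sigmas, weights) \
              + (2 * eps) ** 0.5 * torch.randn_like(x)
    return x
```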
Where Pith is reading between the lines
- If score geometry reliably encodes concepts, the same ascent-and-compose procedure could be tested on other score-based or flow-based generative models.
- The method might reduce the data needed for fine-tuning when new attribute combinations appear, by reusing modes already latent in the pretrained model.
- Submodular selection could be replaced or augmented with other relevance criteria to handle cases where modes overlap or conflict (a greedy baseline is sketched after this list).
- Success on simple benchmarks raises the question of whether the same density-mode recovery scales to higher-resolution natural images without additional regularization.
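For the submodular-selection bullet above, here is a minimal greedy selector for a facility-location-style objective; `relevance` is a stand-in for the paper's likelihood term, and the exact objective is our assumption rather than the authors':

```python
def greedy_select(candidates, query_parts, k, relevance):
    """Greedy maximization of F(S) = sum_j max_{m in S} relevance(q_j, m).

    With nonnegative relevance, F is monotone submodular, so the greedy
    choice below carries the classic (1 - 1/e) approximation guarantee.
    """
    def objective(selected):
        return sum(
            max((relevance(q, m) for m in selected), default=0.0)
            for q in query_parts
        )

    selected = []
    for _ in range(min(k, len(candidates))):
        remaining = [c for c in candidates if all(c is not s for s in selected)]
        if not remaining:
            break
        base = objective(selected)
        best = max(remaining, key=lambda c: objective(selected + [c]) - base)
        selected.append(best)
    return selected
```

Swapping `relevance` for, say, a redundancy-penalized score is exactly the kind of augmentation the bullet suggests for overlapping or conflicting modes.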
Load-bearing premise
The local density modes found by ascending the score at different noise levels are meaningful, query-relevant concepts that can be mapped to Gaussians and combined without introducing artifacts or omitting key elements.
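One standard route for that mapping, offered here as a plausible reading rather than the paper's exact construction, is Tweedie's formula: under the DDPM forward process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\varepsilon$, a noisy mode $x_t^*$ recovered at timestep $t$ maps to the clean-space mean $\hat\mu = \mathbb{E}[x_0 \mid x_t^*] = \big(x_t^* + (1-\bar\alpha_t)\,s_\theta(x_t^*, t)\big)/\sqrt{\bar\alpha_t}$, with a covariance then estimated, e.g., from the local curvature of $\log p_t$ around the mode. Any such estimation step is precisely where the artifacts this premise worries about could enter.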
What would settle it
A controlled test on a new composition benchmark would settle it: the claim fails if the product-of-experts samples or the adapted model consistently fail to produce the intended novel attribute combinations despite matching the individual query elements, or if the recovered modes do not correspond to human-interpretable factors.
Original abstract
Compositional generalization requires models to produce novel configurations from familiar parts. In diffusion models, prior compositional generation methods typically assume that the relevant concepts or conditioning signals are already available. We instead ask whether a pretrained diffusion model can discover query-specific concepts from the time-indexed scores it learns for the noisy marginals $p_t(x_t)$ and compose them at test time. Given a single out-of-distribution query, our method performs gradient ascent on $s_\theta(x_t,t) \approx \nabla_{x_t}\log p_t(x_t)$ at multiple noising timesteps to recover local density modes, maps these modes into clean-space Gaussians, greedily selects relevant prototypes with a submodular likelihood objective, and combines them into a product-of-experts (PoE) teacher model with an analytic score. This teacher model can be sampled directly through classifier-free guidance or used to generate a sample pool for training a new class embedding and low-rank adapter. On held-out composition benchmarks built from ColorMNIST and CelebA, both the analytic PoE sampler and the low-rank adapted model outperform query-only and nearest trained-class baselines. These results suggest that the time-indexed score geometry of the diffusion model contains reusable density-mode concepts that support test-time compositional generation without a predefined concept library.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a test-time method for compositional generalization in pretrained diffusion models without a predefined concept library. Given a single OOD query, it performs gradient ascent on the score function s_θ(x_t, t) at multiple noising timesteps to recover local density modes, maps these to clean-space Gaussians, greedily selects relevant prototypes via a submodular likelihood objective, and composes them into an analytic product-of-experts (PoE) teacher model. This teacher can be sampled directly via classifier-free guidance or used to generate data for training a new class embedding and LoRA adapter. The approach is evaluated on held-out composition benchmarks derived from ColorMNIST and CelebA, where both the PoE sampler and adapted model outperform query-only and nearest-class baselines.
Significance. If the recovered modes prove to be stable, distinct, and semantically aligned with query elements, the work would establish that the time-indexed score geometry of diffusion models encodes reusable density-mode concepts usable for test-time composition. This could reduce reliance on curated concept libraries and enable more flexible handling of novel combinations in generative models, with the analytic PoE and adaptation steps providing concrete implementation paths.
major comments (2)
- [§3, Method] The central claim that gradient ascent on s_θ(x_t, t) at multiple timesteps recovers query-relevant, reusable concepts is load-bearing, yet the manuscript provides no verification (e.g., stability across runs, semantic alignment with query attributes, or distinction from diffusion artifacts) that these modes survive the Gaussian mapping and submodular selection without introducing spurious elements or missing key factors.
- [§4, Experiments] The reported outperformance on ColorMNIST and CelebA benchmarks lacks quantitative metrics, error bars, ablation results on the number and choice of timesteps or the submodular objective parameters, and full protocol details, preventing assessment of whether the gains are robust or attributable to the discovered concepts rather than procedural degrees of freedom.
minor comments (1)
- [Abstract and §3] The abstract and method description would benefit from explicit notation for the submodular objective function and the precise form of the analytic PoE score, to improve reproducibility; one possible form is sketched below.
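As an illustration of the requested notation, and only a guess at one natural form rather than the authors' definition: with query-derived evaluation points $x^{(j)}$ and candidate prototypes $\mathcal{N}(\mu_k, \Sigma_k)$, a facility-location likelihood objective over the selected set $S$ would read $F(S) = \sum_j \max_{k \in S} \log \mathcal{N}\big(x^{(j)}; \mu_k, \Sigma_k\big)$, with the corresponding PoE score $\nabla_x \log p_{\mathrm{PoE}}(x) = \sum_{k \in S} w_k\, \Sigma_k^{-1}(\mu_k - x)$. After shifting the per-term values to be nonnegative, $F$ is monotone submodular, so greedy selection inherits the classic $(1 - 1/e)$ approximation guarantee.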
Simulated Author's Rebuttal
We thank the referee for the thoughtful and detailed report. We address each major comment below and describe the revisions we will make to strengthen the manuscript.
point-by-point responses
- Referee [§3, Method]: The central claim that gradient ascent on s_θ(x_t, t) at multiple timesteps recovers query-relevant, reusable concepts is load-bearing, yet the manuscript provides no verification (e.g., stability across runs, semantic alignment with query attributes, or distinction from diffusion artifacts) that these modes survive the Gaussian mapping and submodular selection without introducing spurious elements or missing key factors.
  Authors: We agree that explicit verification of the recovered modes would better substantiate the central claim. The current manuscript emphasizes end-to-end compositional performance rather than intermediate diagnostics. In the revision we will add a dedicated analysis subsection to §3 that reports: (i) stability of the selected prototypes across five independent gradient-ascent runs (measured by set overlap and mean pairwise distance of the mapped Gaussians), (ii) semantic alignment via attribute classifiers trained on the source datasets and applied to the discovered modes, and (iii) a controlled comparison against modes obtained from random starting points to separate query-relevant concepts from generic diffusion artifacts. These additions will be presented without changing the core algorithm.
  Revision: yes
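A sketch of the promised stability diagnostic (i), assuming each run yields a tensor of mapped Gaussian means; the match tolerance and Euclidean metric are placeholders, not the authors' choices:

```python
import itertools
import torch

def prototype_stability(runs, match_tol=1.0):
    """Stability of prototype sets across independent gradient-ascent runs.

    runs: list of (n_prototypes, dim) tensors of mapped Gaussian means.
    Returns (i) mean Jaccard-style set overlap, where two prototypes match
    if their means lie within match_tol, and (ii) the mean nearest-neighbor
    distance between runs.
    """
    overlaps, dists = [], []
    for a, b in itertools.combinations(runs, 2):
        d = torch.cdist(a, b)              # pairwise distances between means
        nn = d.min(dim=1).values           # nearest b-prototype per a-prototype
        matched = int((nn < match_tol).sum())
        union = a.shape[0] + b.shape[0] - matched
        overlaps.append(matched / union)
        dists.append(float(nn.mean()))
    n = len(overlaps)
    return sum(overlaps) / n, sum(dists) / n
```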
- Referee [§4, Experiments]: The reported outperformance on ColorMNIST and CelebA benchmarks lacks quantitative metrics, error bars, ablation results on the number and choice of timesteps or the submodular objective parameters, and full protocol details, preventing assessment of whether the gains are robust or attributable to the discovered concepts rather than procedural degrees of freedom.
  Authors: We concur that additional experimental rigor is required for a convincing evaluation. While the original submission already includes mean performance numbers on the held-out benchmarks, we will expand §4 and the appendix to provide: standard deviations across at least five random seeds as error bars, systematic ablations on the number and selection of timesteps (e.g., 5/10/20) and on the submodular objective hyperparameters, and a complete experimental protocol listing all hyperparameters, data splits, baseline implementations, and compute details. These changes will allow readers to assess both robustness and the contribution of the discovered concepts.
  Revision: yes
Circularity Check
No significant circularity; method is an independent algorithmic procedure
full rationale
The paper describes a test-time procedure that applies standard gradient ascent to the pretrained score function s_θ(x_t, t) at multiple timesteps, maps recovered modes to Gaussians, performs submodular selection, and forms a product-of-experts score. None of these steps reduce to the target compositional result by construction, nor do they rely on fitted parameters tuned to the held-out benchmarks or on self-citation chains that presuppose the claimed discovery. The derivation is therefore self-contained: the method is a composition of off-the-shelf optimization primitives whose outputs are then evaluated empirically, without the result being presupposed in the inputs.
Axiom & Free-Parameter Ledger
free parameters (2)
- number and choice of noising timesteps
- submodular objective parameters
axioms (2)
- standard math: Gradient ascent on the score function recovers local modes of the noisy marginals
- domain assumption: Local modes can be accurately mapped to clean-space Gaussians
invented entities (1)
- query-specific density-mode concepts (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (relevance unclear): performs gradient ascent on s_θ(x_t, t) ≈ ∇_{x_t} log p_t(x_t) at multiple noising timesteps to recover local density modes, maps these modes into clean-space Gaussians, greedily selects relevant prototypes with a submodular likelihood objective, and combines them into a product-of-experts (PoE) teacher model with an analytic score
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative (relevance unclear): the time-indexed score geometry of the diffusion model contains reusable density-mode concepts