pith. machine review for the scientific record.

arxiv: 2604.06779 · v2 · submitted 2026-04-08 · 💻 cs.AI

Recognition: no theorem link

VASR: Variance-Aware Systematic Resampling for Reward-Guided Diffusion

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 01:04 UTC · model grok-4.3

classification 💻 cs.AI
keywords sequential monte carlo · reward-guided diffusion · resampling · lineage collapse · variance decomposition · systematic resampling · diffusion models · sample quality

The pith

A variance-decomposition framework for SMC resampling prevents lineage collapse in reward-guided diffusion by using optimal mass allocation and systematic steps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that standard multinomial resampling in Sequential Monte Carlo for reward-guided diffusion causes rapid lineage collapse because a few high-reward particles take over the population. It decomposes the total variance into continuation variance from stochastic future steps and residual variance from the resampling procedure itself. This leads to VASR, which sets particle mass proportional to current weight times the exponential of the reward to minimize continuation variance, then applies systematic resampling to keep residual variance low. For cases with noisy intermediate rewards, VASR-Max deliberately biases toward high-reward selection. The resulting methods improve sample quality on image benchmarks and match stronger but slower baselines on text-to-image tasks while adding only linear cost.
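Read operationally, one resampling step is compact enough to sketch. Below is a minimal illustration of the two moves described above, not the authors' code: the mass allocation m_i ∝ w_i e^{r_i} from the abstract, followed by textbook systematic resampling. Everything beyond those two formulas (the function name, the stability shift, the toy inputs) is an assumption.

```python
import numpy as np

def vasr_step(weights, rewards, rng):
    """Illustrative VASR-style resampling step (a sketch, not the paper's code).

    Two moves: (1) mass allocation m_i ∝ w_i * exp(r_i), targeting the
    continuation-variance term; (2) systematic resampling, which keeps
    the offspring-count (residual) variance low.
    """
    K = len(weights)
    # (1) Variance-optimal mass allocation from the abstract: m ∝ w e^r.
    mass = weights * np.exp(rewards - rewards.max())  # max-shift for stability
    mass /= mass.sum()

    # (2) Systematic resampling: one uniform draw, then K evenly spaced
    # points swept through the CDF. Each offspring count deviates from
    # K*m_i by less than 1; multinomial counts have variance K*m_i*(1-m_i).
    cdf = np.cumsum(mass)
    cdf[-1] = 1.0  # guard against floating-point undershoot
    points = (rng.uniform() + np.arange(K)) / K
    return np.searchsorted(cdf, points)  # ancestor indices, O(K) total

rng = np.random.default_rng(0)
w = rng.dirichlet(np.ones(8))
r = rng.normal(size=8)
print(vasr_step(w, r, rng))  # stratified ancestors; no single index dominates
```

The contrast with multinomial resampling is the point: drawing K ancestors i.i.d. from the same mass vector concentrates offspring on a few high-mass particles far faster than the stratified sweep above, which is the collapse mechanism the paper attributes to residual variance.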

Core claim

High offspring-count variance under multinomial resampling drives lineage collapse in reward-guided diffusion SMC; separating continuation variance V_t^cont from residual variance V_t^res reveals that variance-optimal mass allocation m_t ∝ w_t e^{r_t} minimizes the first term while systematic resampling controls the second, and the biased VASR-Max variant handles noisy rewards in latent diffusion models.
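The claim keeps both variance terms in shorthand. One natural reading, assuming the split is the law of total variance conditioned on the offspring counts N = (N^1, …, N^K) drawn at resampling (the paper's exact definitions are not reproduced on this page), is:

```latex
% A hedged reading of the split, not the paper's verbatim definition:
% condition the variance of an estimator \hat{\mu}_t on the offspring
% counts N produced by the resampling step (law of total variance).
V_t
  = \underbrace{\mathbb{E}_{N}\!\big[\operatorname{Var}(\hat{\mu}_t \mid N)\big]}_{V_t^{\mathrm{cont}}:\ \text{stochastic future denoising}}
  + \underbrace{\operatorname{Var}_{N}\!\big(\mathbb{E}[\hat{\mu}_t \mid N]\big)}_{V_t^{\mathrm{res}}:\ \text{noise injected by resampling}}
```

Under this reading, any unbiased resampling scheme with the same m_t leaves the first term untouched, so schemes differ only through the second term, which is what VASR's systematic step suppresses.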

What carries the argument

The variance-decomposition into continuation variance V_t^cont and residual variance V_t^res, together with mass allocation m_t ∝ w_t e^{r_t} followed by systematic resampling.

If this is right

  • On MNIST and CIFAR-10, VASR reaches up to 26% better FID than prior SMC methods at matched compute.
  • VASR runs approximately 66 times faster than MCTS-based value methods while matching their compute budget.
  • On text-to-image generation, VASR-Max outperforms the strongest SMC baseline across compute budgets.
  • VASR-Max reaches within 2.5-3% of MCTS reward scores at high budgets while remaining substantially faster.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same variance split could be tested in non-diffusion sequential samplers such as particle filters for time-series data.
  • VASR-Max bias might be tuned further for tasks where reward noise scales with model size.
  • The linear overhead suggests the method could scale to video or 3D diffusion without extra training.
  • Pairing the mass allocation with existing low-variance estimators from statistics might compound the gains.

Load-bearing premise

That separating continuation variance from residual variance correctly identifies the main cause of lineage collapse and that allocating mass proportional to weights times exponential rewards optimally minimizes continuation variance under the given reward guidance.
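The referee report below presses on exactly this optimality step, so it is worth seeing the shape of the calculation that would support it. If the continuation term takes the classic stratified-allocation form V_t^cont = Σ_i c_i / m_i with per-particle constants c_i ∝ (w_i e^{r_i})^2, an assumption about the paper's setup rather than anything stated on this page, the allocation follows from a one-line Lagrange-multiplier argument:

```latex
% Sketch under the assumed form V^{cont} = \sum_i c_i / m_i, \sum_i m_i = 1:
\mathcal{L}(m, \lambda) = \sum_i \frac{c_i}{m_i} + \lambda\Big(\sum_i m_i - 1\Big),
\qquad
\frac{\partial \mathcal{L}}{\partial m_i} = -\frac{c_i}{m_i^{2}} + \lambda = 0
\;\Longrightarrow\; m_i \propto \sqrt{c_i}.
% With c_i \propto (w_i e^{r_i})^2 this gives m_i \propto w_i e^{r_i};
% each c_i/m_i is convex for m_i > 0, so the critical point is the minimum.
```

Whether V_t^cont actually takes this form under the diffusion transition kernel, and what boundedness conditions r_t needs for it, is precisely what the referee asks the authors to make explicit.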

What would settle it

Measure particle diversity or effective sample size over multiple resampling steps with and without VASR on the same diffusion trajectory; if diversity still drops at the same rate or final FID shows no improvement, the variance-minimization claim does not hold.
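Both diagnostics are cheap to log alongside sampling. A minimal sketch, assuming access to normalized weights and per-step ancestor indices (the logging interface here is hypothetical):

```python
import numpy as np

def ess(weights):
    """Effective sample size of a weight vector: 1 / sum(w_i^2)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def surviving_lineages(ancestor_log):
    """Count distinct time-0 ancestors alive after each resampling step.

    ancestor_log: list of index arrays, one per resampling step, where
    ancestor_log[t][k] is the parent index of particle k at step t.
    """
    roots = np.arange(len(ancestor_log[0]))  # each particle starts as its own root
    counts = []
    for parents in ancestor_log:
        roots = roots[parents]               # follow the resampling map back
        counts.append(len(np.unique(roots)))
    return counts  # flat curve = diversity kept; fast decay = lineage collapse

# Toy check: even uniform-weight multinomial resampling loses lineages.
rng = np.random.default_rng(1)
K, T = 1000, 50
log = [rng.choice(K, size=K) for _ in range(T)]  # i.i.d. uniform ancestors
print(ess(np.full(K, 1.0 / K)))     # = K for uniform weights
print(surviving_lineages(log)[-1])  # far fewer than K after T steps
```

If the lineage curve under VASR stays flat where the multinomial curve decays, the mechanism holds; identical decay, or no FID gain, would undercut the claim as stated above.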

Figures

Figures reproduced from arXiv: 2604.06779 by Jia Yi Zhang, Sagnik Mukherjee, Shivanshu Shekhar, and Tong Zhang.

Figure 1
Figure 1: Qualitative comparison on CIFAR-10 (class: car). Samples generated with 10^6 NFEs using reward-weighted final selection. FKD and TDS exhibit pronounced mode collapse, yielding visually similar samples, while DPS produces out-of-distribution images with degraded fidelity. In contrast, FVD and DTS preserve significantly higher diversity while maintaining alignment with the data distribution. view at source ↗
Figure 2
Figure 2: Scaling behavior with increasing compute (NFEs). We plot FID as a function of the number of function evaluations (NFEs) on CIFAR-10 and MNIST. FVD consistently outperforms FKD and DTS across all compute budgets and settings, demonstrating favorable scaling while maintaining full parallelism. view at source ↗
Figure 3
Figure 3: Reward scaling with compute in text-to-image generation. All methods are evaluated under matched NFE budgets, and each point reports the benchmark average of the best reward obtained per prompt. FVD achieves strong performance across both reward models, avoiding the over-optimization artifacts observed in FKD while offering substantially better runtime efficiency than DTS. view at source ↗
Figure 4
Figure 4: Particle collapse dynamics under FKD vs. FVD (K=1000, λ=1.0, CIFAR-10). Left: FVD maintains a consistently lower death rate throughout denoising, whereas FKD exhibits aggressive particle removal. Right: FVD preserves significantly more distinct ancestral lineages, avoiding the rapid collapse observed in FKD. Overall, FVD retains ∼10× more lineages, demonstrating improved diversity and stability. view at source ↗
Figure 5
Figure 5: FID vs. wall-clock time on CIFAR-10. FVD achieves lower FID across all time budgets and is ∼66× faster than DTS at matched NFEs. Notably, even at its longest runtime, DTS fails to reach the worst FID achieved by FVD, highlighting the substantial efficiency gap. view at source ↗
Figure 6
Figure 6: Effect of target absorption rate α* on reward–diversity trade-off (CIFAR-10, 10^6 NFEs). Left: Mean reward increases and variance decreases with α* as stronger selection pressure favors high-reward samples. Right: MMD is minimized at intermediate α*, indicating optimal distributional coverage; large α* leads to increased MMD due to mode collapse. These results show that α* provides a direct and interp… view at source ↗
Figure 7
Figure 7: Effect of adaptive vs. fixed λ on FID. Adaptive λ provides consistent improvements across a wide range of initializations, with the largest gains observed when the fixed λ is poorly calibrated. view at source ↗
Figure 8
Figure 8: Qualitative comparison on aesthetic optimization. Samples from each method for a 50K NFE budget. As emphasized earlier, FKD achieves a high aesthetic score but produces samples that have overfit to the reward model and do not make sense [39], a symptom of over-optimization. FVD generates visually appealing images that remain faithful to the prompt. view at source ↗
Figure 9
Figure 9: Qualitative comparison on DrawBench (ImageReward). Best samples generated by each method for each prompt at a 50K NFE budget. FVD produces samples with higher prompt alignment and perceptual quality compared to DTS and FKD, consistent with the quantitative results in Figure 3b. These results confirm that FVD achieves a more desirable balance between exploration and exploitation: it maintains diversity by… view at source ↗
Figure 10
Figure 10: Reward-rank distribution of removed particles per resampling step (K=1000, λ=1.0, CIFAR-10 class 1). FKD removes particles across the entire reward spectrum, including many high-reward samples, indicating weak selection specificity. In contrast, FVD concentrates removals among low-reward particles, preserving high-reward trajectories and maintaining a more faithful approximation to the target distribution… view at source ↗
Figure 11
Figure 11: FID vs. rebirth_eta for FVD on CIFAR-10 (class 0) using 10^5 NFEs, averaged over five seeds (shaded region = ± std). Deterministic rebirth (rebirth_eta = 0) leads to poor FID due to diversity collapse, while moderate stochasticity (rebirth_eta = 0.4) yields the best performance. For rebirth_eta > 0, rebirth becomes stochastic, causing the new particle to deviate from the donor and explore a nearby denoi… view at source ↗
read the original abstract

Sequential Monte Carlo (SMC) samplers for reward-guided diffusion models often suffer from rapid lineage collapse: a few high-reward particles dominate the population within a handful of resampling steps, destroying diversity and degrading sample quality. We propose a variance-decomposition framework for reward-guided diffusion SMC that separates continuation variance $V_t^{\mathrm{cont}}$ from residual variance $V_t^{\mathrm{res}}$, revealing that high offspring-count variance under the commonly used multinomial resampling drives this collapse. This motivates \textsc{VASR} (Variance-Aware Systematic Resampling), which addresses both variance terms via variance-optimal mass allocation $m_t \propto w_t e^{r_t}$ (minimizing $V_t^{\mathrm{cont}}$) and systematic resampling (controlling $V_t^{\mathrm{res}}$). For latent diffusion models where intermediate rewards are noisy due to stochastic continuations, we propose \textsc{VASR-Max}, a deliberately biased high-selection variant for variance-sensitive reward optimization. Both methods are training-free, fully parallelizable, and add only linear overhead. On MNIST and CIFAR-10, VASR achieves as high as $26\%$ better FID than prior SMC methods while remaining 66 times faster than MCTS-based value methods at matched compute. On text-to-image generation, \textsc{VASR-Max} consistently outperforms the strongest SMC baseline across compute budgets and matches MCTS-based methods within 2.5--3% reward at high budgets while being approximately times faster.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper claims that standard multinomial resampling in SMC for reward-guided diffusion causes rapid lineage collapse due to high offspring-count variance. It introduces a variance decomposition V_t = V_t^cont + V_t^res, derives a mass allocation m_t ∝ w_t e^{r_t} claimed to minimize the continuation term, and pairs it with systematic resampling to control the residual term. This yields the training-free VASR algorithm and a biased VASR-Max variant for noisy intermediate rewards. Empirical claims include up to 26% FID improvement over prior SMC on MNIST/CIFAR-10, 66x speedup versus MCTS at matched compute, and competitive reward matching on text-to-image tasks with substantial speed gains.

Significance. If the variance decomposition is correctly minimized by the stated allocation and the empirical gains are reproducible with proper controls, the work would offer a lightweight, parallelizable improvement to SMC sampling for reward-guided diffusion that directly targets lineage collapse without retraining. The training-free property and linear overhead are practical strengths that could see adoption in latent diffusion pipelines.

major comments (3)
  1. [Variance-decomposition framework (around the definition of V_t^cont and the mass-allocation rule)] The central claim that m_t ∝ w_t e^{r_t} is variance-optimal for minimizing V_t^cont rests on an unverified substitution of the diffusion transition kernel and noisy reward into the continuation variance expression. No explicit critical-point calculation or convexity argument is supplied to confirm that this proportionality attains the minimum (or even a local minimum) rather than requiring additional boundedness conditions on r_t. This step is load-bearing for the motivation of VASR over multinomial resampling.
  2. [Experimental results on MNIST, CIFAR-10 and text-to-image] Table or results section reporting MNIST/CIFAR-10 FID: the 26% improvement is stated without error bars, number of independent runs, or ablation isolating the contribution of the variance-optimal allocation versus systematic resampling alone. At matched compute, the comparison to MCTS-based value methods also lacks protocol details (particle count N, exact diffusion schedule, reward model), preventing attribution of gains to the claimed mechanism.
  3. [VASR-Max variant description and text-to-image experiments] For VASR-Max, the deliberate bias introduced for variance-sensitive optimization is described but not accompanied by a quantitative bound on the introduced bias or a demonstration that the bias-variance trade-off improves reward optimization under the noisy-reward regime. This is central to the claim that VASR-Max matches MCTS reward within 2.5–3% at high budgets.
minor comments (3)
  1. [Abstract] Abstract contains the incomplete phrase 'being approximately times faster'; the numerical factor is missing.
  2. [Method overview] Notation for continuation variance V_t^cont and residual variance V_t^res is introduced without an early equation block that defines them in terms of the offspring counts and the diffusion kernel.
  3. [Experiments] Reproducibility would benefit from an explicit statement of the number of particles, diffusion timesteps, and reward-function implementation details in the main experimental section rather than only in supplementary material.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major point below and will incorporate revisions to strengthen the variance derivation, experimental reporting, and analysis of VASR-Max.

read point-by-point responses
  1. Referee: [Variance-decomposition framework (around the definition of V_t^cont and the mass-allocation rule)] The central claim that m_t ∝ w_t e^{r_t} is variance-optimal for minimizing V_t^cont rests on an unverified substitution of the diffusion transition kernel and noisy reward into the continuation variance expression. No explicit critical-point calculation or convexity argument is supplied to confirm that this proportionality attains the minimum (or even a local minimum) rather than requiring additional boundedness conditions on r_t. This step is load-bearing for the motivation of VASR over multinomial resampling.

    Authors: We agree that an explicit derivation would improve clarity. In the revision we will add a step-by-step critical-point calculation of the continuation variance V_t^cont under the diffusion transition kernel and reward model, together with a convexity argument and the precise boundedness conditions required on r_t. This will rigorously establish that m_t ∝ w_t e^{r_t} attains the minimum. revision: yes

  2. Referee: [Experimental results on MNIST, CIFAR-10 and text-to-image] Table or results section reporting MNIST/CIFAR-10 FID: the 26% improvement is stated without error bars, number of independent runs, or ablation isolating the contribution of the variance-optimal allocation versus systematic resampling alone. At matched compute, the comparison to MCTS-based value methods also lacks protocol details (particle count N, exact diffusion schedule, reward model), preventing attribution of gains to the claimed mechanism.

    Authors: We will revise the experimental section to report FID results with error bars from at least five independent random seeds, include ablations that isolate the mass-allocation rule from systematic resampling, and supply full protocol details for all MCTS baselines (particle count N, diffusion schedule, reward model architecture and compute budget). These additions will enable clear attribution of the reported gains. revision: yes

  3. Referee: [VASR-Max variant description and text-to-image experiments] For VASR-Max, the deliberate bias introduced for variance-sensitive optimization is described but not accompanied by a quantitative bound on the introduced bias or a demonstration that the bias-variance trade-off improves reward optimization under the noisy-reward regime. This is central to the claim that VASR-Max matches MCTS reward within 2.5–3% at high budgets.

    Authors: We will augment the VASR-Max section with additional experiments that quantify the bias-variance trade-off across a range of bias parameters and noisy-reward settings, showing improved reward optimization relative to unbiased baselines. While a closed-form theoretical bound on the bias is difficult to obtain in the general noisy-reward case, the empirical results will substantiate the practical utility of the deliberate bias at high compute budgets. revision: partial

Circularity Check

0 steps flagged

No circularity; variance decomposition and mass allocation derived independently from first principles

full rationale

The paper introduces a variance decomposition V_t = V_t^cont + V_t^res for reward-guided SMC, then proposes m_t ∝ w_t e^{r_t} as the allocation that minimizes the continuation term and pairs it with systematic resampling to control residual variance. No step reduces a claimed result to a fitted parameter, self-citation, or input by construction; the optimality claim is presented as following from the decomposition rather than presupposing the algorithm. The method is training-free and the reported gains are empirical comparisons against baselines, not internal predictions. This is a standard non-circular derivation of a new resampling rule from an explicit variance model.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The framework rests on a domain-specific variance decomposition and the claim that multinomial resampling variance is the primary collapse driver; no free parameters are explicitly fitted in the abstract, and the new entities are algorithmic rather than physical postulates.

axioms (2)
  • domain assumption Reward-guided SMC variance decomposes cleanly into continuation variance V_t^cont and residual variance V_t^res
    Invoked to motivate the design of VASR and the mass allocation rule.
  • domain assumption High offspring-count variance under multinomial resampling is the main cause of rapid lineage collapse
    Used to justify switching to systematic resampling.
invented entities (2)
  • VASR no independent evidence
    purpose: Variance-aware systematic resampling algorithm
    New method that combines optimal mass allocation with systematic resampling.
  • VASR-Max no independent evidence
    purpose: Biased high-selection variant for noisy intermediate rewards
    Introduced for latent diffusion models where stochastic continuations make rewards noisy.

pith-pipeline@v0.9.0 · 5576 in / 1610 out tokens · 60108 ms · 2026-05-13T01:04:13.232647+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 10 internal anchors

  1. [1]

    Denoising Diffusion Probabilistic Models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models, 2020. URL https://arxiv.org/abs/2006.11239

  2. [2]

    Generative modeling by estimating gradients of the data distribution

    Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution, 2020. URL https://arxiv.org/abs/1907.05600

  3. [3]

    SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

    Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. SDXL: Improving latent diffusion models for high-resolution image synthesis, 2023. URL https://arxiv.org/abs/2307.01952

  4. [4]

    High-Resolution Image Synthesis with Latent Diffusion Models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models, 2022. URL https://arxiv.org/abs/2112.10752

  5. [5]

    CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

    Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Yuxuan Zhang, Weihan Wang, Yean Cheng, Bin Xu, Xiaotao Gu, Yuxiao Dong, and Jie Tang. CogVideoX: Text-to-video diffusion models with an expert transformer, 2025. URL https://arxiv.org/abs/2408.06072

  6. [6]

    Google DeepMind. Veo 3. https://aistudio.google.com/models/veo-3, 2025. Accessed: 2026-03-15

  7. [7]

    LLaDA2.0: Scaling Up Diffusion Language Models to 100B

    Tiwei Bie, Maosong Cao, Kun Chen, Lun Du, Mingliang Gong, Zhuochen Gong, Yanmei Gu, Jiaqi Hu, Zenan Huang, Zhenzhong Lan, Chengxi Li, Chongxuan Li, Jianguo Li, Zehuan Li, Huabin Liu, Lin Liu, Guoshan Lu, Xiaocheng Lu, Yuxin Ma, Jianfeng Tan, Lanning Wei, Ji-Rong Wen, Yipeng Xing, Xiaolu Zhang, Junbo Zhao, Da Zheng, Jun Zhou, Junlin Zhou, Zhanchao Zhou, Li...

  8. [8]

    Simple and Effective Masked Diffusion Language Models

    Subham Sekhar Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T Chiu, Alexander Rush, and Volodymyr Kuleshov. Simple and effective masked diffusion language models, 2024. URL https://arxiv.org/abs/2406.07524

  9. [9]

    Training Diffusion Models with Reinforcement Learning

    Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. Training diffusion models with reinforcement learning, 2024. URL https://arxiv.org/abs/2305.13301

  10. [10]

    DPOK: Reinforcement learning for fine-tuning text-to-image diffusion models

    Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, Moonkyung Ryu, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, and Kimin Lee. DPOK: Reinforcement learning for fine-tuning text-to-image diffusion models, 2023. URL https://arxiv.org/abs/2305.16381

  11. [11]

    Using human feedback to fine-tune diffusion models without any reward model

    Kai Yang, Jian Tao, Jiafei Lyu, Chunjiang Ge, Jiaxin Chen, Qimai Li, Weihan Shen, Xiaolong Zhu, and Xiu Li. Using human feedback to fine-tune diffusion models without any reward model, 2024. URL https://arxiv.org/abs/2311.13231

  12. [12]

    Diffusion model alignment using direct preference optimization

    Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, and Nikhil Naik. Diffusion model alignment using direct preference optimization, 2023. URL https://arxiv.org/abs/2311.12908

  13. [13]

    Directly fine-tuning diffusion models on differentiable rewards

    Kevin Clark, Paul Vicol, Kevin Swersky, and David J Fleet. Directly fine-tuning diffusion models on differentiable rewards, 2024. URL https://arxiv.org/abs/2309.17400

  14. [14]

    Universal guidance for diffusion models

    Arpit Bansal, Hong-Min Chu, Avi Schwarzschild, Soumyadip Sengupta, Micah Goldblum, Jonas Geiping, and Tom Goldstein. Universal guidance for diffusion models, 2023. URL https://arxiv.org/abs/2302.07121

  15. [15]

    Diffusion Posterior Sampling for General Noisy Inverse Problems

    Hyungjin Chung, Jeongsol Kim, Michael T. Mccann, Marc L. Klasky, and Jong Chul Ye. Diffusion posterior sampling for general noisy inverse problems, 2024. URL https://arxiv.org/abs/2209.14687

  16. [16]

    Monte carlo guided diffusion for bayesian linear inverse problems

    Gabriel Cardoso, Yazid Janati El Idrissi, Sylvain Le Corff, and Eric Moulines. Monte carlo guided diffusion for bayesian linear inverse problems, 2023. URL https://arxiv.org/abs/2308.07983

  17. [17]

    Diffusion posterior sampling for linear inverse problem solving: A filtering perspective

    Zehao Dou and Yang Song. Diffusion posterior sampling for linear inverse problem solving: A filtering perspective. In The Twelfth International Conference on Learning Representations,

  18. [18]

    URL https://openreview.net/forum?id=tplXNcHZs1

  19. [19]

    A general framework for inference-time scaling and steering of diffusion models

    Raghav Singhal, Zachary Horvitz, Ryan Teehan, Mengye Ren, Zhou Yu, Kathleen McKeown, and Rajesh Ranganath. A general framework for inference-time scaling and steering of diffusion models, 2025. URL https://arxiv.org/abs/2501.06848

  20. [20]

    Derivative-free guidance in continuous and discrete diffusion models with soft value-based decoding

    Xiner Li, Yulai Zhao, Chenyu Wang, Gabriele Scalia, Gokcen Eraslan, Surag Nair, Tommaso Biancalani, Shuiwang Ji, Aviv Regev, Sergey Levine, and Masatoshi Uehara. Derivative-free guidance in continuous and discrete diffusion models with soft value-based decoding, 2024. URL https://arxiv.org/abs/2408.08252

  21. [21]

    Dynamic search for inference-time alignment in diffusion models

    Xiner Li, Masatoshi Uehara, Xingyu Su, Gabriele Scalia, Tommaso Biancalani, Aviv Regev, Sergey Levine, and Shuiwang Ji. Dynamic search for inference-time alignment in diffusion models, 2025. URL https://arxiv.org/abs/2503.02039

  22. [22]

    Diffusion tree sampling: Scalable inference-time alignment of diffusion models

    Vineet Jain, Kusha Sareen, Mohammad Pedramfar, and Siamak Ravanbakhsh. Diffusion tree sampling: Scalable inference-time alignment of diffusion models, 2025. URL https://arxiv.org/abs/2506.20701

  23. [23]

    Manifold preserving guided diffusion

    Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim, Wei-Hsiang Liao, Yuki Mitsufuji, J. Zico Kolter, Ruslan Salakhutdinov, and Stefano Ermon. Manifold preserving guided diffusion, 2023. URL https://arxiv.org/abs/2311.16424

  24. [24]

    Score-Based Generative Modeling through Stochastic Differential Equations

    Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations, 2021. URL https://arxiv.org/abs/2011.13456

  25. [25]

    Diffusion Models Beat GANs on Image Synthesis

    Prafulla Dhariwal and Alex Nichol. Diffusion models beat gans on image synthesis, 2021. URL https://arxiv.org/abs/2105.05233

  26. [26]

    Pseudoinverse-guided diffusion models for inverse problems

    Jiaming Song, Arash Vahdat, Morteza Mardani, and Jan Kautz. Pseudoinverse-guided diffusion models for inverse problems. In International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=9_gsMA8MRKQ

  27. [27]

    Practical and asymptotically exact conditional sampling in diffusion models

    Luhuan Wu, Brian Trippe, Christian Naesseth, David Blei, and John P Cunningham. Practical and asymptotically exact conditional sampling in diffusion models. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 31372–31403. Curran Associates, Inc., 2023. UR...

  28. [28]

    Diffusion probabilistic modeling of protein backbones in 3d for the motif-scaffolding problem

    Brian L. Trippe, Jason Yim, Doug Tischer, David Baker, Tamara Broderick, Regina Barzilay, and Tommi Jaakkola. Diffusion probabilistic modeling of protein backbones in 3d for the motif-scaffolding problem, 2023. URL https://arxiv.org/abs/2206.04119

  29. [29]

    Pierre Del Moral. Feynman-Kac Formulae: Genealogical and Interacting Particle Systems With Applications, volume 100. 05 2004. ISBN 0387202684. doi: 10.1007/978-1-4684-9393-1

  30. [30]

    Inference-time scaling for diffusion models beyond scaling denoising steps

    Nanye Ma, Shangyuan Tong, Haolin Jia, Hexiang Hu, Yu-Chuan Su, Mingda Zhang, Xuan Yang, Yandong Li, Tommi Jaakkola, Xuhui Jia, and Saining Xie. Inference-time scaling for diffusion models beyond scaling denoising steps, 2025. URL https://arxiv.org/abs/2501.09732

  31. [31]

    Vfscale: Intrinsic reasoning through verifier-free test-time scalable diffusion model

    Tao Zhang, Jia-Shu Pan, Ruiqi Feng, and Tailin Wu. Vfscale: Intrinsic reasoning through verifier-free test-time scalable diffusion model, 2026. URL https://arxiv.org/abs/2502.01989

  32. [32]

    Controllable graph generation with diffusion models via inference-time tree search guidance

    Jiachi Zhao, Zehong Wang, Yamei Liao, Chuxu Zhang, and Yanfang Ye. Controllable graph generation with diffusion models via inference-time tree search guidance, 2026. URL https://arxiv.org/abs/2510.10402

  33. [33]

    Diffusion language model inference with monte carlo tree search

    Zheng Huang, Kiran Ramnath, Yueyan Chen, Aosong Feng, Sangmin Woo, Balasubramaniam Srinivasan, Zhichao Xu, Kang Zhou, Shuai Wang, Haibo Ding, and Lin Lee Cheong. Diffusion language model inference with monte carlo tree search, 2026. URL https://arxiv.org/abs/2512.12168

  34. [34]

    Monte carlo tree diffusion for system 2 planning

    Jaesik Yoon, Hyeonseo Cho, Doojin Baek, Yoshua Bengio, and Sungjin Ahn. Monte carlo tree diffusion for system 2 planning, 2026. URL https://arxiv.org/abs/2502.07202

  35. [35]

    Denoising Diffusion Implicit Models

    Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models, 2022. URL https://arxiv.org/abs/2010.02502

  36. [36]

    S. N. Ethier and Thomas G. Kurtz. Fleming–Viot processes in population genetics. SIAM Journal on Control and Optimization, 31(2):345–386, 1993. doi: 10.1137/0331019. URL https://doi.org/10.1137/0331019

  37. [37]

    Fleming-Viot processes: two explicit examples

    Bertrand Cloez and Marie-Noémie Thai. Fleming-Viot processes: two explicit examples, 2016. URL https://arxiv.org/abs/1603.04670

  38. [38]

    Robbins-Monro Algorithm

    Robbins-Monro Algorithm, pages 1–24. Springer US, Boston, MA, 2002. ISBN 978-0-306-48166-6. doi: 10.1007/0-306-48166-9_1. URL https://doi.org/10.1007/0-306-48166-9_1

  39. [39]

    Test-time alignment of diffusion models without reward over-optimization

    Sunwoo Kim, Minkyu Kim, and Dongmin Park. Test-time alignment of diffusion models without reward over-optimization, 2025. URL https://arxiv.org/abs/2501.05803

  40. [40]

    See-DPO: Self entropy enhanced direct preference optimization

    Shivanshu Shekhar, Shreyas Singh, and Tong Zhang. See-DPO: Self entropy enhanced direct preference optimization, 2025. URL https://arxiv.org/abs/2411.04712

  41. [41]

    Photorealistic text-to-image diffusion models with deep language understanding

    Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems, 35:36479–36494, 2022

  42. [42]

    ImageReward: Learning and evaluating human preferences for text-to-image generation

    Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. ImageReward: Learning and evaluating human preferences for text-to-image generation,

  43. [43]

    URL https://arxiv.org/abs/2304.05977

  44. [44]

    Large-scale prompt gallery dataset for text-to-image generative models

    Zijie J. Wang, Evan Montoya, David Munechika, Haoyang Yang, Benjamin Hoover, and Duen Horng Chau. Large-scale prompt gallery dataset for text-to-image generative models. arXiv:2210.14896 [cs], 2022. URL https://arxiv.org/abs/2210.14896

  45. [45]

    the number of particles satisfies K → ∞

  46. [46]

    the dependencies induced by the birth–death interaction become negligible in the large-K limit, as predicted by the usual mean-field / propagation-of-chaos theory for interacting particle systems [28, 35]. Then the empirical measure of particles produced by FVD should converge to a reward-tilted distribution of the form π*_t(x_{t:T}) ∝ p_θ(x_{t:T}) ∏_{s=t}^{T−1} G_s(x...