pith. machine review for the scientific record.

arxiv: 2211.01095 · v3 · submitted 2022-11-02 · 💻 cs.LG · cs.CV

Recognition: 2 theorem links

· Lean Theorem

DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models

Authors on Pith · no claims yet

Pith reviewed 2026-05-16 07:50 UTC · model grok-4.3

classification 💻 cs.LG cs.CV
keywords diffusion probabilistic models · guided sampling · fast sampling · DPM-Solver++ · high-order ODE solver · DDIM · text-to-image generation · latent diffusion

The pith

DPM-Solver++ generates high-quality guided samples from diffusion models in 15 to 20 steps

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Diffusion models for high-resolution image synthesis need guided sampling with large guidance scales to reach their best quality, yet the standard fast sampler, DDIM, still requires 100 to 250 steps. Earlier high-order ODE solvers that speed up unguided sampling become unstable, or even slower than DDIM, once the guidance scale grows large. The paper introduces DPM-Solver++, which solves the diffusion ODE from the data-prediction model, applies thresholding to keep solutions inside the training distribution, and adds a multistep variant that shrinks the effective step size. Experiments show that this combination produces high-quality samples for both pixel-space and latent-space models in only 15 to 20 steps.

Core claim

Previous high-order fast samplers for diffusion ODEs suffer from instability issues and become slower than DDIM when the guidance scale grows large. DPM-Solver++ solves the diffusion ODE with the data prediction model and adopts thresholding methods to keep the solution matching the training data distribution. Its multistep variant addresses the instability by reducing the effective step size, enabling high-quality guided sampling in 15 to 20 steps.
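The thresholding half of that recipe can be made concrete. The paper adopts existing thresholding methods on the data prediction; the sketch below follows the Imagen-style percentile-based dynamic thresholding, and the function name and defaults are illustrative rather than the paper's exact implementation.

```python
import numpy as np

def dynamic_threshold(x0_pred, percentile=0.995):
    """Clamp a data prediction back into the training range.

    Hypothetical sketch of percentile-based (Imagen-style) dynamic
    thresholding: pick the `percentile` absolute value per sample,
    clip to [-s, s], then rescale so values stay in [-1, 1].
    """
    flat = np.abs(x0_pred).reshape(x0_pred.shape[0], -1)
    s = np.quantile(flat, percentile, axis=1)       # per-sample threshold
    s = np.maximum(s, 1.0)                          # never shrink below [-1, 1]
    s = s.reshape(-1, *([1] * (x0_pred.ndim - 1)))  # broadcastable shape
    return np.clip(x0_pred, -s, s) / s              # clip, then rescale
```

Applied after every data-prediction evaluation, this is the mechanism the paper credits with keeping the ODE solution consistent with the training distribution at large guidance scales.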

What carries the argument

DPM-Solver++, a high-order diffusion ODE solver that uses the data-prediction formulation together with thresholding and a multistep variant to restore stability under large guidance scales
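The multistep machinery can be written out. The sketch below implements one second-order DPM-Solver++(2M) step in the data-prediction parameterization with λ = log(α/σ), following the published update rule; the function signature and schedule representation are illustrative assumptions, not the paper's code.

```python
import numpy as np

def lam(alpha, sigma):
    """Half log-SNR, lambda = log(alpha / sigma)."""
    return np.log(alpha / sigma)

def dpm_solver_pp_2m_step(x, x0_prev, x0_prev2, sched_prev2, sched_prev, sched_cur):
    """One DPM-Solver++(2M) step (sketch of the paper's multistep update).

    x        : current state at time t_{i-1}
    x0_prev  : data prediction x_theta at t_{i-1}
    x0_prev2 : data prediction x_theta at t_{i-2}
    sched_*  : (alpha, sigma) pairs at t_{i-2}, t_{i-1}, t_i
    """
    (a2, s2), (a1, s1), (a0, s0) = sched_prev2, sched_prev, sched_cur
    h      = lam(a0, s0) - lam(a1, s1)   # current step size in lambda
    h_prev = lam(a1, s1) - lam(a2, s2)   # previous step size in lambda
    r = h_prev / h
    # Linear extrapolation over two cached data predictions: this reuse
    # of previous evaluations is what shrinks the effective step size
    # without extra function evaluations.
    D = (1.0 + 1.0 / (2.0 * r)) * x0_prev - (1.0 / (2.0 * r)) * x0_prev2
    return (s0 / s1) * x - a0 * np.expm1(-h) * D
```

With a constant data prediction the step reproduces the exact ODE solution, which is a quick sanity check on the coefficients.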

If this is right

  • Guided sampling reaches high quality with far fewer steps than DDIM for both pixel and latent DPMs.
  • High-order solvers become usable for conditional generation once formulated around data prediction and thresholding.
  • Multistep correction stabilizes the solver without increasing the total number of function evaluations.
  • Computational cost of text-to-image generation drops sharply while preserving sample quality.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same data-prediction and thresholding choices may stabilize solvers in other conditional generation settings beyond classifier-free guidance.
  • Fifteen-step sampling could open real-time or interactive uses for large diffusion models.
  • Adaptive selection between single-step and multistep modes might further reduce average compute across varying guidance scales.
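The guidance mechanism behind these scenarios is classifier-free guidance, which the paper takes as given. A one-line sketch of the standard formula from Ho & Salimans (variable names are illustrative):

```python
def cfg_noise(eps_cond, eps_uncond, scale):
    """Classifier-free guidance: extrapolate the conditional noise
    prediction away from the unconditional one by `scale`.

    At scale 1 this reduces to the plain conditional prediction; at
    the large scales used in practice (e.g. 7.5) the amplified
    difference term is what pushes solutions outside the training
    range and destabilizes high-order solvers.
    """
    return eps_uncond + scale * (eps_cond - eps_uncond)
```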

Load-bearing premise

The data-prediction formulation combined with thresholding and the multistep variant reliably removes instability at large guidance scales.

What would settle it

Running DPM-Solver++ for 15 steps at a high guidance scale on a standard text-to-image benchmark and checking whether sample quality falls below DDIM's or exhibits visible artifacts or divergence.

read the original abstract

Diffusion probabilistic models (DPMs) have achieved impressive success in high-resolution image synthesis, especially in recent large-scale text-to-image generation applications. An essential technique for improving the sample quality of DPMs is guided sampling, which usually needs a large guidance scale to obtain the best sample quality. The commonly used fast sampler for guided sampling is DDIM, a first-order diffusion ODE solver that generally needs 100 to 250 steps for high-quality samples. Although recent works propose dedicated high-order solvers and achieve a further speedup for sampling without guidance, their effectiveness for guided sampling has not been well tested before. In this work, we demonstrate that previous high-order fast samplers suffer from instability issues, and they even become slower than DDIM when the guidance scale grows large. To further speed up guided sampling, we propose DPM-Solver++, a high-order solver for the guided sampling of DPMs. DPM-Solver++ solves the diffusion ODE with the data prediction model and adopts thresholding methods to keep the solution matching the training data distribution. We further propose a multistep variant of DPM-Solver++ to address the instability issue by reducing the effective step size. Experiments show that DPM-Solver++ can generate high-quality samples within only 15 to 20 steps for guided sampling by pixel-space and latent-space DPMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes DPM-Solver++, a high-order solver for the guided sampling of diffusion probabilistic models (DPMs). It solves the diffusion ODE using a data-prediction formulation with thresholding and introduces a multistep variant to mitigate instability observed in prior high-order solvers at large guidance scales. Experiments are presented claiming that DPM-Solver++ generates high-quality samples in only 15-20 steps for both pixel-space and latent-space DPMs, outperforming DDIM and earlier high-order methods.

Significance. If the stability and speedup claims hold, the work would provide a practical acceleration for guided sampling in large-scale DPMs, which is central to text-to-image applications. The derivation is parameter-free, building directly on the standard diffusion ODE; the explicit algorithmic choices (data prediction, thresholding, multistep schedule) are a strength that could make the results reproducible if code and exact schedules are released.

major comments (2)
  1. [Abstract] The claim that prior high-order solvers become unstable and slower than DDIM at large guidance scales is presented without quantitative metrics, error bars, or ablation details on how instability was measured or controlled; this weakens the motivation for the multistep fix.
  2. [Section 3 (Multistep DPM-Solver++)] The multistep variant is asserted to address instability by reducing effective step size, yet no derivation is given showing how the predictor-corrector or extrapolation coefficients achieve this reduction while preserving high-order accuracy; the central stability claim therefore rests on an unverified assumption about the specific multistep schedule.
minor comments (2)
  1. Add error bars or statistics over multiple random seeds to all sampling-quality plots and tables to support the 15-20 step claims.
  2. Clarify the exact thresholding implementation and its interaction with the data-prediction model in the main text rather than deferring all details to the appendix.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below with clarifications and indicate the revisions planned for the next version of the manuscript.

read point-by-point responses
  1. Referee: [Abstract] The claim that prior high-order solvers become unstable and slower than DDIM at large guidance scales is presented without quantitative metrics, error bars, or ablation details on how instability was measured or controlled; this weakens the motivation for the multistep fix.

    Authors: We agree that the abstract would be strengthened by explicit quantitative support. In the revised manuscript we will expand the abstract to include concrete metrics: FID scores and sampling wall-clock times comparing a prior high-order solver (DPM-Solver) against DDIM at guidance scale 7.5, together with standard deviations computed over three independent runs. We will also add a short description in Section 4 of how instability was quantified (sample divergence measured by FID increase beyond a threshold and per-step norm of the update exceeding a stability bound). revision: yes
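The second half of that proposed measure (per-step update norm exceeding a stability bound) is straightforward to operationalize. A hypothetical helper, with the Euclidean norm and all names assumed rather than taken from the manuscript:

```python
def count_unstable_steps(states, bound):
    """Count solver steps whose update norm exceeds `bound`.

    Hypothetical implementation of the rebuttal's instability measure:
    `states` is the list of iterates x_{t_0}, ..., x_{t_M} produced by
    a sampler (each a flat vector); a step is flagged when
    ||x_{t_i} - x_{t_{i-1}}|| > bound.
    """
    flagged = 0
    for prev, cur in zip(states, states[1:]):
        norm = sum((c - p) ** 2 for c, p in zip(cur, prev)) ** 0.5
        if norm > bound:
            flagged += 1
    return flagged
```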

  2. Referee: [Section 3 (Multistep DPM-Solver++)] The multistep variant is asserted to address instability by reducing effective step size, yet no derivation is given showing how the predictor-corrector or extrapolation coefficients achieve this reduction while preserving high-order accuracy; the central stability claim therefore rests on an unverified assumption about the specific multistep schedule.

    Authors: We appreciate the request for a rigorous derivation. In the revised Section 3 we will insert a new subsection that derives the effective step-size reduction from the predictor-corrector coefficients and the linear extrapolation formula. Using Taylor expansion of the data-prediction ODE solution, we will show that the local truncation error order is retained while the leading error term is scaled by a factor proportional to the reduced effective step size. The derivation will be parameter-free and will directly reference the multistep schedule already given in Algorithm 2. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation starts from standard diffusion ODE with explicit algorithmic choices

full rationale

The paper starts from the standard diffusion ODE and introduces explicit algorithmic components (data-prediction formulation, thresholding, and a multistep variant) to improve guided sampling. These choices are presented as design decisions rather than parameters fitted to the target result or definitions that presuppose the claimed performance. No load-bearing equation or step reduces by construction to its own inputs, and any self-citations to prior DPM-Solver work are not used to justify the new stabilization claims for large guidance scales. The central claims rest on empirical validation rather than self-referential reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the standard reformulation of diffusion sampling as an ODE and the assumption that thresholding can enforce consistency with the training distribution; no new free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption Diffusion probabilistic models admit an ODE formulation whose solution yields the generative process.
    This is the standard continuous-time view of DPMs used throughout the literature.
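For reference, a sketch of the ODE this axiom refers to, written in the data-prediction form that DPM-Solver++ integrates (standard continuous-time notation, not quoted from the paper):

```latex
% Probability-flow ODE, data-prediction form.
% Notation: \alpha_t, \sigma_t the noise schedule,
% f(t) = \tfrac{\mathrm{d}\log\alpha_t}{\mathrm{d}t}, \quad
% g^2(t) = \tfrac{\mathrm{d}\sigma_t^2}{\mathrm{d}t}
%        - 2\sigma_t^2 \tfrac{\mathrm{d}\log\alpha_t}{\mathrm{d}t}.
\frac{\mathrm{d}x_t}{\mathrm{d}t}
  = \Bigl( f(t) + \frac{g^2(t)}{2\sigma_t^2} \Bigr)\, x_t
  - \frac{\alpha_t\, g^2(t)}{2\sigma_t^2}\, x_\theta(x_t, t)
```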

pith-pipeline@v0.9.0 · 5553 in / 1136 out tokens · 31784 ms · 2026-05-16T07:50:42.794854+00:00 · methodology

discussion (0)


Forward citations

Cited by 21 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Is Monotonic Sampling Necessary in Diffusion Models?

    cs.LG 2026-05 unverdicted novelty 7.0

    Non-monotonic sampling schedules never improve upon monotonic baselines in diffusion models, with performance gaps ranging from substantial to negligible depending on the denoiser.

  2. TMPO: Trajectory Matching Policy Optimization for Diverse and Efficient Diffusion Alignment

    cs.LG 2026-05 unverdicted novelty 7.0

    TMPO replaces scalar reward maximization with trajectory-level matching to a Boltzmann distribution via Softmax-TB, improving generative diversity by 9.1% while keeping competitive reward performance.

  3. TMPO: Trajectory Matching Policy Optimization for Diverse and Efficient Diffusion Alignment

    cs.LG 2026-05 unverdicted novelty 7.0

    TMPO uses Softmax Trajectory Balance to match policy probabilities over multiple trajectories to a Boltzmann reward distribution, improving diversity by 9.1% in diffusion alignment tasks.

  4. Inverse Design of Multi-Layer Sub-Pixel-Resolution RF Passives Through Grayscale Diffusion with Flexible S-Parameter Conditioning

    eess.SP 2026-05 unverdicted novelty 7.0

    Grayscale diffusion model generates two-layer RF passives with sub-pixel resolution from partial S-parameters, achieving low error in surrogate predictions and validated on fabricated filters.

  5. Structured Diffusion Bridges: Inductive Bias for Denoising Diffusion Bridges

    cs.LG 2026-05 unverdicted novelty 7.0

    Structured diffusion bridges with alignment constraints achieve near fully-paired quality in modality translation while working effectively in unpaired and semi-paired regimes.

  6. DisCa: Accelerating Video Diffusion Transformers with Distillation-Compatible Learnable Feature Caching

    cs.CV 2026-02 unverdicted novelty 7.0

    DisCa replaces heuristic feature caching with a lightweight learnable neural predictor compatible with distillation, achieving 11.8× acceleration on video diffusion transformers with preserved generation quality.

  7. DiffusionNFT: Online Diffusion Reinforcement with Forward Process

    cs.LG 2025-09 unverdicted novelty 7.0

    DiffusionNFT performs online RL for diffusion models on the forward process via flow matching and positive-negative contrasts, delivering up to 25x efficiency gains and rapid benchmark improvements over prior reverse-...

  8. MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE

    cs.AI 2025-07 unverdicted novelty 7.0

    MixGRPO speeds up GRPO for flow-based image generators by restricting SDE sampling and optimization to a sliding window while using ODE elsewhere, cutting training time by up to 71% with better alignment performance.

  9. Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

    cs.CV 2023-10 unverdicted novelty 7.0

    Latent Consistency Models enable high-fidelity text-to-image generation in 2-4 steps by directly predicting solutions to the probability flow ODE in latent space, distilled from pre-trained LDMs.

  10. FIS-DiT: Breaking the Few-Step Video Inference Barrier via Training-Free Frame Interleaved Sparsity

    cs.CV 2026-05 unverdicted novelty 6.0

    FIS-DiT achieves 2.11-2.41x speedup on video DiT models in few-step regimes with negligible quality loss by exploiting frame-wise sparsity and consistency through a training-free interleaved execution strategy.

  11. The two clocks and the innovation window: When and how generative models learn rules

    cs.LG 2026-05 unverdicted novelty 6.0

    Generative models learn rules before memorizing data, creating an innovation window whose width depends on dataset size and rule complexity, observed in both diffusion and autoregressive architectures.

  12. Lookahead Drifting Model

    cs.LG 2026-04 unverdicted novelty 6.0

    The lookahead drifting model improves upon the drifting model by sequentially computing multiple drifting terms that incorporate higher-order gradient information, leading to better performance on toy examples and CIFAR10.

  13. Post-Hoc Guidance for Consistency Models by Joint Flow Distribution Learning

    cs.LG 2026-04 unverdicted novelty 6.0

    JFDL allows pre-trained Consistency Models to perform guided image generation post-hoc by aligning flow distributions, reducing FID scores on CIFAR-10 and ImageNet without needing a teacher model.

  14. Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success

    cs.RO 2025-02 accept novelty 6.0

    OpenVLA-OFT fine-tuning boosts LIBERO success rate from 76.5% to 97.1%, speeds action generation 26x, and outperforms baselines on real bimanual dexterous tasks.

  15. SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers

    cs.CV 2024-10 unverdicted novelty 6.0

    Sana-0.6B produces high-resolution images with strong text alignment at 20x smaller size and 100x higher throughput than Flux-12B by combining 32x image compression, linear DiT blocks, and a decoder-only LLM text encoder.

  16. IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models

    cs.CV 2023-08 unverdicted novelty 6.0

    IP-Adapter adds effective image prompting to text-to-image diffusion models using a lightweight decoupled cross-attention adapter that works alongside text prompts and other controls.

  17. Outlier-Robust Diffusion Solvers for Inverse Problems

    cs.CV 2026-05 unverdicted novelty 5.0

    Diffusion-based inverse problem solvers are made robust to outliers by combining explicit noise estimation with a Huber-loss IRLS objective solved via conjugate gradient.

  18. Lightning Unified Video Editing via In-Context Sparse Attention

    cs.CV 2026-05 unverdicted novelty 5.0

    ISA prunes low-saliency context tokens and routes queries by sharpness to either full or 0-th order Taylor sparse attention, enabling LIVEditor to cut attention latency ~60% while beating prior video editing methods o...

  19. Structured Diffusion Bridges: Inductive Bias for Denoising Diffusion Bridges

    cs.LG 2026-05 unverdicted novelty 5.0

    A structured diffusion bridge method achieves near fully-paired modality translation quality using alignment constraints even in unpaired or semi-paired regimes.

  20. Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

    cs.CV 2026-04 unverdicted novelty 5.0

    Visual generation models are evolving from passive renderers to interactive agentic world modelers, but current systems lack spatial reasoning, temporal consistency, and causal understanding, with evaluations overemph...

  21. From Euler to Dormand-Prince: ODE Solvers for Flow Matching Generative Models

    cs.LG 2026-04 accept novelty 4.0

    RK4 at 80 function evaluations matches Euler at 200 in sliced Wasserstein quality for flow matching sampling, with the adaptive solver concentrating steps near t=1 due to stiffening velocity fields.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · cited by 19 Pith papers · 3 internal anchors

  1. [1]

    Estimating the optimal covariance with imperfect mean in diffusion probabilistic models

    Fan Bao, Chongxuan Li, Jiacheng Sun, Jun Zhu, and Bo Zhang. Estimating the optimal covariance with imperfect mean in diffusion probabilistic models. arXiv preprint arXiv:2206.07309, 2022a.
    Fan Bao, Chongxuan Li, Jun Zhu, and Bo Zhang. Analytic-DPM: An analytic estimate of the optimal reverse variance in diffusion probabilistic models. In International Con...

  2. [2]

    Classifier-free diffusion guidance

    Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. In NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications,

  3. [3]

    Gotta go fast when generating data with score-based models

    Alexia Jolicoeur-Martineau, Ke Li, Rémi Piché-Taillefer, Tal Kachman, and Ioannis Mitliagkas. Gotta go fast when generating data with score-based models. arXiv preprint arXiv:2105.14080,

  4. [4]

    On fast sampling of diffusion probabilistic models

    Zhifeng Kong and Wei Ping. On fast sampling of diffusion probabilistic models. arXiv preprint arXiv:2106.00132,

  5. [5]

    Bilateral denoising diffusion models

    Max WY Lam, Jun Wang, Rongjie Huang, Dan Su, and Dong Yu. Bilateral denoising diffusion models. arXiv preprint arXiv:2108.11514,

  6. [6]

    Diffsinger: Singing voice synthesis via shallow diffusion mechanism

    Jinglin Liu, Chengxi Li, Yi Ren, Feiyang Chen, and Zhou Zhao. Diffsinger: Singing voice synthesis via shallow diffusion mechanism. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 11020–11028, 2022a.
    Luping Liu, Yi Ren, Zhijie Lin, and Zhou Zhao. Pseudo numerical methods for diffusion models on manifolds. arXiv preprint arX...

  7. [7]

    Knowledge distillation in iterative generative models for improved sampling speed

    Eric Luhman and Troy Luhman. Knowledge distillation in iterative generative models for improved sampling speed. arXiv preprint arXiv:2101.02388,

  8. [8]

    GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

    Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741,

  9. [9]

    Hierarchical Text-Conditional Image Generation with CLIP Latents

    Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:2204.06125,

  10. [10]

    Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

    Chitwan Saharia, William Chan, Huiwen Chang, Chris Lee, Jonathan Ho, Tim Salimans, David Fleet, and Mohammad Norouzi. Palette: Image-to-image diffusion models. In ACM SIGGRAPH 2022 Conference Proceedings, pp. 1–10, 2022a.
    Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S ...

  11. [11]

    Noise estimation for generative diffusion models

    Robin San-Roman, Eliya Nachmani, and Lior Wolf. Noise estimation for generative diffusion models. arXiv preprint arXiv:2104.02600,

  12. [12]

    Denoising diffusion implicit models

    Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021a.
    Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on L...

  13. [13]

    Lossy compression with gaussian diffusion

    Lucas Theis, Tim Salimans, Matthew D Hoffman, and Fabian Mentzer. Lossy compression with gaussian diffusion. arXiv preprint arXiv:2206.08889,

  14. [14]

    Diffusion-gan: Training gans with diffusion

    Zhendong Wang, Huangjie Zheng, Pengcheng He, Weizhu Chen, and Mingyuan Zhou. Diffusion-gan: Training gans with diffusion. arXiv preprint arXiv:2206.02262,

  15. [15]

    Diffusion-based molecule generation with informative prior bridges

    Lemeng Wu, Chengyue Gong, Xingchao Liu, Mao Ye, and Qiang Liu. Diffusion-based molecule generation with informative prior bridges. arXiv preprint arXiv:2209.00865,

  16. [16]

    Geodiff: A geometric diffusion model for molecular conformation generation

    Minkai Xu, Lantao Yu, Yang Song, Chence Shi, Stefano Ermon, and Jian Tang. Geodiff: A geometric diffusion model for molecular conformation generation. arXiv preprint arXiv:2203.02923,

  17. [17]

    Fast sampling of diffusion models with exponential integrator

    Qinsheng Zhang and Yongxin Chen. Fast sampling of diffusion models with exponential integrator. arXiv preprint arXiv:2204.13902,

  18. [18]

    gddim: Generalized denoising diffusion implicit models

    14 Preprint Qinsheng Zhang, Molei Tao, and Yongxin Chen. gddim: Generalized denoising diffusion implicit models. arXiv preprint arXiv:2206.05564,

  19. [19]

    Egsde: Unpaired image-to-image translation via energy-guided stochastic differential equations

    Min Zhao, Fan Bao, Chongxuan Li, and Jun Zhu. Egsde: Unpaired image-to-image translation via energy-guided stochastic differential equations. arXiv preprint arXiv:2207.06635,
