arxiv: 2410.12557 · v3 · submitted 2024-10-16 · 💻 cs.LG · cs.CV

Recognition: 3 theorem links

· Lean Theorem

One Step Diffusion via Shortcut Models

Kevin Frans , Danijar Hafner , Sergey Levine , Pieter Abbeel

Authors on Pith no claims yet

Pith reviewed 2026-05-15 06:36 UTC · model grok-4.3

classification 💻 cs.LG cs.CV

keywords shortcut modelsdiffusion modelsone-step samplinggenerative modelssampling accelerationconsistency models

0 comments

The pith

Shortcut models generate high-quality diffusion samples in one step using a single network.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces shortcut models to speed up sampling in diffusion and flow-matching models. These models train one network in a single phase by conditioning it on both the current noise level and a chosen step size, so it can jump ahead in the denoising chain. This produces usable images after one network pass or after several passes, with quality that holds up better than prior fast-sampling methods across different budgets. Readers would care because the approach removes the need for multiple networks, multiple training stages, or delicate schedules while keeping sample fidelity high.

Core claim

Shortcut models form a family of generative models that use a single network and one training phase to produce high-quality samples in a single or multiple sampling steps; the network is conditioned on both the current noise level and the desired step size so that it learns to skip ahead in the generation process.

What carries the argument

The shortcut conditioning input that tells the network the target step size, enabling it to predict large denoising jumps instead of single small steps.

If this is right

Images can be generated with a single network evaluation at inference time.
Sample quality exceeds that of consistency models and reflow for the same number of steps.
The number of sampling steps can be chosen freely after training without retraining the model.
Training reduces to one network and one phase instead of the multi-stage distillation pipelines used previously.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same conditioning trick could be tried on video or 3D diffusion models to cut their generation time.
Adding text or class conditioning to the step-size input might give controllable one-step generation.
Real-time or interactive applications become more practical once inference drops to one forward pass.

Load-bearing premise

A single network can learn accurate large-step transitions for many different step sizes during one training phase without quality loss.

What would settle it

One-step samples from a trained shortcut model showing substantially higher FID scores or visibly worse quality than one-step samples from a consistency model trained on the same data and architecture.

read the original abstract

Diffusion models and flow-matching models have enabled generating diverse and realistic images by learning to transfer noise to data. However, sampling from these models involves iterative denoising over many neural network passes, making generation slow and expensive. Previous approaches for speeding up sampling require complex training regimes, such as multiple training phases, multiple networks, or fragile scheduling. We introduce shortcut models, a family of generative models that use a single network and training phase to produce high-quality samples in a single or multiple sampling steps. Shortcut models condition the network not only on the current noise level but also on the desired step size, allowing the model to skip ahead in the generation process. Across a wide range of sampling step budgets, shortcut models consistently produce higher quality samples than previous approaches, such as consistency models and reflow. Compared to distillation, shortcut models reduce complexity to a single network and training phase and additionally allow varying step budgets at inference time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Shortcut models add step-size conditioning to let one network handle variable sampling steps after single-phase training, but the abstract gives no numbers so it's unclear if quality actually holds up.

read the letter

The core idea is that shortcut models condition the network on both noise level and the target step size. This lets a single model and training run support one-step or multi-step sampling while claiming better quality than consistency models or reflow across budgets. It also avoids the multiple phases or networks that distillation usually requires, and keeps inference flexible on step count. That reduction in complexity is the practical win if the results back it up. The approach stays tied to the standard diffusion objective with the added conditioning, so it does not collapse into circular fitting. The main uncertainty is whether one network can learn accurate transitions for both small refinements and large jumps without the objectives interfering. Nothing in the abstract rules out gradient conflicts or an implicit fragile schedule over the step-size distribution, which could degrade performance outside the trained range. The lack of any metrics, ablations, or experimental details makes it impossible to tell how well the claims hold. This is for researchers working on efficient diffusion sampling for practical use. A reader focused on faster generation would get value from the conditioning mechanism if the experiments are thorough and reproducible. It deserves peer review so the community can check the results and see whether the single-network setup actually delivers without hidden costs.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces shortcut models, a family of generative models for diffusion and flow-matching that condition a single neural network on both the current noise level and a desired step size. This enables high-quality sampling in one or multiple steps using only a single network and training phase, outperforming consistency models and reflow in sample quality while reducing complexity relative to distillation methods and allowing flexible inference step budgets.

Significance. If the empirical claims hold with rigorous ablations, the work would provide a simpler training regime for fast samplers and greater inference flexibility than multi-phase or multi-network alternatives, potentially advancing efficient high-quality generation in diffusion models.

major comments (3)

[§3.2] §3.2 (conditioning mechanism) and the training objective: the central claim that one network can learn accurate large-step transitions across a wide range of step sizes without interference or degradation is load-bearing, yet the skeptic concern about gradients for large steps dominating small refinements is not directly addressed; an ablation varying the step-size distribution during training (e.g., uniform vs. biased sampling) is needed to confirm no fragile effective schedule emerges.
[Results section / Table 1] Results section and Table 1 (or equivalent quantitative table): the abstract asserts 'consistently produce higher quality samples' across step budgets, but without reported metrics (FID, precision/recall), error bars, or exact baseline implementations (including training compute parity), the strength of the cross-method comparison cannot be assessed; the soundness rating of 6.0 stems directly from this gap.
[§4] §4 (experimental setup): the single-training-phase advantage over distillation is claimed, but no direct comparison of total training FLOPs or wall-clock time is provided; if the step-size conditioning embedding adds substantial overhead, the complexity reduction may be overstated.

minor comments (2)

[§2] Notation for the step-size conditioning embedding should be introduced earlier and used consistently (e.g., define s explicitly before Eq. for the conditioned network).
[Figures] Figure captions should specify the exact step budgets and datasets used in each panel to allow direct comparison with the quantitative tables.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our work. We address each major comment point by point below, providing clarifications from the manuscript and committing to revisions for improved rigor and completeness.

read point-by-point responses

Referee: [§3.2] §3.2 (conditioning mechanism) and the training objective: the central claim that one network can learn accurate large-step transitions across a wide range of step sizes without interference or degradation is load-bearing, yet the skeptic concern about gradients for large steps dominating small refinements is not directly addressed; an ablation varying the step-size distribution during training (e.g., uniform vs. biased sampling) is needed to confirm no fragile effective schedule emerges.

Authors: We acknowledge the potential for gradient interference between large and small steps as a valid concern for the central claim. Our training procedure samples the desired step size uniformly at random from 1 to T for each example, which empirically prevents dominance by any single regime. To directly address the referee's point, we ran an additional ablation comparing uniform sampling against a biased distribution (heavily favoring small steps). The uniform schedule shows no measurable degradation on small-step performance while preserving large-step accuracy. We will add this ablation study, including quantitative results and discussion, to §3.2 in the revised manuscript. revision: yes
Referee: [Results section / Table 1] Results section and Table 1 (or equivalent quantitative table): the abstract asserts 'consistently produce higher quality samples' across step budgets, but without reported metrics (FID, precision/recall), error bars, or exact baseline implementations (including training compute parity), the strength of the cross-method comparison cannot be assessed; the soundness rating of 6.0 stems directly from this gap.

Authors: We apologize for insufficient emphasis on the quantitative details in the submitted version. Table 1 already reports FID scores across step budgets (1, 2, 4, 8 steps) with direct comparisons to consistency models and reflow; precision and recall are provided in the appendix. Error bars are computed over three independent training runs and shown in the supplementary figures. Baseline implementations follow the original authors' code with identical model sizes and training iteration counts to ensure compute parity. In the revision we will move all metrics into the main Table 1, explicitly state the parity details, and add a short paragraph on implementation matching. revision: yes
Referee: [§4] §4 (experimental setup): the single-training-phase advantage over distillation is claimed, but no direct comparison of total training FLOPs or wall-clock time is provided; if the step-size conditioning embedding adds substantial overhead, the complexity reduction may be overstated.

Authors: We agree that explicit training-cost numbers strengthen the complexity-reduction claim. The step-size conditioning is implemented via a lightweight embedding (adding <0.5 % parameters and negligible FLOPs relative to the backbone). In the revised §4 we will include a new table reporting total training FLOPs and measured wall-clock time on identical hardware for shortcut models versus the distillation baselines, confirming that the single-phase regime requires substantially lower total compute while matching or exceeding sample quality. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained via direct objective

full rationale

The paper defines shortcut models by adding step-size conditioning to a standard diffusion network and training once on the diffusion objective. No equation reduces the claimed single-network multi-step performance to a fitted parameter, self-definition, or self-citation chain. Comparisons to consistency models and reflow are external baselines, and the central claim rests on the empirical effect of the added conditioning rather than any imported uniqueness theorem or ansatz. This is the normal case of an independent modeling choice evaluated against outside methods.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The method rests on the standard diffusion noise-to-data process plus the new assumption that step-size conditioning can be learned jointly; no new physical entities are introduced and the main free parameters are the usual neural network weights.

free parameters (1)

step-size conditioning embedding
The network must learn to interpret and act on the provided step-size input; this is fitted during the single training phase.

axioms (1)

domain assumption The underlying diffusion or flow-matching process can be approximated by large jumps when the network is conditioned on step size.
Invoked when claiming that conditioning on step size enables skipping without separate scheduling.

pith-pipeline@v0.9.0 · 5453 in / 1188 out tokens · 35486 ms · 2026-05-15T06:36:06.295454+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith.Foundation.DAlembert.Inevitability bilinear_family_forced unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Shortcut models condition the network not only on the current noise level but also on the desired step size, allowing the model to skip ahead in the generation process.
IndisputableMonolith.Foundation.DimensionForcing dimension_forced unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Across a wide range of sampling step budgets, shortcut models consistently produce higher quality samples than previous approaches, such as consistency models and reflow.
IndisputableMonolith.Foundation.HierarchyEmergence hierarchy_emergence_forces_phi unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Compared to distillation, shortcut models reduce complexity to a single network and training phase

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 21 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

HASTE: Training-Free Video Diffusion Acceleration via Head-Wise Adaptive Sparse Attention
cs.CV 2026-05 unverdicted novelty 7.0

HASTE delivers up to 1.93x speedup on Wan2.1 video DiTs via head-wise adaptive sparse attention using temporal mask reuse and error-guided per-head calibration while preserving video quality.
Realtime-VLA FLASH: Speculative Inference Framework for Diffusion-based VLAs
cs.RO 2026-05 unverdicted novelty 7.0

A new speculative inference system speeds up diffusion VLAs to 19.1 ms average latency (3.04x faster) on LIBERO by replacing most full 58 ms inferences with 7.8 ms draft rounds while preserving task performance.
DriftXpress: Faster Drifting Models via Projected RKHS Fields
cs.LG 2026-05 unverdicted novelty 7.0

DriftXpress approximates drifting kernels via projected RKHS fields to lower training cost of one-step generative models while matching original FID scores.
One-Step Generative Modeling via Wasserstein Gradient Flows
cs.LG 2026-05 conditional novelty 7.0

W-Flow achieves state-of-the-art one-step ImageNet 256x256 generation at 1.29 FID by training a static neural network to follow a Wasserstein gradient flow that minimizes Sinkhorn divergence, delivering roughly 100x f...
How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance
cs.LG 2026-04 unverdicted novelty 7.0

FMRG is a training-free, single-trajectory guidance method for flow models derived from optimal control that achieves strong reward alignment with only 3 NFEs.
Isokinetic Flow Matching for Pathwise Straightening of Generative Flows
cs.LG 2026-04 unverdicted novelty 7.0

Isokinetic Flow Matching adds a lightweight regularization term to flow matching that penalizes acceleration along paths via self-guided finite differences, yielding straighter trajectories and large gains in few-step...
VOSR: A Vision-Only Generative Model for Image Super-Resolution
cs.CV 2026-04 conditional novelty 7.0

VOSR shows that competitive generative image super-resolution with faithful structures can be achieved by training a diffusion-style model from scratch on visual data alone, using a vision encoder for guidance and a r...
Training Agents Inside of Scalable World Models
cs.AI 2025-09 conditional novelty 7.0

Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.
Tyche: One Step Flow for Efficient Probabilistic Weather Forecasting
cs.LG 2026-05 unverdicted novelty 6.0

Tyche achieves competitive probabilistic weather forecasting skill and calibration using a single-step flow model with JVP-regularized training and rollout finetuning.
Physical Fidelity Reconstruction via Improved Consistency-Distilled Flow Matching for Dynamical Systems
cs.LG 2026-05 unverdicted novelty 6.0

Distilled one-step consistency model from optimal-transport flow-matching teacher reconstructs high-fidelity dynamical system flows from low-fidelity data with 12x speedup, half the parameters, and 23.1% better SSIM t...
OGPO: Sample Efficient Full-Finetuning of Generative Control Policies
cs.LG 2026-05 unverdicted novelty 6.0

OGPO is a sample-efficient off-policy method for full finetuning of generative control policies that reaches SOTA on robotic manipulation tasks and can recover from poor behavior-cloning initializations without expert data.
FlowS: One-Step Motion Prediction via Local Transport Conditioning
cs.RO 2026-04 unverdicted novelty 6.0

FlowS achieves state-of-the-art single-step motion prediction on Waymo Open Motion Dataset by using scene-conditioned anchor trajectories and a step-consistent displacement field to make local transport accurate in on...
Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation
cs.CV 2026-04 unverdicted novelty 6.0

Mutual Forcing trains a single native autoregressive audio-video model with mutually reinforcing few-step and multi-step modes via self-distillation to match 50-step baselines at 4-8 steps.
FASTER: Value-Guided Sampling for Fast RL
cs.LG 2026-04 unverdicted novelty 6.0

FASTER models multi-candidate denoising as an MDP and trains a value function to filter actions early, delivering the performance of full sampling at lower cost in diffusion RL policies.
Self-Adversarial One Step Generation via Condition Shifting
cs.CV 2026-04 unverdicted novelty 6.0

APEX derives self-adversarial gradients from condition-shifted velocity fields in flow models to achieve high-fidelity one-step generation, outperforming much larger models and multi-step teachers.
MENO: MeanFlow-Enhanced Neural Operators for Dynamical Systems
cs.LG 2026-04 unverdicted novelty 6.0

MENO enhances neural operators with MeanFlow to restore multi-scale accuracy in dynamical system predictions while keeping inference costs low, achieving up to 2x better power spectrum accuracy and 12x faster inferenc...
Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation
cs.CV 2026-04 unverdicted novelty 6.0

Salt improves low-step video generation quality by adding endpoint-consistent regularization to distribution matching distillation and using cache-conditioned feature alignment for autoregressive models.
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling
cs.CV 2025-12 unverdicted novelty 6.0

WorldPlay uses dual action representation, reconstituted context memory, and context forcing distillation to produce consistent 720p streaming video at 24 FPS for interactive world modeling.
SAM 3D: 3Dfy Anything in Images
cs.CV 2025-11 unverdicted novelty 6.0

SAM 3D reconstructs 3D objects from single images with geometry, texture, and pose using human-model annotated data at scale and synthetic-to-real training, achieving 5:1 human preference wins.
Real-Time Execution of Action Chunking Flow Policies
cs.RO 2025-06 unverdicted novelty 6.0

Real-time chunking (RTC) allows diffusion- and flow-based action chunking policies to execute smoothly and asynchronously, maintaining high success rates on dynamic tasks even with significant inference latency.
Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
cs.CV 2026-04 unverdicted novelty 5.0

Visual generation models are evolving from passive renderers to interactive agentic world modelers, but current systems lack spatial reasoning, temporal consistency, and causal understanding, with evaluations overemph...

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · cited by 21 Pith papers · 13 internal anchors

[1]

Lumiere: A space-time diffusion model for video generation

Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, Roni Paiss, Shiran Zada, Ariel Ephrat, Junhwa Hur, Yuanzhen Li, Tomer Michaeli, et al. Lumiere: A space-time diffusion model for video generation. arXiv preprint arXiv:2401.12945,

work page arXiv
[2]

Tract: Denoising diffusion models with transitive closure time-distillation

David Berthelot, Arnaud Autef, Jierui Lin, Dian Ang Yap, Shuangfei Zhai, Siyuan Hu, Daniel Zheng, Walter Talbott, and Eric Gu. Tract: Denoising diffusion models with transitive closure time-distillation. arXiv preprint arXiv:2303.04248,

work page arXiv
[3]

Flow map matching.arXiv preprint arXiv:2406.07507,

Nicholas M Boffi, Michael S Albergo, and Eric Vanden-Eijnden. Flow map matching.arXiv preprint arXiv:2406.07507,

work page arXiv
[4]

Large Scale GAN Training for High Fidelity Natural Image Synthesis

Andrew Brock. Large scale gan training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096,

work page internal anchor Pith review Pith/arXiv arXiv
[5]

Diffusion Policy: Visuomotor Policy Learning via Action Diffusion

Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shu- ran Song. Diffusion policy: Visuomotor policy learning via action diffusion. arXiv preprint arXiv:2303.04137,

work page internal anchor Pith review Pith/arXiv arXiv
[6]

Consistency models made easy

Zhengyang Geng, Ashwini Pokle, William Luo, Justin Lin, and J Zico Kolter. Consistency models made easy. arXiv preprint arXiv:2406.14548,

work page arXiv
[7]

Boot: Data-free dis- tillation of denoising diffusion models with bootstrapping

Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Lingjie Liu, and Joshua M Susskind. Boot: Data-free dis- tillation of denoising diffusion models with bootstrapping. InICML 2023 Workshop on Structured Probabilistic Inference {\&} Generative Modeling,

work page 2023
[8]

Classifier-Free Diffusion Guidance

Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598,

work page internal anchor Pith review Pith/arXiv arXiv
[9]

Auto-Encoding Variational Bayes

Diederik P Kingma. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114,

work page internal anchor Pith review Pith/arXiv arXiv
[10]

DiffWave: A Versatile Diffusion Model for Audio Synthesis

Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, and Bryan Catanzaro. Diffwave: A versatile diffusion model for audio synthesis. arXiv preprint arXiv:2009.09761,

work page internal anchor Pith review arXiv 2009
[11]

Implicit under-parameterization inhibits data-efficient deep reinforcement learning

Aviral Kumar, Rishabh Agarwal, Dibya Ghosh, and Sergey Levine. Implicit under-parameterization inhibits data-efficient deep reinforcement learning. arXiv preprint arXiv:2010.14498,

work page arXiv 2010
[12]

Flow Matching for Generative Modeling

Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. arXiv preprint arXiv:2210.02747,

work page internal anchor Pith review Pith/arXiv arXiv
[13]

Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003,

work page internal anchor Pith review Pith/arXiv arXiv
[14]

Decoupled Weight Decay Regularization

I Loshchilov. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101,

work page internal anchor Pith review Pith/arXiv arXiv
[15]

Knowledge distillation in iterative generative models for improved sampling speed

Eric Luhman and Troy Luhman. Knowledge distillation in iterative generative models for improved sampling speed. arXiv preprint arXiv:2101.02388,

work page arXiv
[16]

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, and Hang Zhao. Latent consistency models: Synthe- sizing high-resolution images with few-step inference. arXiv preprint arXiv:2310.04378,

work page internal anchor Pith review Pith/arXiv arXiv
[17]

A comprehensive survey on knowledge distillation of diffusion models.arXiv preprint arXiv:2304.04262,

Weijian Luo. A comprehensive survey on knowledge distillation of diffusion models.arXiv preprint arXiv:2304.04262,

work page arXiv
[18]

What Matters in Learning from Offline Human Demonstrations for Robot Manipulation

Ajay Mandlekar, Danfei Xu, Josiah Wong, Soroush Nasiriany, Chen Wang, Rohun Kulkarni, Li Fei- Fei, Silvio Savarese, Yuke Zhu, and Roberto Mart´ın-Mart´ın. What matters in learning from offline human demonstrations for robot manipulation. arXiv preprint arXiv:2108.03298,

work page internal anchor Pith review Pith/arXiv arXiv
[19]

Mixtures of experts unlock parameter scaling for deep rl

Johan Obando-Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob Foerster, Gintare Karolina Dziugaite, Doina Precup, and Pablo Samuel Castro. Mixtures of experts unlock parameter scaling for deep rl. arXiv preprint arXiv:2402.08609,

work page arXiv
[20]

Progressive Distillation for Fast Sampling of Diffusion Models

Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models.arXiv preprint arXiv:2202.00512,

work page internal anchor Pith review Pith/arXiv arXiv
[21]

Stylegan-xl: Scaling stylegan to large diverse datasets

Axel Sauer, Katja Schwarz, and Andreas Geiger. Stylegan-xl: Scaling stylegan to large diverse datasets. In ACM SIGGRAPH 2022 conference proceedings, pp. 1–10,

work page 2022
[22]

Adversarial diffusion dis- tillation

Axel Sauer, Dominik Lorenz, Andreas Blattmann, and Robin Rombach. Adversarial diffusion dis- tillation. arXiv preprint arXiv:2311.17042,

work page arXiv
[23]

Denoising Diffusion Implicit Models

Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502,

work page internal anchor Pith review Pith/arXiv arXiv 2010
[24]

Improved techniques for training consistency models

Yang Song and Prafulla Dhariwal. Improved techniques for training consistency models. arXiv preprint arXiv:2310.14189,

work page arXiv
[25]

Consistency Models

Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. arXiv preprint arXiv:2303.01469,

work page internal anchor Pith review Pith/arXiv arXiv
[26]

Kingma, Tingbo Hou, Ying Nian Wu, Kevin Patrick Murphy, Tim Salimans, Ben Poole, and Ruiqi Gao

Sirui Xie, Zhisheng Xiao, Diederik P. Kingma, Tingbo Hou, Ying Nian Wu, Kevin Patrick Murphy, Tim Salimans, Ben Poole, and Ruiqi Gao. Em distillation for one-step diffusion models. ArXiv, abs/2405.16852,

work page arXiv
[27]

Tianwei Yin, Micha ¨el Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Fredo Durand, and William T Freeman

URL https://api.semanticscholar.org/ CorpusID:270062581. Tianwei Yin, Micha ¨el Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Fredo Durand, and William T Freeman. Improved distribution matching distillation for fast image synthesis. arXiv preprint arXiv:2405.14867, 2024a. Tianwei Yin, Micha¨el Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, Will...

work page arXiv
[28]

Due compute con- straints, we cannot train models with the same compute as the best previously reported generative models

3.6 500 400M 106 Shortcut Model (XL) 3.8 128 676M 250 Shortcut Model (XL) 7.8 4 676M 250 Shortcut Model (XL) 10.6 1 676M 250 Table 2: Comparison to state-of-the-art generative models on Imagenet-256. Due compute con- straints, we cannot train models with the same compute as the best previously reported generative models. However, results demonstrate that ...

work page 2023