Dual-End Consistency Model
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-16 05:36 UTC · model grok-4.3
The pith
The Dual-End Consistency Model stabilizes training and enables one-step generation by selecting three critical sub-trajectories from the PF-ODE and using a noise-to-noisy mapping.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Dual-End Consistency Model decomposes the PF-ODE trajectory and selects three critical sub-trajectories as optimization targets. It leverages continuous-time consistency-model objectives for few-step distillation and uses flow matching as a boundary regularizer to stabilize training. A noise-to-noisy mapping that sends noise to any point on the trajectory alleviates first-step error accumulation, and the method achieves a state-of-the-art FID of 1.70 for one-step generation on ImageNet 256x256.
What carries the argument
Dual-End Consistency Model, which selects three critical sub-trajectories from the PF-ODE decomposition and applies a noise-to-noisy mapping to ensure stable training and flexible sampling.
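As a rough illustration of the objective this summary describes, the sketch below combines a discrete consistency-style term with a flow-matching boundary regularizer on a toy 1-D linear path. The interpolant, the one-parameter "network", and all constants are illustrative assumptions, not the paper's actual parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data/noise pairs; the linear interpolant x_t = (1 - t) * x0 + t * eps
# follows the rectified-flow convention (an assumption -- the paper may
# parameterize the PF-ODE trajectory differently).
x0 = rng.standard_normal(512)    # stand-in "data" samples
eps = rng.standard_normal(512)   # paired Gaussian noise

def interpolant(x0, eps, t):
    return (1.0 - t) * x0 + t * eps

def fm_target(x0, eps):
    # Velocity of the linear path: d x_t / d t = eps - x0.
    return eps - x0

def model_v(x_t, t, w):
    # Hypothetical one-parameter velocity predictor standing in for a network.
    return w * x_t

t, dt, w = 0.7, 0.01, 0.5
x_t = interpolant(x0, eps, t)

# Flow-matching boundary regularizer: match the exact path velocity at time t.
fm_loss = np.mean((model_v(x_t, t, w) - fm_target(x0, eps)) ** 2)

# Discrete consistency-style term: predictions at adjacent times should agree
# after one Euler step of the PF-ODE toward the data end (teacher velocity is
# exact here, so only the self-supervised discrepancy remains).
x_s = x_t - dt * fm_target(x0, eps)
cm_loss = np.mean((model_v(x_t, t, w) - model_v(x_s, t - dt, w)) ** 2)

total_loss = cm_loss + fm_loss
```

Note how the consistency term vanishes for a self-consistent predictor while the flow-matching term anchors the boundary, which is the stabilizing role the summary attributes to the regularizer.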
If this is right
- Training becomes stable by avoiding loss divergence through targeted sub-trajectory optimization.
- Few-step distillation is enabled via continuous-time CM objectives.
- First-step error accumulation is reduced by the noise-to-noisy mapping.
- The model outperforms prior CM-based one-step methods on ImageNet 256x256 with FID 1.70.
Where Pith is reading between the lines
- The selection of specific sub-trajectories could be adapted to improve stability in other ODE-based generative approaches.
- This might allow consistency models to scale to higher resolutions without additional regularization techniques.
- Real-time image generation systems could incorporate this for lower latency in applications like video synthesis.
Load-bearing premise
The assumption that exactly three critical sub-trajectories from the PF-ODE decomposition will eliminate loss divergence and first-step error accumulation without introducing new instabilities.
What would settle it
Failing to achieve an FID score below 2.0 in one-step generation on ImageNet 256x256 when implementing the three sub-trajectories and N2N mapping would falsify the effectiveness claim.
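For context on that threshold: FID is the Fréchet distance between Gaussians fitted to Inception-v3 features of real and generated images. A minimal sketch of the closing formula only, with the feature-extraction pipeline omitted:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    # Frechet distance between two Gaussians, the quantity behind FID:
    # ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}).
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # sqrtm can return tiny imaginary noise
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Identical Gaussians have distance 0; shifting the mean by a unit vector
# with equal covariances adds exactly 1.
mu = np.zeros(2)
sigma = np.eye(2)
fid_same = frechet_distance(mu, sigma, mu, sigma)
```

In practice the moments come from Inception features of roughly 50k samples per side, so scores like 1.70 also depend on that fixed evaluation protocol.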
Original abstract
The slow iterative sampling nature remains a major bottleneck for the practical deployment of diffusion and flow-based generative models. While consistency models (CMs) represent a state-of-the-art distillation-based approach for efficient generation, their large-scale application is still limited by two key issues: training instability and inflexible sampling. Existing methods seek to mitigate these problems through architectural adjustments or regularized objectives, yet overlook the critical reliance on trajectory selection. In this work, we first conduct an analysis on these two limitations: training instability originates from loss divergence induced by unstable self-supervised term, whereas sampling inflexibility arises from error accumulation. Based on these insights and analysis, we propose the Dual-End Consistency Model (DE-CM) that selects vital sub-trajectory clusters to achieve stable and effective training. DE-CM decomposes the PF-ODE trajectory and selects three critical sub-trajectories as optimization targets. Specifically, our approach leverages continuous-time CMs objectives to achieve few-step distillation and utilizes flow matching as a boundary regularizer to stabilize the training process. Furthermore, we propose a novel noise-to-noisy (N2N) mapping that can map noise to any point, thereby alleviating the error accumulation in the first step. Extensive experimental results show the effectiveness of our method: it achieves a state-of-the-art FID score of 1.70 in one-step generation on the ImageNet 256x256 dataset, outperforming existing CM-based one-step approaches.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes the Dual-End Consistency Model (DE-CM) to overcome training instability and inflexible sampling in consistency models for diffusion/flow-based generative models. It analyzes instability as arising from loss divergence in the self-supervised term and sampling issues from first-step error accumulation. The method decomposes the PF-ODE trajectory, selects three critical sub-trajectories as optimization targets, employs continuous-time CM objectives with a flow-matching boundary regularizer, and introduces a noise-to-noisy (N2N) mapping to map noise to arbitrary points. Experiments claim a state-of-the-art one-step FID of 1.70 on ImageNet 256×256, outperforming prior CM-based approaches.
Significance. If the central claims hold under rigorous validation, the work would be significant for advancing practical one-step sampling in large-scale generative models by directly targeting trajectory-dependent instabilities. The explicit decomposition analysis and introduction of N2N mapping represent potentially useful technical contributions, but the absence of ablations, error bars, or independent verification of the three-trajectory choice reduces the strength of the significance assessment.
major comments (3)
- [Abstract / §3] Abstract and method description: The selection of exactly three critical sub-trajectories from the PF-ODE decomposition is presented as key to eliminating loss divergence and error accumulation, yet no explicit criterion for identifying 'critical' points, optimality condition, or ablation comparing 2/4/5 trajectories is provided; this choice appears load-bearing for the stability and FID claims but is unsupported by sensitivity analysis.
- [Experiments] Experimental results: The reported SOTA FID of 1.70 for one-step generation lacks error bars, multiple random seeds, or ablation tables isolating the contribution of the three sub-trajectories versus the N2N mapping and flow-matching regularizer; without these, the robustness of the central performance claim cannot be verified.
- [§4] §4 (method): The N2N mapping is introduced as a novel component to alleviate first-step error accumulation, but its effectiveness is demonstrated solely through the final FID score with no independent derivation, external benchmark, or controlled comparison decoupling it from the overall training procedure.
minor comments (2)
- [§3] Notation for PF-ODE decomposition and sub-trajectory selection could be clarified with an explicit equation or diagram showing how the three points are extracted.
- [Abstract] The abstract mentions 'extensive experimental results' but the provided details focus primarily on the final FID; additional tables or figures on training stability metrics (e.g., loss curves) would strengthen presentation.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, providing clarifications based on the analysis and experiments in the paper while committing to revisions that strengthen the presentation and robustness of our claims.
Point-by-point responses
- Referee: [Abstract / §3] Abstract and method description: The selection of exactly three critical sub-trajectories from the PF-ODE decomposition is presented as key to eliminating loss divergence and error accumulation, yet no explicit criterion for identifying 'critical' points, optimality condition, or ablation comparing 2/4/5 trajectories is provided; this choice appears load-bearing for the stability and FID claims but is unsupported by sensitivity analysis.
Authors: We appreciate the referee drawing attention to this aspect. Section 3 presents the PF-ODE trajectory decomposition analysis, which identifies the three critical sub-trajectories at the specific points where the self-supervised loss term begins to diverge and where first-step sampling errors accumulate most rapidly. These correspond to the initial high-noise regime, the intermediate transition phase, and the low-noise final stage, chosen to directly target the sources of instability identified in our preliminary study. While the original submission did not include an exhaustive sensitivity analysis across alternative numbers of trajectories (due to the high computational cost of ImageNet-scale training), we agree that such analysis would improve clarity. In the revision we will add an explicit statement of the selection criterion derived from the divergence analysis and include a sensitivity table comparing results with 2, 3, 4, and 5 sub-trajectories. revision: yes
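The three regimes named in this response (high-noise, transition, low-noise) can be read as a partition of the PF-ODE time axis. A minimal sketch under assumed cut points of 0.3 and 0.7, which are illustrative rather than the paper's selection criterion:

```python
import random

# Hypothetical partition of t in [0, 1] into the three sub-trajectories the
# rebuttal describes. The cut points are assumptions for illustration.
SUB_TRAJECTORIES = {
    "low_noise": (0.0, 0.3),
    "transition": (0.3, 0.7),
    "high_noise": (0.7, 1.0),
}

def sample_time(segment, rng=random):
    # Draw a training time t restricted to one chosen sub-trajectory.
    lo, hi = SUB_TRAJECTORIES[segment]
    return lo + (hi - lo) * rng.random()
```

Any sensitivity table of the kind the authors promise would amount to varying the number and placement of these intervals.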
- Referee: [Experiments] Experimental results: The reported SOTA FID of 1.70 for one-step generation lacks error bars, multiple random seeds, or ablation tables isolating the contribution of the three sub-trajectories versus the N2N mapping and flow-matching regularizer; without these, the robustness of the central performance claim cannot be verified.
Authors: We acknowledge that reporting statistical variability strengthens confidence in the results. The FID of 1.70 was obtained using the standard ImageNet 256×256 evaluation protocol and official evaluation code employed by prior consistency-model works. To address the concern, we will rerun the one-step generation experiments with at least three independent random seeds and report mean FID scores together with standard deviations. We will also expand the ablation studies in Section 5 to include tables that isolate the individual contributions of the three-sub-trajectory selection, the N2N mapping, and the flow-matching boundary regularizer, thereby clarifying their respective impacts on final performance. revision: yes
- Referee: [§4] §4 (method): The N2N mapping is introduced as a novel component to alleviate first-step error accumulation, but its effectiveness is demonstrated solely through the final FID score with no independent derivation, external benchmark, or controlled comparison decoupling it from the overall training procedure.
Authors: The N2N mapping is formally introduced in Section 4 as a continuous mapping from pure Gaussian noise to arbitrary noisy points along the PF-ODE trajectory, directly derived to counteract the first-step error accumulation identified in our analysis of sampling inflexibility. Its effectiveness is shown through both the end-to-end FID improvement and the ablation studies that compare variants with and without the mapping. To provide a more decoupled validation, we will add a controlled experiment in the revised manuscript that fixes all other components (including the three-sub-trajectory objectives and flow-matching regularizer) and varies only the presence of N2N, reporting both trajectory-level error metrics and one-step FID. revision: partial
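The claimed benefit of N2N can be illustrated with a toy error model: if a consistency jump's error grows with jump length, starting the jump from an intermediate point placed by a noise-to-noisy map reduces first-step error. The quadratic error term and the exact placement below are assumptions for illustration, not the paper's derivation.

```python
# Toy 1-D trajectory with constant velocity (a linear PF-ODE path).
alpha = 0.2                  # assumed error coefficient of the imperfect jump
x0, eps = 1.5, -0.3          # toy data point and its paired noise
v = eps - x0                 # path velocity, so x_t = x0 + t * v

def x_at(t):
    # Exact trajectory point; here it stands in for an accurate N2N placement.
    return x0 + t * v

def f_hat(x_t, t):
    # Imperfect consistency jump to t = 0: exact inverse plus a t^2 error term.
    return x_t - t * v + alpha * t ** 2

# Plain one-step generation: jump all the way from pure noise (t = 1).
err_full = abs(f_hat(x_at(1.0), 1.0) - x0)

# N2N-style start: place the sample directly at t = 0.5, then jump.
err_n2n = abs(f_hat(x_at(0.5), 0.5) - x0)
```

Under this error model the intermediate start cuts the error by the square of the jump length, which is the kind of effect the proposed trajectory-level error metrics would need to measure.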
Circularity Check
No significant circularity detected in the derivation chain
Full rationale
The paper conducts an analysis of CM limitations (loss divergence from self-supervised terms and error accumulation in sampling), then proposes DE-CM as a new architecture that decomposes PF-ODE trajectories, selects three sub-trajectories, adds flow-matching regularization, and introduces an N2N mapping. These are presented as design decisions motivated by the analysis rather than any closed-form derivation or prediction that reduces to the inputs by construction. The reported 1.70 FID is an empirical training outcome on ImageNet, not a quantity forced by re-using fitted parameters or self-referential definitions. No self-citations, uniqueness theorems, or ansatzes from prior author work appear in the text to bear load on the central claims. The method remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- choice of three critical sub-trajectories
axioms (2)
- domain assumption PF-ODE trajectory exists and can be decomposed into sub-trajectories whose separate optimization yields global consistency
- domain assumption flow matching acts as an effective boundary regularizer without altering the target distribution
invented entities (1)
- noise-to-noisy (N2N) mapping (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (unclear: relation between the paper passage and the cited Recognition theorem)
  Paper passage: "DE-CM decomposes the PF-ODE trajectory and selects three critical sub-trajectories as optimization targets... consistency trajectories, instantaneous trajectories and noise-to-noisy trajectories"
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear: relation between the paper passage and the cited Recognition theorem)
  Paper passage: "utilizes flow matching as a boundary regularizer to stabilize the training process"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Blattmann, A., Dockhorn, T., Kulal, S., Mendelevitch, D., Kilian, M., Lorenz, D., Levi, Y., English, Z., Voleti, V., Letts, A., et al.: Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv preprint arXiv:2311.15127 (2023)
- [2] Boffi, N.M., Albergo, M.S., Vanden-Eijnden, E.: Flow map matching. arXiv preprint arXiv:2406.07507 (2024)
- [3] Chang, H., Zhang, H., Jiang, L., Liu, C., Freeman, W.T.: MaskGIT: Masked generative image transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 11315–11325 (2022)
- [4] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. pp. 248–255. IEEE (2009)
- [5] Dhariwal, P., Nichol, A.: Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems 34, 8780–8794 (2021)
- [6] Esser, P., Kulal, S., Blattmann, A., Entezari, R., Müller, J., Saini, H., Levi, Y., Lorenz, D., Sauer, A., Boesel, F., et al.: Scaling rectified flow transformers for high-resolution image synthesis. In: Forty-first International Conference on Machine Learning (2024)
- [7] Esser, P., Rombach, R., Ommer, B.: Taming transformers for high-resolution image synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12873–12883 (2021)
- [8] Evans, Z., Carr, C., Taylor, J., Hawley, S.H., Pons, J.: Fast timing-conditioned latent audio diffusion. In: Forty-first International Conference on Machine Learning (2024)
- [9] Frans, K., Hafner, D., Levine, S., Abbeel, P.: One step diffusion via shortcut models. arXiv preprint arXiv:2410.12557 (2024)
- [10] Geng, Z., Deng, M., Bai, X., Kolter, J.Z., He, K.: Mean flows for one-step generative modeling. arXiv preprint arXiv:2505.13447 (2025)
- [11] Geng, Z., Pokle, A., Luo, W., Lin, J., Kolter, J.Z.: Consistency models made easy. arXiv preprint arXiv:2406.14548 (2024)
- [12] Guo, Y., Wang, W., Yuan, Z., Cao, R., Chen, K., Chen, Z., Huo, Y., Zhang, Y., Wang, Y., Liu, S., et al.: SplitMeanFlow: Interval splitting consistency in few-step generative modeling. arXiv preprint arXiv:2507.16884 (2025)
- [13] Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems 30 (2017)
- [14] Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 33, 6840–6851 (2020)
- [15] Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W., et al.: LoRA: Low-rank adaptation of large language models. ICLR (2022)
- [16] Jackyhate: Text-to-image-2M. https://huggingface.co/datasets/jackyhate/text-to-image-2M (2025). https://doi.org/10.57967/hf/3066
- [17] Kang, M., Zhu, J.Y., Zhang, R., Park, J., Shechtman, E., Paris, S., Park, T.: Scaling up GANs for text-to-image synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10124–10134 (2023)
- [18] Karras, T., Aittala, M., Aila, T., Laine, S.: Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems 35, 26565–26577 (2022)
- [19] Kim, D., Lai, C.H., Liao, W.H., Murata, N., Takida, Y., Uesaka, T., He, Y., Mitsufuji, Y., Ermon, S.: Consistency trajectory models: Learning probability flow ODE trajectory of diffusion. arXiv preprint arXiv:2310.02279 (2023)
- [20] black-forest-labs: FLUX.1-dev. https://huggingface.co/black-forest-labs/FLUX.1-dev (2024)
- [21] Li, J., Li, D., Xiong, C., Hoi, S.: BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In: International Conference on Machine Learning. pp. 12888–12900. PMLR (2022)
- [22] Lipman, Y., Chen, R.T., Ben-Hamu, H., Nickel, M., Le, M.: Flow matching for generative modeling. arXiv preprint arXiv:2210.02747 (2022)
- [23] Liu, H., Chen, Z., Yuan, Y., Mei, X., Liu, X., Mandic, D., Wang, W., Plumbley, M.D.: AudioLDM: Text-to-audio generation with latent diffusion models. arXiv preprint arXiv:2301.12503 (2023)
- [24] Liu, R., Wu, R., Van Hoorick, B., Tokmakov, P., Zakharov, S., Vondrick, C.: Zero-1-to-3: Zero-shot one image to 3D object. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9298–9309 (2023)
- [25] Liu, X., Gong, C., Liu, Q.: Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003 (2022)
- [26] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)
- [27] Lu, C., Song, Y.: Simplifying, stabilizing and scaling continuous-time consistency models. arXiv preprint arXiv:2410.11081 (2024)
- [28] Lu, C., Zhou, Y., Bao, F., Chen, J., Li, C., Zhu, J.: DPM-Solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps. Advances in Neural Information Processing Systems 35, 5775–5787 (2022)
- [29] Lu, C., Zhou, Y., Bao, F., Chen, J., Li, C., Zhu, J.: DPM-Solver++: Fast solver for guided sampling of diffusion probabilistic models. Machine Intelligence Research, pp. 1–22 (2025)
- [30] Luo, S., Tan, Y., Huang, L., Li, J., Zhao, H.: Latent consistency models: Synthesizing high-resolution images with few-step inference. arXiv preprint arXiv:2310.04378 (2023)
- [31] Ma, N., Goldstein, M., Albergo, M.S., Boffi, N.M., Vanden-Eijnden, E., Xie, S.: SiT: Exploring flow and diffusion-based generative models with scalable interpolant transformers. In: European Conference on Computer Vision. pp. 23–40. Springer (2024)
- [32] Peebles, W., Xie, S.: Scalable diffusion models with transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 4195–4205 (2023)
- [33] Peng, Y., Zhu, K., Liu, Y., Wu, P., Li, H., Sun, X., Wu, F.: Flow-anchored consistency models. arXiv preprint arXiv:2507.03738 (2025)
- [34] Poole, B., Jain, A., Barron, J.T., Mildenhall, B.: DreamFusion: Text-to-3D using 2D diffusion. arXiv preprint arXiv:2209.14988 (2022)
- [35] Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning. pp. 8748–8763. PMLR (2021)
- [36] Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., Chen, M.: Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:2204.06125 (2022)
- [37] Ren, Y., Xia, X., Lu, Y., Zhang, J., Wu, J., Xie, P., Wang, X., Xiao, X.: Hyper-SD: Trajectory segmented consistency model for efficient image synthesis. Advances in Neural Information Processing Systems 37, 117340–117362 (2024)
- [38] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10684–10695 (2022)
- [39] Sabour, A., Fidler, S., Kreis, K.: Align your flow: Scaling continuous-time flow map distillation. arXiv preprint arXiv:2506.14603 (2025)
- [40] Sauer, A., Lorenz, D., Blattmann, A., Rombach, R.: Adversarial diffusion distillation. In: European Conference on Computer Vision. pp. 87–103. Springer (2024)
- [41] Sauer, A., Schwarz, K., Geiger, A.: StyleGAN-XL: Scaling StyleGAN to large diverse datasets. In: ACM SIGGRAPH 2022 Conference Proceedings. pp. 1–10 (2022)
- [42] Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., Ganguli, S.: Deep unsupervised learning using nonequilibrium thermodynamics. In: International Conference on Machine Learning. pp. 2256–2265. PMLR (2015)
- [43] Song, J., Meng, C., Ermon, S.: Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502 (2020)
- [44] Song, Y., Dhariwal, P.: Improved techniques for training consistency models. arXiv preprint arXiv:2310.14189 (2023)
- [45] Song, Y., Dhariwal, P., Chen, M., Sutskever, I.: Consistency models (2023)
- [46] Song, Y., Ermon, S.: Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems 32 (2019)
- [47] Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S., Poole, B.: Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456 (2020)
- [48] Tian, K., Jiang, Y., Yuan, Z., Peng, B., Wang, L.: Visual autoregressive modeling: Scalable image generation via next-scale prediction. Advances in Neural Information Processing Systems 37, 84839–84865 (2024)
- [49] Wan, T., Wang, A., Ai, B., Wen, B., Mao, C., Xie, C.W., Chen, D., Yu, F., Zhao, H., Yang, J., Zeng, J., Wang, J., Zhang, J., Zhou, J., Wang, J., Chen, J., Zhu, K., Zhao, K., Yan, K., Huang, L., Feng, M., Zhang, N., Li, P., Wu, P., Chu, R., Feng, R., Zhang, S., Sun, S., Fang, T., Wang, T., Gui, T., Weng, T., Shen, T., Lin, W., Wang, W., Wang, W., Zhou, W., ...: Wan: Open and advanced large-scale video generative models (2025)
- [50] Wang, F.Y., Huang, Z., Bergman, A., Shen, D., Gao, P., Lingelbach, M., Sun, K., Bian, W., Song, G., Liu, Y., et al.: Phased consistency models. Advances in Neural Information Processing Systems 37, 83951–84009 (2024)
- [51] Wang, Z., Lu, C., Wang, Y., Bao, F., Li, C., Su, H., Zhu, J.: ProlificDreamer: High-fidelity and diverse text-to-3D generation with variational score distillation. Advances in Neural Information Processing Systems 36, 8406–8441 (2023)
- [52] Wu, Z., Fan, X., Wu, H., Cao, L.: TraFlow: Trajectory distillation on pre-trained rectified flow. arXiv preprint arXiv:2502.16972 (2025)
- [53] Xu, J., Liu, X., Wu, Y., Tong, Y., Li, Q., Ding, M., Tang, J., Dong, Y.: ImageReward: Learning and evaluating human preferences for text-to-image generation. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. pp. 15903–15935 (2023)
- [54] Yao, J., Yang, B., Wang, X.: Reconstruction vs. generation: Taming optimization dilemma in latent diffusion models. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 15703–15712 (2025)
- [55] Yin, T., Gharbi, M., Park, T., Zhang, R., Shechtman, E., Durand, F., Freeman, B.: Improved distribution matching distillation for fast image synthesis. Advances in Neural Information Processing Systems 37, 47455–47487 (2024)
- [56] Yin, T., Gharbi, M., Zhang, R., Shechtman, E., Durand, F., Freeman, W.T., Park, T.: One-step diffusion with distribution matching distillation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 6613–6623 (2024)
- [57] Zhang, L., Rao, A., Agrawala, M.: Adding conditional control to text-to-image diffusion models. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 3836–3847 (2023)
- [58] Zhou, L., Ermon, S., Song, J.: Inductive moment matching. arXiv preprint arXiv:2503.07565 (2025)