simple diffusion: End-to-end diffusion for high resolution images

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · method: 1

fields

cs.CV: 4 · cs.LG: 1

representative citing papers

Voxify3D: Pixel Art Meets Volumetric Rendering

cs.CV · 2025-12-08 · unverdicted · novelty 7.0

Voxify3D generates voxel art from 3D meshes via orthographic pixel supervision, patch-based CLIP alignment, and palette-constrained Gumbel-Softmax quantization, achieving 37.12 CLIP-IQA and 77.90% user preference.

Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

cs.CV · 2023-11-25 · conditional · novelty 6.0

Stable Video Diffusion scales latent video diffusion models via text-to-image pretraining, video pretraining on curated data, and high-quality finetuning to produce competitive text-to-video and image-to-video results while enabling motion LoRA and multi-view 3D applications.

Improved Techniques for Training Consistency Models

cs.LG · 2023-10-22 · accept · novelty 6.0

Improved consistency training techniques achieve FID scores of 2.51 on CIFAR-10 and 3.25 on ImageNet 64x64 in one sampling step, outperforming prior consistency training and distillation methods.

citing papers explorer

Showing 5 of 5 citing papers.

  • Voxify3D: Pixel Art Meets Volumetric Rendering · cs.CV · 2025-12-08 · unverdicted · none · ref 49

    Voxify3D generates voxel art from 3D meshes via orthographic pixel supervision, patch-based CLIP alignment, and palette-constrained Gumbel-Softmax quantization, achieving 37.12 CLIP-IQA and 77.90% user preference.

  • FREPix: Frequency-Heterogeneous Flow Matching for Pixel-Space Image Generation · cs.CV · 2026-05-07 · unverdicted · none · ref 51

    FREPix achieves competitive FID scores on ImageNet by decomposing image generation into separate low- and high-frequency paths within a flow matching framework.

  • Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets · cs.CV · 2023-11-25 · conditional · none · ref 44

    Stable Video Diffusion scales latent video diffusion models via text-to-image pretraining, video pretraining on curated data, and high-quality finetuning to produce competitive text-to-video and image-to-video results while enabling motion LoRA and multi-view 3D applications.

  • Improved Techniques for Training Consistency Models · cs.LG · 2023-10-22 · accept · none · ref 6

    Improved consistency training techniques achieve FID scores of 2.51 on CIFAR-10 and 3.25 on ImageNet 64x64 in one sampling step, outperforming prior consistency training and distillation methods.

  • SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis · cs.CV · 2023-07-04 · conditional · none · ref 16

    SDXL improves upon prior Stable Diffusion versions through a larger UNet backbone, dual text encoders, novel conditioning, and a refinement model, producing higher-fidelity images competitive with black-box state-of-the-art generators.