MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices. arXiv preprint arXiv:2311.16567

Yang Zhao, Yanwu Xu, Zhisheng Xiao, Tingbo Hou · 2023 · arXiv 2311.16567

5 Pith papers cite this work.

representative citing papers

VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

VDE accelerates rectified flow models such as Flux by 3.22x at an LPIPS of 0.069 by decomposing the velocity into parallel and orthogonal components and periodically anchoring with a full forward pass.

DSA: Dynamic Step Allocation for Fast Autoregressive Video Generation

cs.CV · 2026-06-03 · unverdicted · novelty 6.0

DSA adds a jointly trained confidence head to autoregressive video diffusion models that dynamically allocates fewer or more denoising steps per frame, achieving real-time generation at 22.63 FPS on an H100 while matching VBench quality.

ElasticDiT: Efficient Diffusion Transformers via Elastic Architecture and Sparse Attention for High-Resolution Image Generation on Mobile Devices

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

ElasticDiT combines an elastic DiT architecture (adjustable spatial compression and block depth), Shift Sparse Block Attention, and a distilled VAE so that a single model can cover multiple fidelity-latency operating points for high-resolution image generation on mobile devices.

ELT: Elastic Looped Transformers for Visual Generation

cs.CV · 2026-04-10 · unverdicted · novelty 6.0

Elastic Looped Transformers share weights across recurrent blocks and apply intra-loop self-distillation, achieving a 4x parameter reduction while maintaining competitive FID and FVD scores on ImageNet and UCF-101.

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers

cs.CV · 2024-10-14 · unverdicted · novelty 6.0 · ref 22

Sana-0.6B generates high-resolution images with strong text alignment while being 20x smaller and delivering 100x higher throughput than Flux-12B, by combining 32x image compression, linear DiT blocks, and a decoder-only LLM as the text encoder.
