Mobilediffusion: Subsecond text-to-image generation on mobile devices.arXiv preprint arXiv:2311.16567, 2(3):4

Yang Zhao, Yanwu Xu, Zhisheng Xiao, Tingbo Hou · 2024 · arXiv 2311.16567

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

representative citing papers

VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

VDE accelerates rectified flow models like Flux by 3.22x with LPIPS of 0.069 via velocity decomposition into parallel/orthogonal components plus periodic full-pass anchoring.

ElasticDiT: Efficient Diffusion Transformers via Elastic Architecture and Sparse Attention for High-Resolution Image Generation on Mobile Devices

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

ElasticDiT introduces an elastic DiT architecture with adjustable spatial compression and block depth plus Shift Sparse Block Attention and a distilled VAE to enable a single model to cover multiple fidelity-latency points for high-resolution image generation on mobile devices.

ELT: Elastic Looped Transformers for Visual Generation

cs.CV · 2026-04-10 · unverdicted · novelty 6.0

Elastic Looped Transformers share weights across recurrent blocks and apply intra-loop self-distillation to deliver 4x parameter reduction while matching competitive FID and FVD scores on ImageNet and UCF-101.

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers

cs.CV · 2024-10-14 · unverdicted · novelty 6.0

Sana-0.6B produces high-resolution images with strong text alignment at 20x smaller size and 100x higher throughput than Flux-12B by combining 32x image compression, linear DiT blocks, and a decoder-only LLM text encoder.

citing papers explorer

Showing 4 of 4 citing papers.

VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation cs.CV · 2026-05-22 · unverdicted · none · ref 54
VDE accelerates rectified flow models like Flux by 3.22x with LPIPS of 0.069 via velocity decomposition into parallel/orthogonal components plus periodic full-pass anchoring.
ElasticDiT: Efficient Diffusion Transformers via Elastic Architecture and Sparse Attention for High-Resolution Image Generation on Mobile Devices cs.CV · 2026-05-15 · unverdicted · none · ref 53
ElasticDiT introduces an elastic DiT architecture with adjustable spatial compression and block depth plus Shift Sparse Block Attention and a distilled VAE to enable a single model to cover multiple fidelity-latency points for high-resolution image generation on mobile devices.
ELT: Elastic Looped Transformers for Visual Generation cs.CV · 2026-04-10 · unverdicted · none · ref 84
Elastic Looped Transformers share weights across recurrent blocks and apply intra-loop self-distillation to deliver 4x parameter reduction while matching competitive FID and FVD scores on ImageNet and UCF-101.
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers cs.CV · 2024-10-14 · unverdicted · none · ref 22
Sana-0.6B produces high-resolution images with strong text alignment at 20x smaller size and 100x higher throughput than Flux-12B by combining 32x image compression, linear DiT blocks, and a decoder-only LLM text encoder.

Mobilediffusion: Subsecond text-to-image generation on mobile devices.arXiv preprint arXiv:2311.16567, 2(3):4

fields

years

verdicts

representative citing papers

citing papers explorer