DeepCache: Accelerating Diffusion Models for Free

Xinyin Ma, Gongfan Fang, Xinchao Wang · 2023 · arXiv 2312.00858

4 Pith papers cite this work.

representative citing papers

RT-Lynx: Putting the GEMM Sparsity In a Right Way for Diffusion Models

cs.LG · 2026-05-26 · unverdicted · novelty 6.0

RT-Lynx shifts DiT sparsity from weights to activations and reports up to 1.55x linear-layer speedup while preserving generation quality across multiple diffusion models.

OTCache: Optimal Transport for Geometry-Aware Caching in Diffusion Models

cs.LG · 2026-06-30 · unverdicted · novelty 5.0

OTCache uses optimal transport to interpolate caching schedules between a graph-based reference and an Optuna-optimized anchor, delivering speedups of 3.66x to 4.7x on FLUX.1, Qwen-Image, and HunyuanVideo with improved fidelity.

Inside the Latent Flow: Causal Deciphering of Attention Dynamics in Audio Separation Foundation Models

cs.SD · 2026-06-08 · unverdicted · novelty 5.0

Causal probing of attention in audio separation transformers identifies dual pathways and asynchronous convergence, enabling a training-free Layer-Selective Attention Caching method that reduces self-attention computation by ~25% with negligible quality loss.

Holding the FP8 Quality Ceiling at 8-Bit Weights and Activations: INT8 and GGUF Post-Training Quantization of Ideogram 4.0 for Consumer GPUs

cs.LG · 2026-06-10 · unverdicted · novelty 4.0

INT8 W8A8 post-training quantization of Ideogram 4.0 preserves FP8 quality on a 200-prompt benchmark while outperforming NF4 on CLIP score and offering a favorable quality-memory trade-off via GGUF Q4_K.

