RT-Lynx shifts DiT sparsity from weights to activations, reports up to 1.55x linear-layer speedup while preserving generation quality across multiple diffusion models.
DeepCache: Accelerating Diffusion Models for Free
4 Pith papers cite this work. Polarity classification is still indexing.
years
2026 4verdicts
UNVERDICTED 4representative citing papers
OTCache uses optimal transport to interpolate caching schedules between a graph-based reference and an Optuna-optimized anchor, delivering 3.66x-4.7x speedups on FLUX.1, Qwen-Image and HunyuanVideo with improved fidelity.
Causal probing of attention in audio separation transformers identifies dual pathways and asynchronous convergence, enabling a training-free Layer-Selective Attention Caching method that reduces self-attention computation by ~25% with negligible quality loss.
INT8 W8A8 post-training quantization of Ideogram 4.0 preserves FP8 quality on a 200-prompt benchmark while outperforming NF4 on CLIP score and offering a favorable quality-memory trade-off via GGUF Q4_K.
citing papers explorer
-
Inside the Latent Flow: Causal Deciphering of Attention Dynamics in Audio Separation Foundation Models
Causal probing of attention in audio separation transformers identifies dual pathways and asynchronous convergence, enabling a training-free Layer-Selective Attention Caching method that reduces self-attention computation by ~25% with negligible quality loss.