Make a cheap scaling: A self-cascade diffusion model for higher-resolution adaptation

Lanqing Guo, Yingqing He, Haoxin Chen, Menghan Xia, Xiaodong Cun, Yufei Wang, Siyu Huang, Yong Zhang, Xintao Wang, Qifeng Chen, et al · 2024

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

ExtraVAR: Stage-Aware RoPE Remapping for Resolution Extrapolation in Visual Autoregressive Models

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

ExtraVAR enables resolution extrapolation in visual autoregressive models by stage-aware RoPE remapping and entropy-driven attention scaling, suppressing repetition and detail loss.

SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers

cs.CV · 2026-05-21 · unverdicted · novelty 6.0

SEGA adaptively scales RoPE attention components using spectral-energy guidance from the latent to improve structural coherence and fine details in high-resolution DiT synthesis.

SwiftI2V: Efficient High-Resolution Image-to-Video Generation via Conditional Segment-wise Generation

cs.CV · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

SwiftI2V achieves comparable 2K I2V quality to end-to-end models on VBench-I2V while cutting GPU time by 202x through low-resolution motion planning followed by strongly image-conditioned segment-wise high-resolution synthesis.

citing papers explorer

Showing 3 of 3 citing papers.

ExtraVAR: Stage-Aware RoPE Remapping for Resolution Extrapolation in Visual Autoregressive Models cs.CV · 2026-05-11 · unverdicted · none · ref 10
ExtraVAR enables resolution extrapolation in visual autoregressive models by stage-aware RoPE remapping and entropy-driven attention scaling, suppressing repetition and detail loss.
SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers cs.CV · 2026-05-21 · unverdicted · none · ref 9
SEGA adaptively scales RoPE attention components using spectral-energy guidance from the latent to improve structural coherence and fine details in high-resolution DiT synthesis.
SwiftI2V: Efficient High-Resolution Image-to-Video Generation via Conditional Segment-wise Generation cs.CV · 2026-05-07 · unverdicted · none · ref 4 · 2 links
SwiftI2V achieves comparable 2K I2V quality to end-to-end models on VBench-I2V while cutting GPU time by 202x through low-resolution motion planning followed by strongly image-conditioned segment-wise high-resolution synthesis.

Make a cheap scaling: A self-cascade diffusion model for higher-resolution adaptation

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer