Outdreamer: Video outpainting with a diffusion transformer
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- YOSE: You Only Select Essential Tokens for Efficient DiT-based Video Object Removal
  YOSE accelerates DiT-based video object removal by up to 2.5x, using BVI for adaptive token selection and DiffSim to simulate the effects of unmasked tokens, while preserving visual quality.
- Seen-to-Scene: Keep the Seen, Generate the Unseen for Video Outpainting
  Seen-to-Scene unifies propagation-based and generation-based approaches to video outpainting through fine-tuned flow completion and reference-guided latent propagation, delivering superior temporal coherence and efficiency.