iTryOn is a diffusion-based framework that adds spatial 3D hand guidance and semantic action-aware embeddings to handle complex garment deformations during human-clothing interactions in videos.
arXiv preprint arXiv:2306.02018 , year=
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 5verdicts
UNVERDICTED 5roles
background 1polarities
background 1representative citing papers
CameraNoise embeds camera motion into the noise space of video diffusion via Geometry-guided Reprojection Flow and noise warping to achieve faithful trajectory control while preserving the diffusion prior.
Open-source text-to-video and image-to-video diffusion models generate high-quality 1024x576 videos, with the I2V variant claimed as the first to strictly preserve reference image content.
InfiniVerse reconstructs 3D occupancy from one frame, extends scenes autoregressively, converts to video via diffusion, and uses re-projection feedback to achieve SOTA FID 6.4 and FVD 67.97 on Waymo and nuScenes.
ModelScopeT2V is a 1.7-billion-parameter text-to-video model built on Stable Diffusion that adds temporal modeling and outperforms prior methods on three evaluation metrics.
citing papers explorer
-
ModelScope Text-to-Video Technical Report
ModelScopeT2V is a 1.7-billion-parameter text-to-video model built on Stable Diffusion that adds temporal modeling and outperforms prior methods on three evaluation metrics.