Styleclipdraw: Coupling content and style in text-to-drawing translation
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (3)
verdicts: UNVERDICTED (3)
citing papers explorer
- VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation
VAnim creates open-domain text-to-SVG animations via sparse state updates on a persistent DOM tree, identification-first planning, and rendering-aware RL with a new 134k-example benchmark.
- VISTA: Triplet-Supervised Video Style Transfer with Diffusion Transformers
VISTA introduces a new synthetic triplet dataset and diffusion-transformer framework with style adapter that jointly models style, content, and motion to achieve state-of-the-art video style transfer.
- When Diffusion Breaks Constraints: Sequential Autoregressive Generation with RL and MCTS
Diffusion models exhibit a structural limitation when generating samples on low-dimensional feasible regions for constrained tasks, and sequential autoregressive generation using RL and MCTS improves constraint satisfaction.