INSPATIO-WORLD is a real-time framework for high-fidelity 4D scene generation and navigation from monocular videos via STAR architecture with implicit caching, explicit geometric constraints, and distribution-matching distillation.
arXiv preprint arXiv:2412.12095 , year=
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 4representative citing papers
Self Forcing trains autoregressive video diffusion models by performing autoregressive rollout with KV caching during training to close the exposure bias gap, using a holistic video-level loss and few-step diffusion for efficiency.
RepFlow combines representation learning and conditional flow matching to estimate both point and distributional causal effects while mitigating selection bias via entropically regularized Wasserstein distance on normalized latent representations.
BAGEL is a unified decoder-only model that develops emerging complex multimodal reasoning abilities after pretraining on large-scale interleaved data and outperforms prior open-source unified models.
citing papers explorer
-
INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling
INSPATIO-WORLD is a real-time framework for high-fidelity 4D scene generation and navigation from monocular videos via STAR architecture with implicit caching, explicit geometric constraints, and distribution-matching distillation.
-
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
Self Forcing trains autoregressive video diffusion models by performing autoregressive rollout with KV caching during training to close the exposure bias gap, using a holistic video-level loss and few-step diffusion for efficiency.
-
RepFlow: Representation Enhanced Flow Matching for Causal Effect Estimation
RepFlow combines representation learning and conditional flow matching to estimate both point and distributional causal effects while mitigating selection bias via entropically regularized Wasserstein distance on normalized latent representations.
-
Emerging Properties in Unified Multimodal Pretraining
BAGEL is a unified decoder-only model that develops emerging complex multimodal reasoning abilities after pretraining on large-scale interleaved data and outperforms prior open-source unified models.