pith. machine review for the scientific record. sign in

arxiv: 1804.01523 · v1 · submitted 2018-04-04 · 💻 cs.CV · cs.AI· cs.LG· cs.RO

Recognition: unknown

Stochastic Adversarial Video Prediction

Authors on Pith no claims yet
classification 💻 cs.CV cs.AIcs.LGcs.RO
keywords modellatentproduceableadversarially-traineddistinctfuturefutures
0
0 comments X
read the original abstract

Being able to predict what may happen in the future requires an in-depth understanding of the physical and causal rules that govern the world. A model that is able to do so has a number of appealing applications, from robotic planning to representation learning. However, learning to predict raw future observations, such as frames in a video, is exceedingly challenging -- the ambiguous nature of the problem can cause a naively designed model to average together possible futures into a single, blurry prediction. Recently, this has been addressed by two distinct approaches: (a) latent variational variable models that explicitly model underlying stochasticity and (b) adversarially-trained models that aim to produce naturalistic images. However, a standard latent variable model can struggle to produce realistic results, and a standard adversarially-trained model underutilizes latent variables and fails to produce diverse predictions. We show that these distinct methods are in fact complementary. Combining the two produces predictions that look more realistic to human raters and better cover the range of possible futures. Our method outperforms prior and concurrent work in these aspects.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Learning Interactive Real-World Simulators

    cs.AI 2023-10 conditional novelty 7.0

    UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.

  2. Video Diffusion Models

    cs.CV 2022-04 unverdicted novelty 7.0

    A diffusion model for video generation extends image architectures with joint image-video training and improved conditional sampling, delivering first large-scale text-to-video results and state-of-the-art performance...

  3. PFGNet: A Fully Convolutional Frequency-Guided Peripheral Gating Network for Efficient Spatiotemporal Predictive Learning

    cs.CV 2026-02 unverdicted novelty 6.0

    PFGNet introduces a frequency-guided peripheral gating block in a pure convolutional architecture to enable adaptive receptive fields for efficient spatiotemporal prediction with fewer parameters than prior methods.

  4. Video Generators are Robot Policies

    cs.RO 2025-08 conditional novelty 6.0

    Training models to generate videos of robot actions produces policies that generalize better to new objects and tasks while using far less demonstration data than standard behavior cloning.

  5. Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

    cs.CV 2023-11 conditional novelty 6.0

    Stable Video Diffusion scales latent video diffusion models via text-to-image pretraining, video pretraining on curated data, and high-quality finetuning to produce competitive text-to-video and image-to-video results...

  6. Is Conditional Generative Modeling all you need for Decision-Making?

    cs.LG 2022-11 unverdicted novelty 6.0

    Return-conditional diffusion models for policies outperform offline RL on benchmarks by circumventing dynamic programming and enable constraint or skill composition.

  7. VideoGPT: Video Generation using VQ-VAE and Transformers

    cs.CV 2021-04 accept novelty 6.0

    VideoGPT generates competitive natural videos by learning discrete latents with VQ-VAE and modeling them autoregressively with a transformer.