ContextVP: Fully Context-Aware Video Prediction

Petros Koumoutsakos; Qin Wang; Rupesh Kumar Srivastava; Wonmin Byeon

arxiv: 1710.08518 · v3 · pith:VBTPJCGQnew · submitted 2017-10-23 · 💻 cs.CV

Wonmin Byeon, Qin Wang, Rupesh Kumar Srivastava, Petros Koumoutsakos

classification 💻 cs.CV

keywords prediction · video · convolutional · networks · past · context · context-aware · fully

Video prediction models based on convolutional networks, recurrent networks, and their combinations often result in blurry predictions. We identify an important contributing factor for imprecise predictions that has not been studied adequately in the literature: blind spots, i.e., lack of access to all relevant past information for accurately predicting the future. To address this issue, we introduce a fully context-aware architecture that captures the entire available past context for each pixel using Parallel Multi-Dimensional LSTM units and aggregates it using blending units. Our model outperforms a strong baseline network of 20 recurrent convolutional layers and yields state-of-the-art performance for next step prediction on three challenging real-world video datasets: Human 3.6M, Caltech Pedestrian, and UCF-101. Moreover, it does so with fewer parameters than several recently proposed models, and does not rely on deep convolutional networks, multi-scale architectures, separation of background and foreground modeling, motion flow learning, or adversarial training. These results highlight that full awareness of past context is of crucial importance for video prediction.
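
The blind-spot idea in the abstract can be made concrete with a small receptive-field calculation. The sketch below is only an illustration and is not the authors' architecture: it assumes a plain stack of stride-1, undilated convolutions with odd square kernels, and it measures what fraction of a past frame lies outside the window that one output pixel can see. The function names `receptive_field_radius` and `blind_spot_fraction` are hypothetical helpers introduced here.

```python
def receptive_field_radius(num_layers: int, kernel_size: int) -> int:
    """Radius (in pixels) of the receptive field of one output pixel.

    Assumes stride-1, undilated convolutions with an odd square kernel,
    so each layer widens the field by (kernel_size - 1) / 2 on every side.
    """
    if num_layers < 0:
        raise ValueError("num_layers must be non-negative")
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    return num_layers * (kernel_size - 1) // 2


def blind_spot_fraction(height: int, width: int, y: int, x: int, radius: int) -> float:
    """Fraction of a height x width frame that pixel (y, x) cannot see.

    The visible region is the square window of the given radius around
    (y, x), clipped to the frame borders. Everything else is a blind spot.
    """
    top, bottom = max(0, y - radius), min(height - 1, y + radius)
    left, right = max(0, x - radius), min(width - 1, x + radius)
    visible = (bottom - top + 1) * (right - left + 1)
    return 1.0 - visible / (height * width)


if __name__ == "__main__":
    # Hypothetical setting: three 3x3 convolutions on a 64x64 frame.
    radius = receptive_field_radius(num_layers=3, kernel_size=3)
    fraction = blind_spot_fraction(64, 64, y=32, x=32, radius=radius)
    print(f"radius={radius}, blind fraction={fraction:.4f}")
```

Under these assumptions, a shallow stack leaves most of the previous frame invisible to any single pixel, and the blind fraction reaches zero only once the window covers the whole frame. The abstract's claim is that the proposed model gives every pixel access to the entire available past context without relying on deep convolutional stacks.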

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

World Action Models: The Next Frontier in Embodied AI
cs.RO · 2026-05 · unverdicted · novelty 4.0

The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.