arxiv: 1910.01442 · v2 · submitted 2019-10-03 · 💻 cs.CV · cs.AI· cs.CL· cs.LG

Recognition: 2 theorem links

· Lean Theorem

CLEVRER: CoLlision Events for Video REpresentation and Reasoning

Kexin Yi , Chuang Gan , Yunzhu Li , Pushmeet Kohli , Jiajun Wu , Antonio Torralba , Joshua B. Tenenbaum

Authors on Pith no claims yet

Pith reviewed 2026-05-16 17:57 UTC · model grok-4.3

classification 💻 cs.CV cs.AIcs.CLcs.LG

keywords CLEVRERvideo reasoningcausal reasoningcollision eventsvisual question answeringtemporal reasoningcounterfactual reasoningsynthetic dataset

0 comments

The pith

CLEVRER shows video models describe collisions accurately but fail at explaining causes, predicting outcomes, or reasoning about alternatives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CLEVRER, a synthetic video dataset of simple object collisions designed to test temporal and causal reasoning rather than just visual pattern recognition. It defines four question categories drawn from theories of human causal judgment: descriptive questions about object properties, explanatory questions about what caused an event, predictive questions about future states, and counterfactual questions about what would happen under different conditions. Evaluations of existing state-of-the-art models show strong results on descriptive tasks but sharp drops on the three causal categories. The authors also present an oracle that combines perception with explicit symbolic dynamics modeling and performs much better across all question types.

Core claim

CLEVRER generates videos of colliding objects with simple appearances and annotates them with questions spanning descriptive, explanatory, predictive, and counterfactual types, demonstrating that current models succeed at perceiving visual and language inputs yet lack the ability to represent underlying dynamics and causal relations needed for the non-descriptive tasks.

What carries the argument

The CLEVRER dataset itself, which produces controlled collision videos and supplies questions in four categories to isolate perception from causal understanding.

If this is right

Video reasoning systems must combine visual perception with explicit modeling of physical dynamics and causal structure to handle explanatory, predictive, and counterfactual questions.
Symbolic representations can serve as an effective bridge between raw perception and causal inference, as shown by the oracle model's gains.
Diagnostic benchmarks that separate perception from causation can reveal limitations hidden by tasks that reward only pattern matching.
Progress on CLEVRER-style causal tasks would require architectures capable of simulating or reasoning over possible future and alternative trajectories.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Extending the collision setting to real-world footage could test whether models that pass CLEVRER also generalize when visual complexity increases.
Success on the counterfactual questions may predict better performance in planning tasks such as robotic manipulation where agents must imagine action outcomes.
The four-question structure could be adapted to other domains like human activity videos to diagnose causal gaps in social reasoning models.

Load-bearing premise

That the gap between descriptive and causal performance arises mainly from missing causal reasoning mechanisms rather than from training procedure differences or dataset-specific artifacts.

What would settle it

A model achieving near-ceiling accuracy on explanatory, predictive, and counterfactual questions after training only on CLEVRER videos and questions without any explicit physics or causal graph component would falsify the claim.

read the original abstract

The ability to reason about temporal and causal events from videos lies at the core of human intelligence. Most video reasoning benchmarks, however, focus on pattern recognition from complex visual and language input, instead of on causal structure. We study the complementary problem, exploring the temporal and causal structures behind videos of objects with simple visual appearance. To this end, we introduce the CoLlision Events for Video REpresentation and Reasoning (CLEVRER), a diagnostic video dataset for systematic evaluation of computational models on a wide range of reasoning tasks. Motivated by the theory of human casual judgment, CLEVRER includes four types of questions: descriptive (e.g., "what color"), explanatory ("what is responsible for"), predictive ("what will happen next"), and counterfactual ("what if"). We evaluate various state-of-the-art models for visual reasoning on our benchmark. While these models thrive on the perception-based task (descriptive), they perform poorly on the causal tasks (explanatory, predictive and counterfactual), suggesting that a principled approach for causal reasoning should incorporate the capability of both perceiving complex visual and language inputs, and understanding the underlying dynamics and causal relations. We also study an oracle model that explicitly combines these components via symbolic representations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces the CLEVRER dataset, a diagnostic benchmark consisting of videos of colliding objects with simple appearances, paired with four categories of questions (descriptive, explanatory, predictive, and counterfactual) designed to probe temporal and causal reasoning. It reports that state-of-the-art visual reasoning models achieve strong results on descriptive questions but substantially lower accuracy on the three causal question types, and shows that an oracle model combining perception modules with explicit symbolic dynamics representations obtains markedly higher causal-task performance.

Significance. If the benchmark construction and evaluation protocol are sound, the work supplies a controlled testbed that isolates causal reasoning from low-level perception challenges, thereby providing a clear signal for the community to develop models that jointly handle visual dynamics and causal inference. The oracle result supplies a concrete existence proof that hybrid symbolic-perception approaches can close the observed gap.

major comments (3)

[§3.3] §3.3 (Question Generation): the procedure for constructing and validating counterfactual questions is described only at a high level; no details are given on how the underlying physics simulator is queried to guarantee that each 'what if' question has a unique, determinate answer or on any human verification step used to filter ambiguous cases.
[§5.1] §5.1 (Model Training Protocol): the training regime applied to the evaluated baselines (MAC, NS-VQA, etc.) is not specified with respect to number of epochs, learning-rate schedule, whether perception and reasoning modules were jointly optimized on the CLEVRER training split, or whether the reported numbers reflect zero-shot transfer versus task-specific fine-tuning; this information is load-bearing for the central claim that low causal-task accuracy demonstrates an absence of causal understanding rather than an artifact of training regime.
[Table 3] Table 3 (Oracle vs. Baseline Comparison): the oracle model results are presented without standard deviations across random seeds or statistical significance tests against the strongest baseline, weakening the quantitative support for the claim that explicit symbolic dynamics yield a reliable improvement.

minor comments (2)

[Abstract] Abstract, line 4: 'human casual judgment' is a typographical error and should read 'human causal judgment'.
[Figure 2] Figure 2 caption: the description of the rendered scenes does not specify the camera viewpoint or lighting conditions used, which could affect reproducibility of the visual input.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We address each major point below and will revise the manuscript to improve clarity on dataset construction and evaluation details.

read point-by-point responses

Referee: [§3.3] §3.3 (Question Generation): the procedure for constructing and validating counterfactual questions is described only at a high level; no details are given on how the underlying physics simulator is queried to guarantee that each 'what if' question has a unique, determinate answer or on any human verification step used to filter ambiguous cases.

Authors: We agree that additional implementation details would strengthen the presentation. In the revised manuscript we will expand §3.3 with a step-by-step description of the counterfactual generation pipeline: for each 'what-if' question we (i) parse the original scene graph and question template, (ii) edit the initial conditions in the MuJoCo-based simulator (e.g., remove the colliding object or alter its velocity), (iii) re-simulate the full trajectory to obtain a unique deterministic outcome, and (iv) map the resulting state to the answer. We also performed a human verification study on a random sample of 1,000 counterfactual questions (three annotators per question) and will report the 94% inter-annotator agreement together with the filtering criteria used to discard ambiguous cases. revision: yes
Referee: [§5.1] §5.1 (Model Training Protocol): the training regime applied to the evaluated baselines (MAC, NS-VQA, etc.) is not specified with respect to number of epochs, learning-rate schedule, whether perception and reasoning modules were jointly optimized on the CLEVRER training split, or whether the reported numbers reflect zero-shot transfer versus task-specific fine-tuning; this information is load-bearing for the central claim that low causal-task accuracy demonstrates an absence of causal understanding rather than an artifact of training regime.

Authors: We acknowledge that the training protocol details are essential for interpreting the performance gap. In the revision we will augment §5.1 with the following information: all baselines were trained from scratch on the CLEVRER training split for 25 epochs using the Adam optimizer (initial learning rate 1e-4, halved every 5 epochs if validation accuracy plateaued). Perception and reasoning modules were jointly optimized end-to-end. The numbers reported in the paper reflect task-specific fine-tuning rather than zero-shot transfer. These clarifications will make explicit that the observed weakness on causal questions persists even after full supervised training on CLEVRER. revision: yes
Referee: [Table 3] Table 3 (Oracle vs. Baseline Comparison): the oracle model results are presented without standard deviations across random seeds or statistical significance tests against the strongest baseline, weakening the quantitative support for the claim that explicit symbolic dynamics yield a reliable improvement.

Authors: We agree that statistical rigor would strengthen the comparison. Because the oracle model is fully deterministic (symbolic dynamics with perfect perception), its performance has zero variance across runs. For the neural baselines we will re-run each model with three random seeds, report mean ± standard deviation in the revised Table 3, and add a paired t-test (p < 0.01) against the strongest baseline to confirm the improvement is statistically significant. These additions will be included in the camera-ready version. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical benchmark with independent evaluations

full rationale

The paper introduces the CLEVRER dataset and reports empirical performance of existing visual-reasoning models on its four question types. No equations, parameter fits, or derivations appear in the provided text. Claims rest on direct model evaluations rather than any self-referential reduction, self-citation load-bearing premise, or ansatz smuggled via prior work. The work is therefore self-contained against external benchmarks and receives the default non-finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical dataset and benchmark paper. No free parameters, mathematical axioms, or invented entities are introduced; the contribution rests on dataset design and model evaluations.

pith-pipeline@v0.9.0 · 5543 in / 1022 out tokens · 40295 ms · 2026-05-16T17:57:07.896393+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Foundation.LawOfExistence defect_zero_iff_one unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

CLEVRER includes four types of questions: descriptive (e.g., 'what color'), explanatory ('what is responsible for'), predictive ('what will happen next'), and counterfactual ('what if').

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 18 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding
cs.CV 2026-01 unverdicted novelty 8.0

Molmo2 delivers state-of-the-art open-weight video VLMs with new grounding datasets and training methods that outperform prior open models and match or exceed some proprietary ones on pointing and tracking tasks.
SYNCR: A Cross-Video Reasoning Benchmark with Synthetic Grounding
cs.CV 2026-05 unverdicted novelty 7.0

SYNCR benchmark shows leading MLLMs reach only 52.5% average accuracy on cross-video reasoning tasks against an 89.5% human baseline, with major weaknesses in physical and spatial reasoning.
Tracing the Arrow of Time: Diagnosing Temporal Information Flow in Video-LLMs
cs.CV 2026-05 unverdicted novelty 7.0

Temporal information in Video-LLMs is encoded well by video-centric encoders but disrupted by standard projectors; time-preserved MLPs plus AoT supervision yield 98.1% accuracy on arrow-of-time and gains on other temp...
PhysCodeBench: Benchmarking Physics-Aware Symbolic Simulation of 3D Scenes via Self-Corrective Multi-Agent Refinement
cs.RO 2026-04 unverdicted novelty 7.0

PhysCodeBench benchmark and SMRF multi-agent framework enable better AI generation of physically accurate 3D simulation code, boosting performance by 31 points over baselines.
Reasoning Resides in Layers: Restoring Temporal Reasoning in Video-Language Models with Layer-Selective Merging
cs.CV 2026-04 unverdicted novelty 7.0

MERIT restores temporal reasoning in VLMs via layer-selective self-attention merging guided by a TR-improving objective that penalizes TP degradation.
Bridging Time and Space: Decoupled Spatio-Temporal Alignment for Video Grounding
cs.CV 2026-04 unverdicted novelty 7.0

Bridge-STG decouples spatio-temporal alignment via semantic bridging and query-guided localization modules to achieve state-of-the-art m_vIoU of 34.3 on VidSTG among MLLM methods.
SCP: Spatial Causal Prediction in Video
cs.CV 2026-03 unverdicted novelty 7.0

SCP defines a new benchmark task for predicting spatial causal outcomes beyond direct observation and shows that 23 leading models lag far behind humans on it.
SpatialMosaic: A Multiview VLM Dataset for Partial Visibility
cs.CV 2025-12 unverdicted novelty 7.0

SpatialMosaic introduces a 2M-pair multi-view QA dataset and 1M-pair benchmark for MLLMs on spatial reasoning under partial visibility, plus a hybrid baseline that integrates 3D reconstruction models as geometry encoders.
Video Active Perception: Effective Inference-Time Long-Form Video Understanding with Vision-Language Models
cs.CV 2026-05 unverdicted novelty 6.0

VAP is a training-free active-perception method that improves zero-shot long-form video QA performance and frame efficiency up to 5.6x in VLMs by selecting keyframes that differ from priors generated by a text-conditi...
PhyCo: Learning Controllable Physical Priors for Generative Motion
cs.CV 2026-04 unverdicted novelty 6.0

PhyCo adds continuous physical control to video diffusion models via physics-supervised fine-tuning on a large simulation dataset and VLM-guided rewards, yielding measurable gains in physical realism on the Physics-IQ...
PhysLayer: Language-Guided Layered Animation with Depth-Aware Physics
cs.CV 2026-04 unverdicted novelty 6.0

PhysLayer is a framework that decomposes images into depth layers, simulates physics with depth awareness, and synthesizes videos guided by language for more plausible animations.
One Token per Highly Selective Frame: Towards Extreme Compression for Long Video Understanding
cs.CV 2026-04 unverdicted novelty 6.0

XComp reaches extreme video compression (one token per selective frame) via learnable progressive token compression and question-conditioned frame selection, lifting LVBench accuracy from 42.9 percent to 46.2 percent ...
MAGI-1: Autoregressive Video Generation at Scale
cs.CV 2025-05 unverdicted novelty 6.0

MAGI-1 is a 24B-parameter autoregressive video world model that predicts denoised frame chunks sequentially with increasing noise to enable causal, scalable, streaming generation up to 4M token contexts.
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
cs.CV 2024-12 unverdicted novelty 6.0

InternVL 2.5 is the first open-source MLLM to surpass 70% on the MMMU benchmark via model, data, and test-time scaling, with a 3.7-point gain from chain-of-thought reasoning.
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
cs.CV 2024-10 unverdicted novelty 6.0

LongVU adaptively compresses long video tokens using DINOv2-based frame deduplication, text-guided cross-modal selection, and temporal spatial reduction to improve video-language understanding in MLLMs with minimal de...
Long Context Transfer from Language to Vision
cs.CV 2024-06 unverdicted novelty 6.0

Extending language model context length enables LMMs to process over 200K visual tokens from long videos without video training, achieving SOTA on Video-MME via dense frame sampling.
LychSim: A Controllable and Interactive Simulation Framework for Vision Research
cs.CV 2026-05 unverdicted novelty 4.0

LychSim introduces a controllable simulation platform on Unreal Engine 5 with Python API, procedural generation, and LLM integration for vision research tasks.
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
cs.CV 2024-06 unverdicted novelty 4.0

VideoLLaMA 2 improves video LLMs via a new STC connector for spatial-temporal dynamics and joint audio training, reaching competitive results on video QA and captioning benchmarks.

Reference graph

Works this paper leans on

300 extracted references · 300 canonical work pages · cited by 18 Pith papers · 1 internal anchor

[1]

Generating the future with adversarial transformers , author=

work page
[2]

, author=

How, whether, why: Causal judgments as counterfactual contrasts. , author=. CogSci , year=

work page
[3]

TIST , volume=

Learning perceptual causality from video , author=. TIST , volume=. 2016 , publisher=

work page 2016
[4]

2009 , publisher=

Causality , author=. 2009 , publisher=

work page 2009
[5]

Self-supervised visual planning with temporal skip connections , author=

work page
[6]

Learning Particle Dynamics for Manipulating Rigid Bodies, Deformable Objects, and Fluids , author=

work page
[7]

2017 , file =

Ehrhardt, Sebastien and Monszpart, Aron and Mitra, Niloy and Vedaldi, Andrea , title =. 2017 , file =

work page 2017
[8]

A differentiable physics engine for deep learning in robotics , author=

work page
[9]

Two-stream convolutional networks for action recognition in videos , author=

work page
[10]

Temporal action detection with structured segment networks , author=

work page
[11]

Devnet: A deep event network for multimedia event detection and evidence recounting , author=

work page
[12]

Semantic compositional networks for visual captioning , author=

work page
[13]

Activitynet: A large-scale video benchmark for human activity understanding , author=

work page
[14]

The kinetics human action video dataset , author=

work page
[15]

Makarand Tapaswi and Yukun Zhu and Rainer Stiefelhagen and Antonio Torralba and Raquel Urtasun and Sanja Fidler , title =

work page
[16]

Learning by Asking Questions , author=

work page
[17]

Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition , author=

work page
[18]

Sequence to sequence-video to text , author=

work page
[19]

Tall: Temporal activity localization via language query , author=

work page
[20]

Localizing moments in video with natural language , author=

work page
[21]

Visual7w: Grounded question answering in images , author=

work page
[22]

Convolutional LSTM network: A machine learning approach for precipitation nowcasting , author=

work page
[23]

Blender - a 3D modelling and rendering package , author =. url =

work page
[24]

Distributed representations of words and phrases and their compositionality , author=

work page
[25]

Simple baseline for visual question answering , author=

work page
[26]

2016 , url =

Blender - a 3D modelling and rendering package , author =. 2016 , url =

work page 2016
[27]

Tian Ye and Xiaolong Wang and James Davidson and Abhinav Gupta , Title =

work page
[28]

CVPR , year=

Transparency by design: Closing the gap between performance and interpretability in visual reasoning , author=. CVPR , year=

work page
[29]

ICLR , year=

The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision , author=. ICLR , year=

work page
[30]

Lei, Jie and Yu, Licheng and Bansal, Mohit and Berg, Tamara L , booktitle=

work page
[31]

MONet: Unsupervised Scene Decomposition and Representation

Monet: Unsupervised scene decomposition and representation , author=. arXiv preprint arXiv:1901.11390 , year=

work page internal anchor Pith review Pith/arXiv arXiv 1901
[32]

NIPS , pages=

Interaction networks for learning about objects, relations and physics , author=. NIPS , pages=

work page
[33]

CVPR , year=

Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering , author=. CVPR , year=

work page
[34]

Hudson, Drew A and Manning, Christopher D , booktitle=

work page
[35]

Zadeh, Amir and Chan, Michael and Liang, Paul Pu and Tong, Edmund and Morency, Louis-Philippe , booktitle=

work page
[36]

CVPR , year=

From recognition to cognition: Visual commonsense reasoning , author=. CVPR , year=

work page
[37]

ICLR , year=

CoPhy: Counterfactual Learning of Physical Dynamics , author=. ICLR , year=

work page
[38]

ICLR , year=

CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning , author=. ICLR , year=

work page
[39]

and Neal, Radford M

Dayan, Peter and Hinton, Geoffrey E. and Neal, Radford M. and Zemel, Richard S. , title =. 1995 , volume =

work page 1995
[40]

2002 , file =

Papineni, Kishore and Roukos, Salim and Ward, Todd and Zhu, Wei-Jing , title =. 2002 , file =

work page 2002
[41]

, title =

Sutskever, Ilya and Vinyals, Oriol and Le, Quoc V. , title =. 2014 , file =

work page 2014
[42]

and Zhang, Yi and Zhang, Yuting and Lee, Honglak , title =

Reed, Scott E. and Zhang, Yi and Zhang, Yuting and Lee, Honglak , title =. 2015 , file =

work page 2015
[43]

Lawrence and Farhadi, Ali , title =

Sadeghi, Fereshteh and Zitnick, C. Lawrence and Farhadi, Ali , title =. 2015 , file =

work page 2015
[44]

and Chintala, Soumith and Fergus, Rob , title =

Denton, Emily L. and Chintala, Soumith and Fergus, Rob , title =. 2015 , file =

work page 2015
[45]

and Parikh, Devi , title =

Vedantam, Ramakrishna and Lin, Xiao and Batra, Tanmay and Lawrence Zitnick, C. and Parikh, Devi , title =. 2015 , file =

work page 2015
[46]

2015 , file =

Ortiz, Luis Gilberto Mateos and Wolff, Clemens and Lapata, Mirella , title =. 2015 , file =

work page 2015
[47]

2011 , file =

Collobert, Ronan and Kavukcuoglu, Koray and Farabet, Cl. 2011 , file =

work page 2011
[48]

, title =

Mnih, Andriy and Rezende, Danilo J. , title =. 2016 , file =

work page 2016
[49]

Lan and LeCun, Yann , title =

Ranzato, Marc'Aurelio and Huang, Fu Jie and Boureau, Y. Lan and LeCun, Yann , title =. 2007 , file =

work page 2007
[50]

2015 , file =

Karpathy, Andrej and Fei-Fei, Li , title =. 2015 , file =

work page 2015
[51]

2009 , file =

Bengio, Yoshua and Louradour, J. 2009 , file =

work page 2009
[52]

, title =

Williams, Ronald J. , title =. 1992 , volume =

work page 1992
[53]

Rezende, Danilo Jimenez and Eslami, S. M. and Mohamed, Shakir and Battaglia, Peter and Jaderberg, Max and Heess, Nicolas , title =. 2016 , file =

work page 2016
[54]

2016 , file =

Dai, Jifeng and He, Kaiming and Sun, Jian , title =. 2016 , file =

work page 2016
[55]

Lawrence and Parikh, Devi , title =

Zitnick, C. Lawrence and Parikh, Devi , title =. 2013 , file =

work page 2013
[56]

2016 , file =

Johnson, Matthew and Hofmann, Katja and Hutton, Tim and Bignell, David , title =. 2016 , file =

work page 2016
[57]

2010 , volume =

Tu, Zhuowen and Bai, Xiang , title =. 2010 , volume =

work page 2010
[58]

and Welling, Max , title =

Kingma, Diederik P. and Welling, Max , title =. 2014 , file =

work page 2014
[59]

2014 , file =

Goodfellow, Ian and Pouget-Abadie, Jean and Mirza, Mehdi and Xu, Bing and Warde-Farley, David and Ozair, Sherjil and Courville, Aaron and Bengio, Yoshua , title =. 2014 , file =

work page 2014
[60]

2016 , file =

He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian , title =. 2016 , file =

work page 2016
[61]

1997 , volume =

Hochreiter, Sepp and Schmidhuber, J. 1997 , volume =

work page 1997
[62]

2015 , file =

Huang, Jonathan and Murphy, Kevin , title =. 2015 , file =

work page 2015
[63]

2015 , file =

Gregor, Karol and Danihelka, Ivo and Graves, Alex and Rezende, Danilo and Wierstra, Daan , title =. 2015 , file =

work page 2015
[64]

2016 , file =

Santoro, Adam and Bartunov, Sergey and Botvinick, Matthew and Wierstra, Daan and Lillicrap, Timothy , title =. 2016 , file =

work page 2016
[65]

2015 , file =

Ba, Jimmy and Mnih, Volodymyr and Kavukcuoglu, Koray , title =. 2015 , file =

work page 2015
[66]

2016 , file =

Rezende, Danilo Jimenez and Mohamed, Shakir and Danihelka, Ivo and Gregor, Karol and Wierstra, Daan , title =. 2016 , file =

work page 2016
[67]

and Whitney, William F

Kulkarni, Tejas D. and Whitney, William F. and Kohli, Pushmeet and Tenenbaum, Joshua B. , title =. 2015 , file =

work page 2015
[68]

2016 , file =

Chen, Xi and Duan, Yan and Houthooft, Rein and Schulman, John and Sutskever, Ilya and Abbeel, Pieter , title =. 2016 , file =

work page 2016
[69]

2016 , file =

Higgins, Irina and Matthey, Loic and Glorot, Xavier and Pal, Arka and Uria, Benigno and Blundell, Charles and Mohamed, Shakir and Lerchner, Alexander , title =. 2016 , file =

work page 2016
[70]

and Yang, Ming-Hsuan and Lee, Honglak , title =

Yang, Jimei and Reed, Scott E. and Yang, Ming-Hsuan and Lee, Honglak , title =. 2015 , file =

work page 2015
[71]

and Tian, Yuandong and Tenenbaum, Joshua B

Wu, Jiajun and Xue, Tianfan and Lim, Joseph J. and Tian, Yuandong and Tenenbaum, Joshua B. and Torralba, Antonio and Freeman, William T. , title =. 2016 , file =

work page 2016
[72]

and Dayan, Peter and Frey, Brendan J

Hinton, Geoffrey E. and Dayan, Peter and Frey, Brendan J. and Neal, Radford M. , title =. 1995 , volume =

work page 1995
[73]

2007 , volume =

Zhu, Song-Chun and Mumford, David , title =. 2007 , volume =

work page 2007
[74]

2002 , volume =

Tu, Zhuowen and Zhu, Song-Chun , title =. 2002 , volume =

work page 2002
[75]

2006 , volume =

Yuille, Alan and Kersten, Daniel , title =. 2006 , volume =

work page 2006
[76]

, title =

Jampani, Varun and Nowozin, Sebastian and Loper, Matthew and Gehler, Peter V. , title =. 2015 , volume =

work page 2015
[77]

and Freeman, William T

Wu, Jiajun and Yildirim, Ilker and Lim, Joseph J. and Freeman, William T. and Tenenbaum, Joshua B. , title =. 2015 , file =

work page 2015
[78]

and Poeppel, David , title =

Bever, Thomas G. and Poeppel, David , title =. Biolinguistics , year =

work page
[79]

and Kohli, Pushmeet and Tenenbaum, Joshua B

Kulkarni, Tejas D. and Kohli, Pushmeet and Tenenbaum, Joshua B. and Mansinghka, Vikash , title =. 2015 , file =

work page 2015
[80]

and Malik, Jitendra , title =

Barron, Jonathan T. and Malik, Jitendra , title =. 2015 , volume =

work page 2015

Showing first 80 references.