Learning to Perform Physics Experiments via Deep Reinforcement Learning

Misha Denil, Pulkit Agrawal, Tejas D Kulkarni, Tom Erez, Peter Battaglia, Nando De Freitas · 2016 · stat.ML · arXiv 1611.01843

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

When encountering novel objects, humans are able to infer a wide range of physical properties such as mass, friction and deformability by interacting with them in a goal-driven way. This process of active interaction is in the same spirit as a scientist performing experiments to discover hidden facts. Recent advances in artificial intelligence have yielded machines that can achieve superhuman performance in Go, Atari, natural language processing, and complex control problems; however, it is not clear that these systems can rival the scientific intuition of even a young child. In this work we introduce a basic set of tasks that require agents to estimate properties such as mass and cohesion of objects in an interactive simulated environment where they can manipulate the objects and observe the consequences. We found that state-of-the-art deep reinforcement learning methods can learn to perform the experiments necessary to discover such hidden properties. By systematically manipulating the problem difficulty and the cost incurred by the agent for performing experiments, we found that agents learn different strategies that balance the cost of gathering information against the cost of making mistakes in different situations.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Environment Probing Interaction Policies

cs.RO · 2019-07-26 · unverdicted · novelty 6.0

EPI policies use a transition-predictability reward to probe environments and condition task policies, outperforming standard generalization methods on novel test environments.

Learning What Matters: Adaptive Information-Theoretic Objectives for Robot Exploration

cs.RO · 2026-05-12 · unverdicted · novelty 6.0

QOED selects identifiable parameter directions via Fisher matrix eigenspace analysis and modifies exploration objectives to approximate ideal information gain under bounded nuisance assumptions, yielding 21-35% performance gains in robotic tasks.

citing papers explorer

Showing 2 of 2 citing papers.

Environment Probing Interaction Policies · cs.RO · 2019-07-26 · unverdicted · none · ref 5 · internal anchor
Learning What Matters: Adaptive Information-Theoretic Objectives for Robot Exploration · cs.RO · 2026-05-12 · unverdicted · none · ref 27
