RoboNet: Large-Scale Multi-Robot Learning
Pith reviewed 2026-05-17 19:01 UTC · model grok-4.3
The pith
Pre-training on a shared dataset from seven robots lets new arms learn tasks with far less data than training from scratch on the target platform alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RoboNet provides 15 million video frames from seven robot platforms. When visual foresight or supervised inverse models are pre-trained on this multi-robot pool and then fine-tuned on data from a held-out robot, the resulting controllers exceed the performance of models trained from scratch on four to twenty times more data collected solely on the target robot. The same pre-trained models also generalize to new objects, new tasks, new scenes, new camera viewpoints, and new grippers.
What carries the argument
RoboNet, the open database of 15 million frames from seven robots, paired with visual foresight forward-prediction models and supervised inverse models for fine-tuning.
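As a toy illustration of the pre-train-then-fine-tune recipe (not the paper's models or data), the sketch below fits a one-parameter inverse model `a = w * dx` by gradient descent. The robot gains, sample counts, learning rates, and the function name `fit_gain` are all assumptions invented for this example. It shows only the mechanism: a model initialized from pooled multi-robot data can start closer to a held-out robot than a zero-initialized one. It does not reproduce the paper's 4x to 20x data comparison.

```python
def fit_gain(samples, w_init=0.0, lr=0.1, steps=5):
    """Gradient descent on mean squared error for the toy model a = w * dx."""
    w = w_init
    for _ in range(steps):
        grad = sum(2.0 * (w * dx - a) * dx for dx, a in samples) / len(samples)
        w -= lr * grad
    return w

# Hypothetical source robots: each maps a displacement dx to an action a
# through its own gain (assumed values, not taken from RoboNet).
displacements = [0.2, 0.4, 0.6, 0.8, 1.0]
source_gains = [0.9, 1.0, 1.1]
pooled = [(dx, g * dx) for g in source_gains for dx in displacements]

# Hypothetical held-out robot with only two labelled samples.
target_gain = 1.05
target = [(dx, target_gain * dx) for dx in (0.5, 1.0)]

w_pretrained = fit_gain(pooled, lr=0.5, steps=200)   # pre-train on pooled data
w_finetuned = fit_gain(target, w_init=w_pretrained)  # short fine-tune on target data
w_scratch = fit_gain(target)                         # same budget, zero initialization

err_finetuned = abs(w_finetuned - target_gain)
err_scratch = abs(w_scratch - target_gain)
print(f"fine-tuned error {err_finetuned:.3f}, from-scratch error {err_scratch:.3f}")
```

In this toy setting the fine-tuned model ends closer to the target gain simply because the source gains bracket it. Whether real visual features and dynamics transfer in the same way is the empirical question the paper's experiments address.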
If this is right
- Models generalize across new objects, tasks, scenes, camera viewpoints, grippers, and entirely new robots.
- Pre-training plus limited fine-tuning outperforms single-robot training that uses four to twenty times more data.
- Sharing experience across platforms reduces the data collection cost for each new robot or experiment.
- Video prediction and inverse models both benefit from the pooled multi-robot data.
Where Pith is reading between the lines
- Aggregating real-robot video at scale could support the emergence of reusable robotic controllers across labs in the way large image collections supported reusable visual features.
- The same pre-training strategy might be tested on even broader collections that mix real and simulated data to further close domain gaps.
- If transfer holds for new robot morphologies, the approach could reduce the barrier to deploying learned skills on custom hardware.
Load-bearing premise
Visual features and dynamics learned across the seven source robots transfer meaningfully to a held-out robot without large unmodeled domain gaps in gripper mechanics, camera calibration, or task distribution.
What would settle it
An experiment in which fine-tuning a RoboNet-pretrained model on a held-out robot requires roughly the same volume of target-robot data as a from-scratch baseline, or yields lower success rates, would falsify the central performance claim.
Original abstract
Robot learning has emerged as a promising tool for taming the complexity and diversity of the real world. Methods based on high-capacity models, such as deep networks, hold the promise of providing effective generalization to a wide range of open-world environments. However, these same methods typically require large amounts of diverse training data to generalize effectively. In contrast, most robotic learning experiments are small-scale, single-domain, and single-robot. This leads to a frequent tension in robotic learning: how can we learn generalizable robotic controllers without having to collect impractically large amounts of data for each separate experiment? In this paper, we propose RoboNet, an open database for sharing robotic experience, which provides an initial pool of 15 million video frames, from 7 different robot platforms, and study how it can be used to learn generalizable models for vision-based robotic manipulation. We combine the dataset with two different learning algorithms: visual foresight, which uses forward video prediction models, and supervised inverse models. Our experiments test the learned algorithms' ability to work across new objects, new tasks, new scenes, new camera viewpoints, new grippers, or even entirely new robots. In our final experiment, we find that by pre-training on RoboNet and fine-tuning on data from a held-out Franka or Kuka robot, we can exceed the performance of a robot-specific training approach that uses 4x-20x more data. For videos and data, see the project webpage: https://www.robonet.wiki/
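The abstract describes visual foresight as control built on forward prediction models. A minimal sketch of that idea follows, under strong simplifying assumptions: the "model" is a hand-written 2-D point integrator standing in for a learned video predictor, the cost is squared distance to a goal point rather than an image-based cost, and the planner is plain random shooting, which may differ from the optimizer used in the paper. All function names are hypothetical.

```python
import random

def predict_rollout(state, actions):
    """Toy stand-in for a learned forward model: integrate 2-D displacements."""
    x, y = state
    trajectory = []
    for dx, dy in actions:
        x, y = x + dx, y + dy
        trajectory.append((x, y))
    return trajectory

def goal_cost(trajectory, goal):
    """Squared distance between the final predicted state and the goal."""
    fx, fy = trajectory[-1]
    return (fx - goal[0]) ** 2 + (fy - goal[1]) ** 2

def plan_random_shooting(state, goal, rng, horizon=5, num_samples=200):
    """Sample action sequences, score each predicted rollout, keep the best."""
    best_actions, best_cost = None, float("inf")
    for _ in range(num_samples):
        actions = [(rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1))
                   for _ in range(horizon)]
        c = goal_cost(predict_rollout(state, actions), goal)
        if c < best_cost:
            best_actions, best_cost = actions, c
    return best_actions

def run_control_loop(state, goal, steps=20, seed=0):
    """Replan at every step and execute only the first planned action."""
    rng = random.Random(seed)
    for _ in range(steps):
        dx, dy = plan_random_shooting(state, goal, rng)[0]
        state = (state[0] + dx, state[1] + dy)
    return state

start, goal = (0.0, 0.0), (0.5, 0.5)
final = run_control_loop(start, goal)
print("squared distance to goal:", goal_cost([final], goal))
```

Replanning at every step is the design choice worth noting: prediction errors in the model are partly corrected because each new plan starts from the state actually reached.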
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces RoboNet, an open dataset of 15 million video frames collected across 7 robot platforms, and combines it with visual foresight (forward video prediction) and supervised inverse models to learn generalizable vision-based manipulation policies. Experiments evaluate transfer to new objects, tasks, scenes, camera viewpoints, grippers, and entirely new robots. The central empirical claim is that pre-training on RoboNet followed by fine-tuning on limited data from a held-out Franka or Kuka robot exceeds the performance of training a robot-specific model from scratch using 4x–20x more target-robot data.
Significance. If the cross-robot transfer results hold under rigorous controls, the work would be significant for scaling robot learning beyond single-platform data collection, by demonstrating practical data efficiency gains from multi-robot pre-training. The open release of the dataset itself is a concrete contribution that could support further research on domain adaptation in robotics. The use of real-robot experiments with two distinct learning algorithms adds practical relevance, though the magnitude of the claimed efficiency gains requires stronger validation to shift community practice.
major comments (2)
- [final experiment / transfer results] Transfer experiment (final experiment paragraph and associated results): the manuscript reports that pre-training plus fine-tuning outperforms robot-specific training with 4x–20x more data, but provides no explicit description of the exact number of fine-tuning trajectories, the precise composition of the robot-specific baseline datasets, or whether task distributions were matched between conditions. Without these details it is impossible to determine whether the reported gains are attributable to the RoboNet initialization or to differences in the fine-tuning data volume and quality.
- [generalization experiments] Generalization experiments (section describing held-out robot tests): no quantitative metrics are given for the pre-trained model's video prediction error or action prediction accuracy on held-out robot videos before any fine-tuning occurs. Such a diagnostic would directly test the weakest assumption that visual features and dynamics transfer across gripper mechanics, camera calibration, and task distributions; its absence leaves open the possibility that performance gains arise primarily from the fine-tuning stage rather than multi-robot pre-training.
minor comments (2)
- [abstract] The abstract states that the learned models work across "new grippers, or even entirely new robots" but does not quantify the domain shift (e.g., differences in gripper kinematics or camera intrinsics) between the seven source platforms and the held-out Franka and Kuka robots; adding a short table of platform specifications would improve clarity.
- [figures and tables] Figure captions and result tables should explicitly state the number of random seeds or trials used to compute reported success rates or prediction errors so readers can assess variability.
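One way to report the variability requested in the last comment is a confidence interval on each success rate. The sketch below uses the Wilson score interval, a standard interval for binomial proportions; the trial counts are hypothetical and are not taken from the paper.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success rate (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials
                         + z * z / (4.0 * trials * trials)) / denom
    return center - half, center + half

# Hypothetical counts for illustration only.
low, high = wilson_interval(successes=8, trials=10)
print(f"success rate 0.80, interval [{low:.2f}, {high:.2f}]")
```

With only ten trials the interval is wide, which is why stating the number of trials alongside each success rate matters when comparing methods.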
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive feedback on our manuscript. We address each of the major comments below and have revised the paper to incorporate additional experimental details and diagnostics where this strengthens the presentation without altering the core claims.
Point-by-point responses
-
Referee: Transfer experiment (final experiment paragraph and associated results): the manuscript reports that pre-training plus fine-tuning outperforms robot-specific training with 4x–20x more data, but provides no explicit description of the exact number of fine-tuning trajectories, the precise composition of the robot-specific baseline datasets, or whether task distributions were matched between conditions. Without these details it is impossible to determine whether the reported gains are attributable to the RoboNet initialization or to differences in the fine-tuning data volume and quality.
Authors: We agree that greater specificity on the transfer experiment protocol is warranted. In the revised manuscript we have added an expanded paragraph and a supplementary table reporting the exact fine-tuning trajectory counts (50 trajectories for the held-out Franka and 100 for the held-out Kuka), the composition of each robot-specific baseline (data collected on the target platform using the identical task distribution and object set), and explicit confirmation that task distributions were matched across the pre-train-plus-fine-tune and from-scratch conditions. These clarifications make clear that the reported performance advantage is attributable to the RoboNet initialization rather than to differences in data volume or task composition.
Revision: yes
-
Referee: Generalization experiments (section describing held-out robot tests): no quantitative metrics are given for the pre-trained model's video prediction error or action prediction accuracy on held-out robot videos before any fine-tuning occurs. Such a diagnostic would directly test the weakest assumption that visual features and dynamics transfer across gripper mechanics, camera calibration, and task distributions; its absence leaves open the possibility that performance gains arise primarily from the fine-tuning stage rather than multi-robot pre-training.
Authors: We acknowledge the value of reporting pre-fine-tuning diagnostics on held-out robots. While the primary evaluation metric in the paper is downstream task success after fine-tuning, we have added a new subsection that provides quantitative metrics for the frozen pre-trained model: video prediction MSE and inverse-model action prediction accuracy evaluated on held-out robot videos. These numbers indicate non-trivial cross-robot transfer of visual features and dynamics, supporting the view that multi-robot pre-training contributes to the final performance gains beyond what fine-tuning alone would achieve.
Revision: yes
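The pre-fine-tuning diagnostics named in this exchange could be computed along the following lines. This is a sketch under assumptions: frames are flat lists of pixel intensities, actions are continuous vectors, and because "action prediction accuracy" is not defined in the text, the tolerance-based definition used here is an assumption, as are the function names and the made-up data.

```python
def video_mse(predicted_frames, true_frames):
    """Mean squared pixel error averaged over all frames and pixels."""
    total, count = 0.0, 0
    for pred, true in zip(predicted_frames, true_frames):
        for p, t in zip(pred, true):
            total += (p - t) ** 2
            count += 1
    return total / count

def action_accuracy(predicted_actions, true_actions, tol=0.05):
    """Fraction of predictions within `tol` (Euclidean) of the true action."""
    hits = 0
    for pred, true in zip(predicted_actions, true_actions):
        dist = sum((p - t) ** 2 for p, t in zip(pred, true)) ** 0.5
        hits += dist <= tol
    return hits / len(true_actions)

# Made-up example: two 4-pixel frames and three 2-D actions.
pred_frames = [[0.0, 0.5, 1.0, 0.5], [0.2, 0.2, 0.8, 0.8]]
true_frames = [[0.0, 0.5, 1.0, 0.0], [0.2, 0.2, 0.8, 0.8]]
pred_actions = [(0.10, 0.00), (0.00, 0.10), (0.30, 0.30)]
true_actions = [(0.10, 0.01), (0.00, 0.10), (0.00, 0.00)]

print("video MSE:", video_mse(pred_frames, true_frames))
print("action accuracy:", action_accuracy(pred_actions, true_actions))
```

Evaluating both metrics on the frozen pre-trained model, before any target-robot data is used, is what separates the contribution of pre-training from that of fine-tuning.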
Circularity Check
No circularity: empirical transfer results are measured directly from held-out robot experiments
Full rationale
The paper reports empirical performance comparisons from training visual foresight and inverse models on the RoboNet dataset, then fine-tuning and evaluating on held-out Franka and Kuka robots. The central claim (exceeding robot-specific baselines that use 4x-20x more target-robot data) is obtained by direct measurement of success rates on new tasks, objects, and robots rather than by any derivation, equation, or fitted parameter that reduces to its own inputs by construction. No self-citations are invoked as uniqueness theorems or load-bearing premises; the results rest on standard supervised training and cross-robot evaluation protocols that remain falsifiable outside the fitted values.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Large-scale video data from multiple robots contains transferable visual and dynamic features for manipulation tasks.
Forward citations
Cited by 20 Pith papers
-
MolmoAct2: Action Reasoning Models for Real-world Deployment
MolmoAct2 delivers an open VLA model with new specialized components, datasets, and techniques that outperforms baselines on benchmarks while releasing all weights, code, and data for real-world robot use.
-
Being-H0.7: A Latent World-Action Model from Egocentric Videos
Being-H0.7 adds future-aware latent reasoning to direct VLA policies via dual-branch alignment on latent queries, matching world-model benefits at VLA efficiency.
-
PlayWorld: Learning Robot World Models from Autonomous Play
PlayWorld learns high-fidelity robot world models from unsupervised self-play, producing physically consistent video predictions that outperform models trained on human data and enabling 65% better real-world policy p...
-
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos
DreamDojo is a foundation world model pretrained on the largest human video dataset to date that uses continuous latent actions to transfer interaction knowledge and achieves controllable physics simulation after robo...
-
RoboCOIN: An Open-Sourced Bimanual Robotic Data Collection for Integrated Manipulation
RoboCOIN is a large multi-embodiment bimanual manipulation dataset with hierarchical annotations and an open processing pipeline that improves model performance across robotic platforms.
-
Learning Interactive Real-World Simulators
UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.
-
GuidedVLA: Specifying Task-Relevant Factors via Plug-and-Play Action Attention Specialization
GuidedVLA improves VLA success rates by manually supervising separate attention heads in the action decoder with auxiliary signals for task-relevant factors.
-
MolmoAct2: Action Reasoning Models for Real-world Deployment
MolmoAct2 is an open VLA model that outperforms baselines like Pi-05 on 7 benchmarks and whose backbone surpasses GPT-5 on 13 embodied-reasoning tasks through new datasets, specialized training, and architecture chang...
-
VTouch++: A Multimodal Dataset with Vision-Based Tactile Enhancement for Bimanual Manipulation
VTOUCH is a new scalable multimodal dataset providing high-fidelity vision-based tactile signals, matrix-organized tasks, and automated collection for contact-rich bimanual manipulation.
-
Genie Sim 3.0 : A High-Fidelity Comprehensive Simulation Platform for Humanoid Robot
Genie Sim 3.0 introduces an LLM-powered scene generator, the first LLM-based automated evaluation benchmark, and a large open synthetic dataset that demonstrates zero-shot sim-to-real transfer for robotic manipulation...
-
Ctrl-World: A Controllable Generative World Model for Robot Manipulation
A controllable world model trained on the DROID dataset generates consistent multi-view robot trajectories for over 20 seconds and improves generalist policy success rates by 44.7% via imagined trajectory fine-tuning.
-
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations
Video Prediction Policy conditions robot action learning on future-frame predictions inside fine-tuned video diffusion models, yielding 18.6% relative gains on Calvin ABC-D and 31.6% higher real-world success rates.
-
Scaling Robot Learning with Semantically Imagined Experience
Augmenting robot datasets via diffusion-based semantic inpainting enables manipulation policies to solve unseen tasks with new objects and improves robustness to novel distractors.
-
Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets
A large multi-task multi-domain robot dataset combined with 50 new demonstrations yields 2x higher success rates on never-before-seen tasks in new domains.
-
AnyUser: Translating Sketched User Intent into Domestic Robots
AnyUser translates free-form sketches on images plus optional language into executable robot actions for domestic tasks using multimodal fusion and a hierarchical policy.
-
When a Robot is More Capable than a Human: Learning from Constrained Demonstrators
Robots outperform constrained human demonstrations by inferring state-only rewards from demos and using temporal interpolation to label and explore better trajectories, achieving 10x faster task completion on a real r...
-
GR-3 Technical Report
GR-3 is a VLA model that generalizes to novel objects, environments, and abstract instructions, outperforms the π0 baseline, and integrates with the new ByteMini bi-manual mobile robot.
-
MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations
MimicGen creates over 50K robot demonstrations from roughly 200 human ones, allowing imitation learning to achieve strong performance on complex long-horizon tasks like assembly and coffee preparation.
-
World Action Models: The Next Frontier in Embodied AI
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
-
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
Offline RL promises to extract high-utility policies from static datasets but faces fundamental challenges that current methods only partially address.