Asymmetric Actor Critic for Image-Based Robot Learning

Lerrel Pinto, Marcin Andrychowicz, Peter Welinder, Pieter Abbeel, Wojciech Zaremba


Classification: cs.RO, cs.AI, cs.LG

Keywords: real, learning, policies, simulator, training, world, actor, asymmetric

Abstract

Deep reinforcement learning (RL) has proven to be a powerful technique in many sequential decision-making domains. However, robotics poses many challenges for RL; most notably, training on a physical system can be expensive and dangerous, which has sparked significant interest in learning control policies using a physics simulator. While several recent works have shown promising results in transferring policies trained in simulation to the real world, they often do not fully exploit the advantages of working with a simulator. In this work, we exploit the full state observability of the simulator to train better policies that take as input only partial observations (RGBD images). We do this by employing an actor-critic training algorithm in which the critic is trained on full states while the actor (or policy) receives rendered images as input. We show experimentally on a range of simulated tasks that using these asymmetric inputs significantly improves performance. Finally, we combine this method with domain randomization and show real robot experiments for several tasks, such as picking, pushing, and moving a block. We achieve this simulation-to-real-world transfer without training on any real-world data.
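The asymmetric split described above can be illustrated with a small linear sketch. It makes several assumptions. It uses a DDPG-style deterministic policy gradient, although the abstract says only "actor-critic". It uses linear function approximators instead of deep networks. The dimensions `STATE_DIM`, `OBS_DIM`, and `ACT_DIM` are hypothetical. The class `AsymmetricActorCritic` and its methods are illustrative names, not the authors' code.

The sketch is meant to show three things:
- The critic `q(state, action)` consumes the full simulator state.
- The actor `act(obs)` consumes only the partial observation.
- The actor is still trained through the critic's action gradient dQ/da, which is evaluated at the full state.

```python
import numpy as np

STATE_DIM = 4   # hypothetical: full simulator state (e.g. gripper and block positions)
OBS_DIM = 16    # hypothetical: flattened features standing in for an RGBD image
ACT_DIM = 2     # hypothetical: e.g. planar end-effector displacement


class AsymmetricActorCritic:
    """Linear DDPG-style sketch: the actor sees observations, the critic sees full states."""

    def __init__(self, gamma=0.98, lr_actor=1e-3, lr_critic=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        self.gamma = gamma
        self.lr_actor = lr_actor
        self.lr_critic = lr_critic
        # Actor: a = tanh(W_pi @ o), a function of the partial observation o only.
        self.W_pi = rng.normal(scale=0.1, size=(ACT_DIM, OBS_DIM))
        # Critic: Q(s, a) = s^T M a + u.s + v.a, a function of the full state s.
        self.M = rng.normal(scale=0.1, size=(STATE_DIM, ACT_DIM))
        self.u = np.zeros(STATE_DIM)
        self.v = np.zeros(ACT_DIM)

    def act(self, obs):
        """obs: (batch, OBS_DIM) -> actions: (batch, ACT_DIM)."""
        return np.tanh(obs @ self.W_pi.T)

    def q(self, state, action):
        """state: (batch, STATE_DIM), action: (batch, ACT_DIM) -> Q-values: (batch,)."""
        bilinear = np.einsum("bi,ij,bj->b", state, self.M, action)
        return bilinear + state @ self.u + action @ self.v

    def update(self, s, o, a, r, s_next, o_next, done):
        """One critic step and one actor step; returns the pre-update mean squared TD error."""
        n = len(r)
        # Critic: the TD target uses the next FULL state, while the next action
        # comes from the actor applied to the next OBSERVATION.
        target = r + self.gamma * (1.0 - done) * self.q(s_next, self.act(o_next))
        delta = self.q(s, a) - target  # TD error; target treated as a constant
        loss = float(np.mean(delta ** 2))
        self.M -= self.lr_critic * np.einsum("b,bi,bj->ij", delta, s, a) / n
        self.u -= self.lr_critic * (delta @ s) / n
        self.v -= self.lr_critic * (delta @ a) / n

        # Actor: deterministic policy gradient, dQ/dW_pi = dQ/da * da/dW_pi.
        a_pi = self.act(o)
        dq_da = s @ self.M + self.v              # (batch, ACT_DIM); uses the full state
        grad_pre = dq_da * (1.0 - a_pi ** 2)     # chain rule through tanh
        self.W_pi += self.lr_actor * (grad_pre.T @ o) / n  # gradient ascent on Q
        return loss


# Usage on random stand-in data (a simulator would supply s and render o from it).
rng = np.random.default_rng(1)
B = 32
agent = AsymmetricActorCritic()
s, s_next = rng.normal(size=(B, STATE_DIM)), rng.normal(size=(B, STATE_DIM))
o, o_next = rng.normal(size=(B, OBS_DIM)), rng.normal(size=(B, OBS_DIM))
a = agent.act(o)
r = -np.linalg.norm(s[:, :2], axis=1)   # hypothetical reward: negative norm of two state entries
done = np.zeros(B)
td_loss = agent.update(s, o, a, r, s_next, o_next, done)
```

Only `act` is needed at deployment, so the full state is required only while training in simulation. Here, random arrays stand in for the states and rendered images a simulator would provide. A practical version would also add a replay buffer, target networks, and exploration noise.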

