Solving the Rubik's Cube Without Human Knowledge
2 Pith papers cite this work. Polarity classification is still indexing.
abstract
A generally intelligent agent must be able to teach itself how to solve problems in complex domains with minimal human supervision. Recently, deep reinforcement learning algorithms combined with self-play have achieved superhuman proficiency in Go, Chess, and Shogi without human data or domain knowledge. In these environments, a reward is always received at the end of the game; however, for many combinatorial optimization environments, rewards are sparse and episodes are not guaranteed to terminate. We introduce Autodidactic Iteration: a novel reinforcement learning algorithm that is able to teach itself how to solve the Rubik's Cube with no human assistance. Our algorithm is able to solve 100% of randomly scrambled cubes while achieving a median solve length of 30 moves -- less than or equal to solvers that employ human domain knowledge.
verdicts
UNVERDICTED 2
representative citing papers
Hierarchical RL combines a model-based cube solver with a model-free hand controller to solve Rubik's cubes in simulation, achieving 90.3% success on 1400 random scrambles.
citing papers explorer
- Learning Empirically Admissible Neural Heuristics for Combinatorial Search
Presents a framework for training empirically admissible neural heuristics via underestimating Bellman operator, asymmetric loss, and validation calibration offset, reporting reduced node expansions with no observed admissibility violations on small puzzles.
- Learning to Solve a Rubik's Cube with a Dexterous Hand
Hierarchical RL combines a model-based cube solver with a model-free hand controller to solve Rubik's cubes in simulation, achieving 90.3% success on 1400 random scrambles.