Understanding agent incentives using causal influence diagrams
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.AI (2)
Years: 2019 (2)
Citing papers
- Categorizing Wireheading in Partially Embedded Agents
  Presents a taxonomy of wireheading in partially embedded agents, defines wirehead-vulnerable agents, demonstrates via AIXIjs simulation, and conjectures that specification gaming is the only other misalignment type.
- Modeling AGI Safety Frameworks with Causal Influence Diagrams
  Models AGI safety frameworks with causal influence diagrams to compare optimization objectives and causal assumptions.