Understanding agent incentives using causal influence diagrams

Tom Everitt, Pedro Ortega, Elizabeth Barnes, Shane Legg · 2019 · arXiv 1902.09980

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Categorizing Wireheading in Partially Embedded Agents

cs.AI · 2019-06-21 · unverdicted · novelty 6.0

Presents a taxonomy of wireheading in partially embedded agents, defines wirehead-vulnerable agents, demonstrates via AIXIjs simulation, and conjectures that specification gaming is the only other misalignment type.

Modeling AGI Safety Frameworks with Causal Influence Diagrams

cs.AI · 2019-06-20 · accept · novelty 6.0

Models AGI safety frameworks with causal influence diagrams to compare optimization objectives and causal assumptions.

citing papers explorer

Showing 2 of 2 citing papers.

Categorizing Wireheading in Partially Embedded Agents · cs.AI · 2019-06-21 · unverdicted · none · ref 9
Modeling AGI Safety Frameworks with Causal Influence Diagrams · cs.AI · 2019-06-20 · accept · none · ref 12
