Is power-seeking AI an existential risk?

Carlsmith, J · 2022 · arXiv 2206.13353

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2

citation-polarity summary

background: 1 · support: 1

representative citing papers

Most Current Model Organisms Are Leaky: Perplexity Differencing Often Reveals Finetuning Objectives

cs.CL · 2026-05-01 · unverdicted · novelty 6.0 · 2 refs

Perplexity differencing on completions from short random prefills surfaces finetuning objectives in the vast majority of tested model organisms across sizes and types.

AI Loss of Control Incident Management: Response & Resilience

cs.CY · 2026-05-28 · unverdicted · novelty 5.0

Presents a taxonomy for AI loss of control incident management that distinguishes extremely costly versus impossible regaining of control and accidental versus adversarial scenarios.

Reframing AGI Confrontation with Off Earth Autonomy

cs.CY · 2026-06-18 · unverdicted · novelty 4.0

An off-Earth autonomy pathway can reduce AGI confrontation incentives by making early cooperation preferable to power-seeking on Earth.

Artificial Jagged Intelligence as Uneven Optimization Energy Allocation Capability Concentration, Redistribution, and Optimization Governance

cs.AI · 2026-05-02 · unverdicted · novelty 4.0

AJI frames jagged AI capabilities as lower bounds on performance dispersion arising from concentrated optimization energy allocation under anisotropic objectives, with theorems on tradeoffs and redistribution interventions.

AI Safety Landscape for Large Language Models: Taxonomy, State-of-the-art, and Future Directions

cs.AI · 2024-08-23 · unverdicted · novelty 4.0

The paper introduces a taxonomy of AI safety for LLMs organized into Trustworthy AI, Responsible AI, and Safe AI perspectives, accompanied by a review of state-of-the-art methods, challenges, and future directions.

Civilizational Metamaterials: Engineering Coordination Under Capability Gradients and Structural Turbulence

physics.soc-ph · 2026-05-29 · unverdicted · novelty 3.0

Introduces phenomenological model R_eff = β(1-ρ)(1-τ)(1-γρτ) for coordination under AGI decision velocity, with phase transition and proposed randomized trial.

Position: Anthropomorphic Misalignment Research Needs Stronger Evidence

cs.CY · 2026-05-29 · unverdicted · novelty 3.0

Position paper calling for stronger evidentiary standards and a diagnostic checklist in anthropomorphic misalignment research.

Deconstructing Superintelligence: Identity, Self-Modification and Différance

cs.AI · 2026-04-21

Cognitive Comparability and the Limits of Governance: Evaluating Authority Under Radical Capability Asymmetry

cs.CY · 2026-04-03

citing papers explorer

Most Current Model Organisms Are Leaky: Perplexity Differencing Often Reveals Finetuning Objectives
cs.CL · 2026-05-01 · unverdicted · none · ref 5 · 2 links
Perplexity differencing on completions from short random prefills surfaces finetuning objectives in the vast majority of tested model organisms across sizes and types.

AI Loss of Control Incident Management: Response & Resilience
cs.CY · 2026-05-28 · unverdicted · none · ref 3
Presents a taxonomy for AI loss of control incident management that distinguishes extremely costly versus impossible regaining of control and accidental versus adversarial scenarios.

Reframing AGI Confrontation with Off Earth Autonomy
cs.CY · 2026-06-18 · unverdicted · none · ref 6
An off-Earth autonomy pathway can reduce AGI confrontation incentives by making early cooperation preferable to power-seeking on Earth.

Artificial Jagged Intelligence as Uneven Optimization Energy Allocation Capability Concentration, Redistribution, and Optimization Governance
cs.AI · 2026-05-02 · unverdicted · none · ref 7
AJI frames jagged AI capabilities as lower bounds on performance dispersion arising from concentrated optimization energy allocation under anisotropic objectives, with theorems on tradeoffs and redistribution interventions.

AI Safety Landscape for Large Language Models: Taxonomy, State-of-the-art, and Future Directions
cs.AI · 2024-08-23 · unverdicted · none · ref 100
The paper introduces a taxonomy of AI safety for LLMs organized into Trustworthy AI, Responsible AI, and Safe AI perspectives, accompanied by a review of state-of-the-art methods, challenges, and future directions.

Civilizational Metamaterials: Engineering Coordination Under Capability Gradients and Structural Turbulence
physics.soc-ph · 2026-05-29 · unverdicted · none · ref 8
Introduces phenomenological model R_eff = β(1-ρ)(1-τ)(1-γρτ) for coordination under AGI decision velocity, with phase transition and proposed randomized trial.

Position: Anthropomorphic Misalignment Research Needs Stronger Evidence
cs.CY · 2026-05-29 · unverdicted · none · ref 94
Position paper calling for stronger evidentiary standards and a diagnostic checklist in anthropomorphic misalignment research.

Deconstructing Superintelligence: Identity, Self-Modification and Différance
cs.AI · 2026-04-21 · unreviewed · ref 6

Cognitive Comparability and the Limits of Governance: Evaluating Authority Under Radical Capability Asymmetry
cs.CY · 2026-04-03 · unreviewed · ref 19
