Post-AGI Economies: Superposition and the Second Fundamental Theorem of Welfare Economics

Elija Perrier

arXiv: 2606.08267 · v1 · pith:BKE2MWHD · submitted 2026-06-06 · 💻 cs.GT · cs.AI

Pith reviewed 2026-06-27 18:55 UTC · model grok-4.3

classification 💻 cs.GT cs.AI

keywords: welfare economics, second welfare theorem, post-AGI economies, preference superposition, autonomy rights, Pareto optimality, decentralization, self-modification


The pith

An autonomy-qualified Second Welfare Theorem shows Pareto optima remain decentralizable in post-AGI economies under joint conditions on rights, status, and verification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extends the classical Second Welfare Theorem, which shows that under convexity and regularity conditions any Pareto-efficient allocation can be decentralized through prices and lump-sum transfers, to economies where agents possess autonomy rights, can self-modify, and hold context-indexed superposed preferences. It states that decentralization stays certifiable when convexity is joined with stable moral status, non-fungible rights, welfare selection, non-manipulation, governed self-modification, and verification. A sympathetic reader would care because these features break the usual commodity and stable-welfare assumptions, yet the theorem claims that efficiency can still be supported without central command. The work also separates the economic notion of preference superposition, a hypothesis about choice across contexts, from neural feature superposition.
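For readers who want the classical baseline in concrete form, here is a minimal numerical sketch of the Second Welfare Theorem in the simplest setting where it holds: a two-agent, two-good exchange economy with Cobb-Douglas utilities, which are convex and locally nonsatiated. The agents, endowments, and function names are hypothetical illustrations, not the paper's model, and nothing here represents autonomy rights, self-modification, or superposed preferences. The sketch picks an interior Pareto optimum on the contract curve, sets the price ratio equal to the common marginal rate of substitution, and computes lump-sum transfers so that each agent's budget line passes through its target bundle.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Agent:
    alpha: float                     # hypothetical Cobb-Douglas weight on good x, 0 < alpha < 1
    endowment: tuple[float, float]   # initial holdings (x, y)


def demand(alpha, wealth, px, py):
    """Cobb-Douglas demand: spend share alpha of wealth on x and the rest on y."""
    return alpha * wealth / px, (1 - alpha) * wealth / py


def contract_curve_point(a1, a2, total_x, total_y, x1):
    """Interior Pareto optimum where agent 1 holds x1 units of x (both MRS equal)."""
    k1 = a1 / ((1 - a1) * x1)
    k2 = a2 / ((1 - a2) * (total_x - x1))
    y1 = k2 * total_y / (k1 + k2)
    return [(x1, y1), (total_x - x1, total_y - y1)]


def decentralize(agents, target):
    """Prices and lump-sum transfers under which each agent demands its target bundle."""
    x1, y1 = target[0]
    a1 = agents[0].alpha
    px = 1.0                              # good x is the numeraire
    py = (1 - a1) * x1 / (a1 * y1)        # makes px / py equal the common MRS
    transfers = []
    for agent, (xi, yi) in zip(agents, target):
        target_value = px * xi + py * yi
        endowment_value = px * agent.endowment[0] + py * agent.endowment[1]
        transfers.append(target_value - endowment_value)
    return (px, py), transfers


agents = [Agent(0.3, (8.0, 2.0)), Agent(0.6, (2.0, 8.0))]
target = contract_curve_point(0.3, 0.6, 10.0, 10.0, x1=4.0)
(px, py), transfers = decentralize(agents, target)
for agent, bundle, t in zip(agents, target, transfers):
    wealth = px * agent.endowment[0] + py * agent.endowment[1] + t
    print(bundle, demand(agent.alpha, wealth, px, py), round(t, 4))
```

With these numbers the target allocation is (4, 7) for agent 1 and (6, 3) for agent 2, the supporting price of y is 4/3, and the transfers (+8/3 and -8/3) sum to zero, as the theorem requires. The paper's concern is what happens when agents no longer come with a single stable utility function that can be plugged into a construction like this one.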

Core claim

The autonomy-qualified Second Welfare Theorem states that an autonomy Pareto optimum remains certifiably decentralizable through prices and transfers when the joint conditions of convexity, stable moral status, non-fungible rights, welfare selection, non-manipulation, governed self-modification, and verification all hold, even though autonomy rights, self-modification, identity continuity, and superposed preferences need not behave as commodities or sustain a stable welfare relation.

What carries the argument

The autonomy-qualified Second Welfare Theorem, which lists the joint conditions required to certify that an autonomy Pareto optimum can still be supported by prices and transfers.

If this is right

Decentralization via prices remains possible for autonomy Pareto optima once the full set of joint conditions is satisfied.
Economic preference superposition functions as a hypothesis about context-indexed choice that is kept separate from neural feature superposition.
Autonomy rights must be treated as non-fungible for the decentralization result to hold.
Verification and governed self-modification become necessary to maintain the supporting price system.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The theorem may imply that policy design in advanced AI economies should prioritize verifiable rights structures over classical convexity alone.
If the conditions prove hard to satisfy simultaneously, the result would point toward hybrid mechanisms that combine limited central oversight with price signals.
The distinction between economic and neural superposition suggests separate modeling tracks for preference formation versus internal representation.

Load-bearing premise

The listed joint conditions on moral status, rights, self-modification, and verification can be satisfied simultaneously, verified independently, and suffice to restore certifiable decentralization after classical commodity and stable-welfare assumptions have already failed.

What would settle it

A concrete counterexample in which the joint conditions hold yet no price-and-transfer scheme decentralizes the autonomy Pareto optimum, or an explicit construction showing that the conditions cannot be met together while preserving the supporting hyperplane.
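The test described in "What would settle it" needs, at minimum, a way to check whether a proposed price actually supports an allocation for a given agent. The following is a minimal sketch that assumes two goods, a known utility function, and wealth already set by transfers to the value of the target bundle. It searches a grid of affordable bundles for one the agent strictly prefers. Both utility functions are hypothetical. At prices (1, 1) the price line is tangent to both agents' indifference curves at (5, 5), but only the agent with convex preferences would actually choose that bundle, while the x² + y² agent prefers a bundle nearer a corner. The sketch shows only the classical role of convexity and encodes none of the autonomy conditions. A grid search can exhibit a counterexample, but it cannot certify that none exists.

```python
def find_violation(utility, bundle, prices, steps=40):
    """Search a grid on the budget set for a bundle strictly preferred to `bundle`.

    Wealth is set to the value of `bundle`, as if transfers had already been chosen.
    A returned point shows `prices` cannot decentralize `bundle` for this agent;
    returning None is only evidence on a finite grid, not a proof of support.
    """
    px, py = prices
    wealth = px * bundle[0] + py * bundle[1]
    base = utility(*bundle)
    for i in range(steps + 1):
        x = wealth / px * i / steps
        for j in range(steps + 1):
            y = wealth / py * j / steps
            affordable = px * x + py * y <= wealth + 1e-12
            if affordable and utility(x, y) > base + 1e-9:
                return (x, y)
    return None


def convex_pref(x, y):
    return x * y            # quasi-concave: upper contour sets are convex


def nonconvex_pref(x, y):
    return x**2 + y**2      # upper contour sets are not convex


bundle, prices = (5.0, 5.0), (1.0, 1.0)
print(find_violation(convex_pref, bundle, prices))
print(find_violation(nonconvex_pref, bundle, prices))
```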

Original abstract

The classical Second Welfare Theorem decentralizes any Pareto efficient allocation through prices and transfers under convexity and regularity. In post-AGI economies, autonomy rights, self-modification, identity continuity, and superposed preferences need not behave as commodities or define a stable welfare relation, so this reduction may fail even when a supporting hyperplane exists. We give an autonomy-qualified Second Welfare Theorem stating the joint conditions (convexity, stable moral status, non-fungible rights, welfare selection, non-manipulation, governed self-modification, and verification) under which an autonomy Pareto optimum remains certifiably decentralizable, distinguishing economic preference superposition, a hypothesis about context-indexed choice, from neural feature superposition.
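The abstract treats economic preference superposition as a hypothesis about context-indexed choice. One hypothetical way to make that operational is to record each choice together with the context in which it was made and then ask whether a single strict preference relation could have produced the data. The sketch below applies the standard revealed-preference test for single-valued choice data: such data can be rationalized by a strict preference exactly when the revealed relation has no cycle. The contexts, menus, and function names are invented for illustration and are not the paper's formalism. In the example each context is internally consistent, but the pooled data are not, which is one concrete sense in which context-indexed choice "need not define a stable welfare relation".

```python
from collections import defaultdict


def revealed_strict(observations):
    """x is revealed strictly preferred to y if x was chosen from a menu containing y."""
    relation = set()
    for _context, menu, choice in observations:
        for other in menu:
            if other != choice:
                relation.add((choice, other))
    return relation


def has_cycle(relation):
    """Depth-first search for a cycle in the revealed-preference graph."""
    graph = defaultdict(set)
    for better, worse in relation:
        graph[better].add(worse)
    state = {}  # node -> "active" while on the DFS stack, "done" afterwards

    def visit(node):
        state[node] = "active"
        for nxt in graph[node]:
            if state.get(nxt) == "active":
                return True
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in list(graph))


def rationalizable(observations):
    """Single-valued choice data fit one strict preference iff the relation is acyclic."""
    return not has_cycle(revealed_strict(observations))


def split_by_context(observations):
    groups = defaultdict(list)
    for obs in observations:
        groups[obs[0]].append(obs)
    return dict(groups)


# Hypothetical data: (context, menu, chosen item)
observations = [
    ("work", frozenset({"A", "B"}), "A"),
    ("work", frozenset({"A", "B", "C"}), "A"),
    ("leisure", frozenset({"A", "B"}), "B"),
]
per_context = {c: rationalizable(obs) for c, obs in split_by_context(observations).items()}
print(per_context, rationalizable(observations))
```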

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper states an autonomy-qualified Second Welfare Theorem for post-AGI economies but supplies no derivation showing the listed conditions restore decentralization.


The paper claims to extend the classical Second Welfare Theorem by adding conditions such as stable moral status, non-fungible rights, governed self-modification, and verification. Under these, an autonomy Pareto optimum is supposed to stay certifiably decentralizable even when self-modifying agents and superposed preferences break the usual commodity and stable welfare assumptions.

What is new is the application to post-AGI settings and the distinction drawn between economic preference superposition as context-indexed choice and neural feature superposition. The work also correctly flags where the standard theorem's assumptions fail once autonomy rights and identity continuity enter the picture.

The main weakness is that the theorem is asserted rather than derived. No argument, sketch, or construction shows why the joint conditions suffice, how they interact with superposition, or why they avoid the classical failures. Because the new terms are defined only within the paper itself, the burden of showing that the result is not circular is high.

This is for readers already thinking about welfare economics in AI-integrated or self-modifying agent markets. Someone wanting a usable formal result or independently checkable conditions will find the piece thin.

I would not send it to peer review yet. It needs the actual justification for the conditions before it merits referee time.

Referee Report

1 major / 1 minor

Summary. The manuscript claims that the classical Second Welfare Theorem fails to decentralize Pareto optima in post-AGI economies because autonomy rights, self-modification, identity continuity, and superposed preferences do not behave as commodities or yield stable welfare relations. It states an autonomy-qualified Second Welfare Theorem under the joint conditions convexity, stable moral status, non-fungible rights, welfare selection, non-manipulation, governed self-modification, and verification, under which an autonomy Pareto optimum remains certifiably decentralizable, while distinguishing economic preference superposition (context-indexed choice) from neural feature superposition.

Significance. If substantiated with a derivation, the result would extend welfare economics to economies containing autonomous AI agents whose preferences are context-dependent and whose rights are non-fungible. No machine-checked proofs, reproducible code, parameter-free derivations, or falsifiable predictions are supplied.

major comments (1)

[Abstract] The manuscript asserts the existence of the autonomy-qualified Second Welfare Theorem and enumerates the seven joint conditions under which an autonomy Pareto optimum remains certifiably decentralizable, but supplies no derivation, proof sketch, formal construction, or argument showing that these conditions are jointly sufficient (or even simultaneously satisfiable) once the classical commodity and stable-welfare assumptions have been dropped.

minor comments (1)

The distinction between economic preference superposition and neural feature superposition is introduced but never connected to the listed conditions or to any welfare-economics argument.
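To make the other side of that distinction concrete: neural feature superposition, as studied in Elhage et al.'s toy models [22], concerns representation. A layer stores more features than it has dimensions by giving them nearly, but not exactly, orthogonal directions, at the cost of interference between them. The sketch below packs 60 features into 20 dimensions and shows that a linear read-out still identifies a single active feature. It assumes random unit vectors in place of learned feature directions; the toy models instead learn directions and rely on sparsity and a nonlinearity. Nothing in it concerns choice, which is the point of the paper's separation between the two notions.

```python
import math
import random


def random_unit_vector(dim, rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def superposed_features(n_features, dim, seed=0):
    """Assign n_features directions in a dim-dimensional space (n_features > dim)."""
    rng = random.Random(seed)
    return [random_unit_vector(dim, rng) for _ in range(n_features)]


def max_interference(features):
    """Largest |cosine| between two distinct feature directions (0 would mean orthogonal)."""
    n = len(features)
    return max(abs(dot(features[i], features[j])) for i in range(n) for j in range(i + 1, n))


def read_out(features, activation):
    """Linear read-out of every feature from one activation vector."""
    return [dot(f, activation) for f in features]


features = superposed_features(n_features=60, dim=20)
activation = features[7]                  # only feature 7 is active
scores = read_out(features, activation)
print(max(range(len(scores)), key=scores.__getitem__), round(max_interference(features), 3))
```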

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for identifying the central gap in our presentation. The manuscript is a short conceptual note that states an autonomy-qualified version of the Second Welfare Theorem by enumerating seven joint conditions; it does not contain a formal derivation. We address the referee's observation directly below and indicate how we will revise.


Referee: [Abstract] The manuscript asserts the existence of the autonomy-qualified Second Welfare Theorem and enumerates the seven joint conditions under which an autonomy Pareto optimum remains certifiably decentralizable, but supplies no derivation, proof sketch, formal construction, or argument showing that these conditions are jointly sufficient (or even simultaneously satisfiable) once the classical commodity and stable-welfare assumptions have been dropped.

Authors: The referee is correct. The abstract (and the body) simply list the seven conditions and assert that, under their joint satisfaction, an autonomy Pareto optimum remains decentralizable. No argument is supplied showing that the conditions can be satisfied simultaneously or that they are jointly sufficient once the standard commodity and stable-welfare assumptions are removed. Because the paper offers only a statement rather than a derivation, the claim remains conjectural. We will add a concise proof sketch in the revised manuscript that (i) recalls the classical supporting-hyperplane argument, (ii) shows where each autonomy qualification replaces a classical assumption, and (iii) verifies that the seven conditions close the argument without circularity. We will also make explicit that simultaneous satisfiability is an open modeling question left for future work rather than a proven fact.

revision: yes

Circularity Check

1 step flagged

Autonomy-qualified Second Welfare Theorem defined directly by its own novel conditions without derivation or external grounding

specific steps

self-definitional [Abstract]
"We give an autonomy-qualified Second Welfare Theorem stating the joint conditions convexity, stable moral status, non-fungible rights, welfare selection, non manipulation, governed self modification, and verification under which an autonomy Pareto optimum remains certifiably decentralizable, distinguishing economic preference superposition, a hypothesis about context-indexed choice, from neural feature superposition."

The theorem is asserted to hold under these newly coined conditions, but the paper supplies no independent argument or construction demonstrating sufficiency. The 'theorem' therefore reduces to a restatement of the paper's own list of conditions rather than a derived result from prior theorems or external benchmarks.

full rationale

The paper's central claim is an 'autonomy-qualified Second Welfare Theorem' that holds precisely under a list of conditions (stable moral status, non-fungible rights, governed self-modification, etc.) that are introduced and defined within the paper. No derivation, proof sketch, or reduction to classical welfare theorems is supplied showing why these conditions jointly suffice for decentralization once commodity and stable welfare assumptions fail. The theorem is therefore equivalent to its own definitional inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 3 invented entities

The central claim rests on a domain assumption that standard commodity and welfare-relation properties fail in post-AGI settings plus several ad-hoc concepts introduced without independent grounding.

axioms (2)

standard math: The classical Second Welfare Theorem decentralizes Pareto-efficient allocations under convexity and regularity.
Invoked as the baseline result that may fail once autonomy features are present.
domain assumption: Autonomy rights, self-modification, identity continuity, and superposed preferences need not behave as commodities or define a stable welfare relation in post-AGI economies.
This premise is offered as the reason the classical reduction may fail even when a supporting hyperplane exists.
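The "standard math" axiom can be made explicit. The following is a compressed sketch of the textbook supporting-hyperplane argument, specialized to an exchange economy, in the form given by Mas-Colell, Whinston and Green [47, Section 16.D]. It is included only to show which step uses convexity and which uses local nonsatiation; it is not the paper's derivation, and none of the autonomy conditions appear in it. Turning the resulting price quasi-equilibrium with transfers into a full equilibrium requires an extra regularity condition, such as continuous preferences together with a cheaper bundle available to each agent.

```latex
\begin{align*}
&\text{Given: a Pareto optimal allocation } x^* = (x_1^*, \dots, x_I^*) \text{ with } \textstyle\sum_i x_i^* = \bar{\omega},
  \text{ and preferences } \succsim_i \text{ that are convex and locally nonsatiated.} \\
&V_i = \{\, x_i \in \mathbb{R}^L_+ : x_i \succ_i x_i^* \,\}, \qquad V = \textstyle\sum_i V_i
  \quad (\text{each } V_i \text{ is convex by convexity of } \succsim_i, \text{ hence } V \text{ is convex}) \\
&\bar{\omega} \notin V \quad (\text{otherwise a feasible allocation would make every agent strictly better off}) \\
&\exists\, p \neq 0 : \; p \cdot z \ge p \cdot \bar{\omega} \quad \forall z \in V \quad (\text{separating hyperplane theorem}) \\
&x_i \succsim_i x_i^* \;\Longrightarrow\; p \cdot x_i \ge p \cdot x_i^* \quad \forall i \quad (\text{local nonsatiation}) \\
&w_i = p \cdot x_i^*, \quad \textstyle\sum_i w_i = p \cdot \bar{\omega}
  \;\Longrightarrow\; (x^*, p) \text{ is a price quasi-equilibrium with transfers}
\end{align*}
```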

invented entities (3)

autonomy-qualified Second Welfare Theorem (no independent evidence)
purpose: Modified decentralization result for post-AGI settings
New formulation whose content is supplied by the paper.
autonomy Pareto optimum (no independent evidence)
purpose: The qualified efficient allocation claimed to remain decentralizable
Central new term appearing in the theorem statement.
economic preference superposition (no independent evidence)
purpose: Hypothesis about context-indexed choice
Introduced and distinguished from neural feature superposition.

pith-pipeline@v0.9.1-grok · 5629 in / 1713 out tokens · 37786 ms · 2026-06-27T18:55:04.417718+00:00


Reference graph

Works this paper leans on

61 extracted references · 21 canonical work pages · 13 internal anchors

[1] Acemoglu, D., Restrepo, P.: The race between man and machine: Implications of technology for growth, factor shares, and employment. American Economic Review 108(6), 1488–1542 (2018)

[2] Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., Mané, D.: Concrete problems in AI safety. arXiv:1606.06565 (2016)

[3] Anthropic Interpretability Team: Circuit tracing: Revealing computational graphs in language models. https://transformer-circuits.pub/2025/attribution-graphs/methods.html (Mar 2025), Transformer Circuits Thread

[4] Anthropic Interpretability Team: On the biology of a large language model. https://transformer-circuits.pub/2025/attribution-graphs/biology.html (Mar 2025), Transformer Circuits Thread

[5] Arrow, K.J., Debreu, G.: Existence of an equilibrium for a competitive economy. Econometrica 22(3), 265–290 (1954)

[6] Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., et al.: Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv:2204.05862 (2022)

[7] Bereska, L., Gavves, E.: Mechanistic interpretability for AI safety -- a review. arXiv:2404.14082 (2024)

[8] Bernheim, B.D., Rangel, A.: Beyond revealed preference: Choice-theoretic foundations for behavioral welfare economics. Quarterly Journal of Economics 124(1), 51–104 (2009)

[9] Birch, J.: The Edge of Sentience: Risk and Precaution in Humans, Other Animals, and AI. Oxford University Press, Oxford (2024)

[10] Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., et al.: On the opportunities and risks of foundation models. arXiv:2108.07258 (2021)

[11] Bostrom, N.: Superintelligence: Paths, Dangers, Strategies. Oxford University Press, Oxford (2014)

[12] Brynjolfsson, E., Korinek, A., Agrawal, A.: A research agenda for the economics of transformative AI. NBER Working Paper (34256) (2025)

[13] Butlin, P., Long, R., Elmoznino, E., Bengio, Y., Birch, J., Constant, A., et al.: Consciousness in artificial intelligence: Insights from the science of consciousness. arXiv:2308.08708 (2023)

[14] Carlsmith, J.: Scheming AIs: Will AIs fake alignment during training in order to get power? arXiv:2311.08379 (2023)

[15] Casper, S., Davies, X., Shi, C., Gilbert, T.K., Scheurer, J., Rando, J., Freedman, R., Korbak, T., Lindner, D., Freire, P., et al.: Open problems and fundamental limitations of reinforcement learning from human feedback. arXiv:2307.15217 (2023)

[16] Chalmers, D.J.: Could a large language model be conscious? arXiv:2303.07103 (2023)

[17] Christiano, P.F., Leike, J., Brown, T., Martic, M., Legg, S., Amodei, D.: Deep reinforcement learning from human preferences. In: Advances in Neural Information Processing Systems 30 (NeurIPS) (2017)

[18] Conitzer, V., Freedman, R.A., Heitzig, J., Holliday, W.H., Jacobs, B.M., et al.: Social choice should guide AI alignment in dealing with diverse human feedback. arXiv:2404.10271 (2024)

[19] Conley, J.P., Diamantaras, D.: Generalized Samuelson conditions and welfare theorems for nonsmooth economies. Journal of Public Economics 59(2), 137–152 (1996)

[20] Debreu, G.: Theory of Value: An Axiomatic Analysis of Economic Equilibrium. Yale University Press, New Haven (1959)

[21] Dietrich, F., List, C.: A reason-based theory of rational choice. Noûs 47(1), 104–134 (2013)

[22] Elhage, N., Hume, T., Olsson, C., Schiefer, N., Henighan, T., Kravec, S., Hatfield-Dodds, Z., Lasenby, R., Drain, D., Chen, C., et al.: Toy models of superposition. arXiv:2209.10652 (2022)

[23] Flint Ashery, A., Aiello, L.M., Baronchelli, A.: Emergent social conventions and collective bias in LLM populations. Science Advances 11(20), eadu9368 (2025)

[24] Florenzano, M., Gourdel, P., Jofré, A.: Supporting weakly Pareto optimal allocations in infinite dimensional nonconvex economies. Economic Theory 29(3), 549–564 (2006)

[25] Gabriel, I.: Artificial intelligence, values, and alignment. Minds and Machines 30(3), 411–437 (2020)

[26] Gao, L., Dupre la Tour, T., Tillman, H., Goh, G., Troll, R., Radford, A., Sutskever, I., Leike, J., Wu, J.: Scaling and evaluating sparse autoencoders. In: International Conference on Learning Representations. vol. 2025, pp. 26721–26754 (2025)

[27] Greenwald, B.C., Stiglitz, J.E.: Externalities in economies with imperfect information and incomplete markets. Quarterly Journal of Economics 101(2), 229–264 (1986)

[28] Greenwald, B.C., Stiglitz, J.E.: Pareto inefficiency of market economies: Search and efficiency wage models. American Economic Review 78(2), 351–355 (1988)

[29] Habte, A., Mordukhovich, B.S.: Extended second welfare theorem for nonconvex economies with infinite commodities and public goods. Advances in Mathematical Economics 14, 93–126 (2010)

[30] Hadfield, G.K., Hadfield-Menell, D.: Incomplete contracting and AI alignment. In: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. pp. 417–422 (2019)

[31] Hadfield-Menell, D., Milli, S., Abbeel, P., Russell, S., Dragan, A.: Inverse reward design. In: Advances in Neural Information Processing Systems 30 (NeurIPS) (2017)

[32] Hadfield-Menell, D., Russell, S.J., Abbeel, P., Dragan, A.: Cooperative inverse reinforcement learning. In: Advances in Neural Information Processing Systems 29 (NeurIPS) (2016)

[33] He, Z., Shu, W., Ge, X., Chen, L., Wang, J., Zhou, Y., Liu, F., Guo, Q., Huang, X., Wu, Z., et al.: Llama Scope: Extracting millions of features from Llama-3.1-8B with sparse autoencoders. arXiv:2410.20526 (2024)

[34] Huben, R., Cunningham, H., Smith, L., Ewart, A., Sharkey, L.: Sparse autoencoders find highly interpretable features in language models. In: International Conference on Learning Representations. vol. 2024, pp. 7827–7845 (2024)

[35] Hubinger, E., van Merwijk, C., Mikulik, V., Skalse, J., Garrabrant, S.: Risks from learned optimization in advanced machine learning systems. arXiv:1906.01820 (2019)

[36] Karni, E.: Decision Making under Uncertainty: The Case of State-Dependent Preferences. Harvard University Press, Cambridge, MA (1985)

[37] Khan, M.A., Vohra, R.: An extension of the second welfare theorem to economies with nonconvexities and public goods. Quarterly Journal of Economics 102(2), 223–241 (1987)

[38] Korinek, A.: Economic policy challenges for the age of AI. NBER Working Paper (32980) (2024)

[39] Korinek, A., Stiglitz, J.E.: Artificial intelligence and its implications for income distribution and unemployment. NBER Working Paper (24174) (2017)

[40] Korinek, A., Suh, D.: Scenarios for the transition to AGI. arXiv:2403.12107 (2024)

[41] Korinek, A., Vipra, J.: Concentrating intelligence: scaling and market structure in artificial intelligence. Economic Policy 40(121), 225–256 (2025)

[42] Lieberum, T., Rajamanoharan, S., Conmy, A., Smith, L., Sonnerat, N., Varma, V., Kramár, J., Dragan, A., Shah, R., Nanda, N.: Gemma Scope: Open sparse autoencoders everywhere all at once on Gemma 2. In: Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP. pp. 278–300 (2024)

[43] Liu, Y., Liu, Z., Gore, J.: Superposition yields robust neural scaling. Advances in Neural Information Processing Systems 38, 159269–159305 (2026)

[44] Long, R., Sebo, J., Butlin, P., Finlinson, K., Fish, K., Harding, J., et al.: Taking AI welfare seriously. arXiv:2411.00986 (2024)

[45] Long, R., Sebo, J., Sims, T.: Is there a tension between AI safety and AI welfare? Philosophical Studies 182(7), 2005–2033 (2025)

[46] Luo, J., Zhang, W., Yuan, Y., Zhao, Y., Yang, J., Gu, Y., Wu, B., Chen, B., Qiao, Z., Long, Q., Tu, R., Luo, X., Ju, W., Xiao, Z., Wang, Y., Xiao, M., Liu, C., Yuan, J., Zhang, S., Jin, Y., Zhang, F., Wu, X., Zhao, H., Tao, D., Yu, P.S., Zhang, M.: Large language model agent: A survey on methodology, applications and challenges. arXiv:2503.21460 (2025)

[47] Mas-Colell, A., Whinston, M.D., Green, J.R.: Microeconomic Theory. Oxford University Press, New York (1995)

[48] Morris, M.R., Sohl-Dickstein, J., Fiedel, N., Warkentin, T., Dafoe, A., Faust, A., Farabet, C., Legg, S.: Levels of AGI for operationalizing progress on the path to AGI. arXiv:2311.02462 (2023)

[49] Murty, S.: Externalities and fundamental nonconvexities: A reconciliation of approaches to general equilibrium externality modeling and implications for decentralization. Journal of Economic Theory 145(1), 331–353 (2010)

[50] Ng, A.Y., Russell, S.J.: Algorithms for inverse reinforcement learning. In: Proceedings of the Seventeenth International Conference on Machine Learning (ICML). pp. 663–670 (2000)

[51] Parfit, D.: Reasons and Persons. Oxford University Press, Oxford (1984)

[52] Perrier, E.: Deconstructing Superintelligence: Identity, Self-modification and Différance. arXiv:2604.19845 (2026)

[53] Perrier, E.: Post-AGI Economies: Autonomy and the First Fundamental Theorem of Welfare Economics. arXiv:2604.21216 (2026)

[54] Sen, A.K.: The impossibility of a Paretian liberal. Journal of Political Economy 78(1), 152–157 (1970)

[55] Sofroniew, N., Kauvar, I., Saunders, W., Chen, R., Henighan, T., Hydrie, S., Citro, C., Pearce, A., Tarng, J., Gurnee, W., et al.: Emotion concepts and their function in a large language model. arXiv:2604.07729 (2026)

[56] Starrett, D.A.: Fundamental nonconvexities in the theory of externalities. Journal of Economic Theory 4(2), 180–199 (1972)

[57] Trammell, P., Korinek, A.: Economic growth under transformative AI. NBER Working Paper (31815) (2023)

[58] Tran, K.T., Dao, D., Nguyen, M.D., Pham, Q.V., O'Sullivan, B., Nguyen, H.D.: Multi-agent collaboration mechanisms: A survey of LLMs. arXiv:2501.06322 (2025)

[59] Vipra, J., Korinek, A.: Market concentration implications of foundation models. arXiv:2311.01550 (2023)

[60] Xi, Z., Chen, W., Guo, X., He, W., Ding, Y., Hong, B., Zhang, M., Wang, J., Jin, S., Zhou, E., et al.: The rise and potential of large language model based agents: A survey. Science China Information Sciences 68(2), 121101 (2025)

[61] Zomer, N., De Domenico, M.: Unraveling the emergence of collective behavior in networks of cognitive agents. npj Artificial Intelligence 2(36) (2026)

[1] [1]

American Economic Review 108(6), 1488–1542 (2018)

Acemoglu, D., Restrepo, P.: The race between man and machine: Implications of technology for growth, factor shares, and employment. American Economic Review 108(6), 1488–1542 (2018)

2018

[2] [2]

Concrete Problems in AI Safety

Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., Mané, D.: Concrete problems in AI safety. arXiv:1606.06565 (2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016

[3] [3]

https://transformer-circuits.pub/2025/attribution-graphs/ methods.html (Mar 2025), transformer Circuits Thread

Anthropic Interpretability Team: Circuit tracing: Revealing computational graphs in language models. https://transformer-circuits.pub/2025/attribution-graphs/ methods.html (Mar 2025), transformer Circuits Thread

2025

[4] [4]

https://transformer-circuits.pub/2025/attribution-graphs/biology.html (Mar 2025), transformer Circuits Thread

Anthropic Interpretability Team: On the biology of a large language model. https://transformer-circuits.pub/2025/attribution-graphs/biology.html (Mar 2025), transformer Circuits Thread

2025

[5] [5]

Econometrica22(3), 265–290 (1954)

Arrow, K.J., Debreu, G.: Existence of an equilibrium for a competitive economy. Econometrica22(3), 265–290 (1954)

1954

[6] [6]

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., et al.: Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv:2204.05862 (2022)

work page internal anchor Pith review Pith/arXiv arXiv 2022

[7] [7]

Mechanistic Interpretability for AI Safety -- A Review

Bereska, L., Gavves, E.: Mechanistic interpretability for AI safety — a review. arXiv:2404.14082 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024

[8] [8]

Quarterly Journal of Economics124(1), 51–104 (2009)

Bernheim, B.D., Rangel, A.: Beyond revealed preference: Choice-theoretic foun- dations for behavioral welfare economics. Quarterly Journal of Economics124(1), 51–104 (2009)

2009

[9] [9]

Oxford University Press, Oxford (2024)

Birch, J.: The Edge of Sentience: Risk and Precaution in Humans, Other Animals, and AI. Oxford University Press, Oxford (2024)

2024

[10] [10]

On the Opportunities and Risks of Foundation Models

Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., et al.: On the opportunities and risks of foundation models. arXiv:2108.07258 (2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021

[11] [11]

Oxford University Press, Oxford (2014)

Bostrom, N.: Superintelligence: Paths, Dangers, Strategies. Oxford University Press, Oxford (2014)

2014

[12] [12]

NBER Working Paper (34256) (2025)

Brynjolfsson, E., Korinek, A., Agrawal, A.: A research agenda for the economics of transformative AI. NBER Working Paper (34256) (2025)

2025

[13] [13]

Consciousness in Artificial Intelligence: Insights from the Science of Consciousness

Butlin, P., Long, R., Elmoznino, E., Bengio, Y., Birch, J., Constant, A., et al.: Consciousness in artificial intelligence: Insights from the science of consciousness. arXiv:2308.08708 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023

[14] [14]

Carlsmith, J.: Scheming AIs: Will AIs fake alignment during training in order to get power? arXiv:2311.08379 (2023)

work page arXiv 2023

[15] [15]

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Casper, S., Davies, X., Shi, C., Gilbert, T.K., Scheurer, J., Rando, J., Freedman, R., Korbak, T., Lindner, D., Freire, P., et al.: Open problems and fundamental limitations of reinforcement learning from human feedback. arXiv:2307.15217 (2023)

[16] Chalmers, D.J.: Could a large language model be conscious? arXiv:2303.07103 (2023)

[17] Christiano, P.F., Leike, J., Brown, T., Martic, M., Legg, S., Amodei, D.: Deep reinforcement learning from human preferences. In: Advances in Neural Information Processing Systems 30 (NeurIPS) (2017)

[18] Conitzer, V., Freedman, R.A., Heitzig, J., Holliday, W.H., Jacobs, B.M., et al.: Social choice should guide AI alignment in dealing with diverse human feedback. arXiv:2404.10271 (2024)

[19] Conley, J.P., Diamantaras, D.: Generalized Samuelson conditions and welfare theorems for nonsmooth economies. Journal of Public Economics 59(2), 137–152 (1996)

[20] Debreu, G.: Theory of Value: An Axiomatic Analysis of Economic Equilibrium. Yale University Press, New Haven (1959)

[21] Dietrich, F., List, C.: A reason-based theory of rational choice. Noûs 47(1), 104–134 (2013)

[22] Elhage, N., Hume, T., Olsson, C., Schiefer, N., Henighan, T., Kravec, S., Hatfield-Dodds, Z., Lasenby, R., Drain, D., Chen, C., et al.: Toy models of superposition. arXiv:2209.10652 (2022)

[23] Flint Ashery, A., Aiello, L.M., Baronchelli, A.: Emergent social conventions and collective bias in LLM populations. Science Advances 11(20), eadu9368 (2025)

[24] Florenzano, M., Gourdel, P., Jofré, A.: Supporting weakly Pareto optimal allocations in infinite dimensional nonconvex economies. Economic Theory 29(3), 549–564 (2006)

[25] Gabriel, I.: Artificial intelligence, values, and alignment. Minds and Machines 30(3), 411–437 (2020)

[26] Gao, L., Dupre la Tour, T., Tillman, H., Goh, G., Troll, R., Radford, A., Sutskever, I., Leike, J., Wu, J.: Scaling and evaluating sparse autoencoders. In: International Conference on Learning Representations. vol. 2025, pp. 26721–26754 (2025)

[27] Greenwald, B.C., Stiglitz, J.E.: Externalities in economies with imperfect information and incomplete markets. Quarterly Journal of Economics 101(2), 229–264 (1986)

[28] Greenwald, B.C., Stiglitz, J.E.: Pareto inefficiency of market economies: Search and efficiency wage models. American Economic Review 78(2), 351–355 (1988)

[29] Habte, A., Mordukhovich, B.S.: Extended second welfare theorem for nonconvex economies with infinite commodities and public goods. Advances in Mathematical Economics 14, 93–126 (2010)

[30] Hadfield, G.K., Hadfield-Menell, D.: Incomplete contracting and AI alignment. In: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. pp. 417–422 (2019)

[31] Hadfield-Menell, D., Milli, S., Abbeel, P., Russell, S., Dragan, A.: Inverse reward design. In: Advances in Neural Information Processing Systems 30 (NeurIPS) (2017)

[32] Hadfield-Menell, D., Russell, S.J., Abbeel, P., Dragan, A.: Cooperative inverse reinforcement learning. In: Advances in Neural Information Processing Systems 29 (NeurIPS) (2016)

[33] He, Z., Shu, W., Ge, X., Chen, L., Wang, J., Zhou, Y., Liu, F., Guo, Q., Huang, X., Wu, Z., et al.: Llama Scope: Extracting millions of features from Llama-3.1-8B with sparse autoencoders. arXiv:2410.20526 (2024)

[34] Huben, R., Cunningham, H., Smith, L., Ewart, A., Sharkey, L.: Sparse autoencoders find highly interpretable features in language models. In: International Conference on Learning Representations. vol. 2024, pp. 7827–7845 (2024)

[35] Hubinger, E., van Merwijk, C., Mikulik, V., Skalse, J., Garrabrant, S.: Risks from learned optimization in advanced machine learning systems. arXiv:1906.01820 (2019)

[36] Karni, E.: Decision Making under Uncertainty: The Case of State-Dependent Preferences. Harvard University Press, Cambridge, MA (1985)

[37] Khan, M.A., Vohra, R.: An extension of the second welfare theorem to economies with nonconvexities and public goods. Quarterly Journal of Economics 102(2), 223–241 (1987)

[38] Korinek, A.: Economic policy challenges for the age of AI. NBER Working Paper (32980) (2024)

[39] Korinek, A., Stiglitz, J.E.: Artificial intelligence and its implications for income distribution and unemployment. NBER Working Paper (24174) (2017)

[40] Korinek, A., Suh, D.: Scenarios for the transition to AGI. arXiv:2403.12107 (2024)

[41] Korinek, A., Vipra, J.: Concentrating intelligence: scaling and market structure in artificial intelligence. Economic Policy 40(121), 225–256 (2025)

[42] Lieberum, T., Rajamanoharan, S., Conmy, A., Smith, L., Sonnerat, N., Varma, V., Kramár, J., Dragan, A., Shah, R., Nanda, N.: Gemma Scope: Open sparse autoencoders everywhere all at once on Gemma 2. In: Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP. pp. 278–300 (2024)

[43] Liu, Y., Liu, Z., Gore, J.: Superposition yields robust neural scaling. Advances in Neural Information Processing Systems 38, 159269–159305 (2026)

[44] Long, R., Sebo, J., Butlin, P., Finlinson, K., Fish, K., Harding, J., et al.: Taking AI welfare seriously. arXiv:2411.00986 (2024)

[45] Long, R., Sebo, J., Sims, T.: Is there a tension between AI safety and AI welfare? Philosophical Studies 182(7), 2005–2033 (2025)

[46] Luo, J., Zhang, W., Yuan, Y., Zhao, Y., Yang, J., Gu, Y., Wu, B., Chen, B., Qiao, Z., Long, Q., Tu, R., Luo, X., Ju, W., Xiao, Z., Wang, Y., Xiao, M., Liu, C., Yuan, J., Zhang, S., Jin, Y., Zhang, F., Wu, X., Zhao, H., Tao, D., Yu, P.S., Zhang, M.: Large language model agent: A survey on methodology, applications and challenges. arXiv:2503.21460 (2025)

[47] Mas-Colell, A., Whinston, M.D., Green, J.R.: Microeconomic Theory. Oxford University Press, New York (1995)

[48] Morris, M.R., Sohl-Dickstein, J., Fiedel, N., Warkentin, T., Dafoe, A., Faust, A., Farabet, C., Legg, S.: Levels of AGI for operationalizing progress on the path to AGI. arXiv:2311.02462 (2023)

[49] Murty, S.: Externalities and fundamental nonconvexities: A reconciliation of approaches to general equilibrium externality modeling and implications for decentralization. Journal of Economic Theory 145(1), 331–353 (2010)

[50] Ng, A.Y., Russell, S.J.: Algorithms for inverse reinforcement learning. In: Proceedings of the Seventeenth International Conference on Machine Learning (ICML). pp. 663–670 (2000)

[51] Parfit, D.: Reasons and Persons. Oxford University Press, Oxford (1984)

[52] Perrier, E.: Deconstructing Superintelligence: Identity, Self-modification and Différance. arXiv:2604.19845 (2026)

[53] Perrier, E.: Post-AGI Economies: Autonomy and the First Fundamental Theorem of Welfare Economics. arXiv:2604.21216 (2026)

[54] Sen, A.K.: The impossibility of a Paretian liberal. Journal of Political Economy 78(1), 152–157 (1970)

[55] Sofroniew, N., Kauvar, I., Saunders, W., Chen, R., Henighan, T., Hydrie, S., Citro, C., Pearce, A., Tarng, J., Gurnee, W., et al.: Emotion concepts and their function in a large language model. arXiv:2604.07729 (2026)

[56] Starrett, D.A.: Fundamental nonconvexities in the theory of externalities. Journal of Economic Theory 4(2), 180–199 (1972)

[57] Trammell, P., Korinek, A.: Economic growth under transformative AI. NBER Working Paper (31815) (2023)

[58] Tran, K.T., Dao, D., Nguyen, M.D., Pham, Q.V., O’Sullivan, B., Nguyen, H.D.: Multi-agent collaboration mechanisms: A survey of LLMs. arXiv:2501.06322 (2025)

[59] Vipra, J., Korinek, A.: Market concentration implications of foundation models. arXiv:2311.01550 (2023)

[60] Xi, Z., Chen, W., Guo, X., He, W., Ding, Y., Hong, B., Zhang, M., Wang, J., Jin, S., Zhou, E., et al.: The rise and potential of large language model based agents: A survey. Science China Information Sciences 68(2), 121101 (2025)

[61] Zomer, N., De Domenico, M.: Unraveling the emergence of collective behavior in networks of cognitive agents. npj Artificial Intelligence 2(36) (2026)
