Post-AGI Economies: Superposition and the Second Fundamental Theorem of Welfare Economics
Pith reviewed 2026-06-27 18:55 UTC · model grok-4.3
The pith
An autonomy-qualified Second Welfare Theorem shows Pareto optima remain decentralizable in post-AGI economies under joint conditions on rights, status, and verification.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The autonomy-qualified Second Welfare Theorem states that an autonomy Pareto optimum remains certifiably decentralizable through prices and transfers when the joint conditions of convexity, stable moral status, non-fungible rights, welfare selection, non-manipulation, governed self-modification, and verification all hold, even though autonomy rights, self-modification, identity continuity, and superposed preferences need not behave as commodities or sustain a stable welfare relation.
What carries the argument
The autonomy-qualified Second Welfare Theorem, which lists the joint conditions required to certify that an autonomy Pareto optimum can still be supported by prices and transfers.
If this is right
- Decentralization via prices remains possible for autonomy Pareto optima once the full set of joint conditions is satisfied.
- Economic preference superposition functions as a hypothesis about context-indexed choice that is kept separate from neural feature superposition.
- Autonomy rights must be treated as non-fungible for the decentralization result to hold.
- Verification and governed self-modification become necessary to maintain the supporting price system.
Where Pith is reading between the lines
- The theorem may imply that policy design in advanced AI economies should prioritize verifiable rights structures over classical convexity alone.
- If the conditions prove hard to satisfy simultaneously, the result would point toward hybrid mechanisms that combine limited central oversight with price signals.
- The distinction between economic and neural superposition suggests separate modeling tracks for preference formation versus internal representation.
Load-bearing premise
The listed joint conditions on moral status, rights, self-modification, and verification can be satisfied simultaneously, verified independently, and suffice to restore certifiable decentralization after classical commodity and stable-welfare assumptions have already failed.
What would settle it
A concrete counterexample in which the joint conditions hold yet no price-and-transfer scheme decentralizes the autonomy Pareto optimum, or an explicit construction showing that the conditions cannot be met together while preserving the supporting hyperplane.
Original abstract
The classical Second Welfare Theorem decentralizes any Pareto efficient allocation through prices and transfers under convexity and regularity. In post-AGI economies, autonomy rights, self-modification, identity continuity, and superposed preferences need not behave as commodities or define a stable welfare relation, so this reduction may fail even when a supporting hyperplane exists. We give an autonomy-qualified Second Welfare Theorem stating the joint conditions (convexity, stable moral status, non-fungible rights, welfare selection, non-manipulation, governed self-modification, and verification) under which an autonomy Pareto optimum remains certifiably decentralizable, distinguishing economic preference superposition, a hypothesis about context-indexed choice, from neural feature superposition.
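For orientation, the classical baseline that the abstract starts from can be checked numerically in a toy setting. The sketch below is not taken from the paper: it assumes a two-agent, two-good pure exchange economy with Cobb-Douglas utilities, and every name in it (`Agent`, `contract_curve_y1`, `supporting_price`, `demand`, `decentralize`) and every number is hypothetical and chosen for illustration. It picks a Pareto optimum on the contract curve, sets the price ratio to the common marginal rate of substitution, assigns each agent wealth equal to the value of the target bundle, and confirms that each agent's demand at those prices reproduces the target. None of the autonomy conditions are modeled.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Bundle = Tuple[float, float]  # (quantity of good x, quantity of good y)


@dataclass(frozen=True)
class Agent:
    alpha: float  # Cobb-Douglas weight on good x, strictly between 0 and 1


def contract_curve_y1(a1: Agent, a2: Agent, x1: float, total: Bundle) -> float:
    """Agent 1's y on the contract curve for a given x1 (marginal rates of substitution equal)."""
    X, Y = total
    k1 = a1.alpha / (1 - a1.alpha)
    k2 = a2.alpha / (1 - a2.alpha)
    return k2 * x1 * Y / (k1 * (X - x1) + k2 * x1)


def supporting_price(agent: Agent, bundle: Bundle) -> float:
    """Price of x (with the price of y normalized to 1) equal to the agent's MRS at the bundle."""
    x, y = bundle
    return agent.alpha * y / ((1 - agent.alpha) * x)


def demand(agent: Agent, p_x: float, wealth: float) -> Bundle:
    """Cobb-Douglas demand at prices (p_x, 1) and the given wealth."""
    return (agent.alpha * wealth / p_x, (1 - agent.alpha) * wealth)


def decentralize(agents: List[Agent], target: List[Bundle], endowments: List[Bundle]):
    """Return the supporting price, lump-sum transfers, and resulting demands."""
    p_x = supporting_price(agents[0], target[0])
    # Wealth that makes each target bundle exactly affordable.
    wealth = [p_x * x + y for x, y in target]
    # Transfer = required wealth minus the market value of the initial endowment.
    transfers = [w - (p_x * ex + ey) for w, (ex, ey) in zip(wealth, endowments)]
    demands = [demand(a, p_x, w) for a, w in zip(agents, wealth)]
    return p_x, transfers, demands


# Illustrative numbers only (assumptions, not values from the paper).
agents = [Agent(alpha=0.5), Agent(alpha=0.25)]
total = (10.0, 10.0)
x1 = 4.0
y1 = contract_curve_y1(agents[0], agents[1], x1, total)
target = [(x1, y1), (total[0] - x1, total[1] - y1)]
endowments = [(10.0, 0.0), (0.0, 10.0)]

p_x, transfers, demands = decentralize(agents, target, endowments)

# True when every agent's demand coincides with the target allocation.
reproduces_target = all(
    math.isclose(d, t) for dem, tgt in zip(demands, target) for d, t in zip(dem, tgt)
)
```

In this toy case the transfers sum to zero and `reproduces_target` is true, which is the classical decentralization property. Whether anything comparable survives once rights are non-fungible and preferences are context-indexed is exactly what the paper's theorem would need to establish.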
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript argues that the classical Second Welfare Theorem may fail to decentralize Pareto optima in post-AGI economies, because autonomy rights, self-modification, identity continuity, and superposed preferences need not behave as commodities or yield a stable welfare relation. It states an autonomy-qualified Second Welfare Theorem listing seven joint conditions (convexity, stable moral status, non-fungible rights, welfare selection, non-manipulation, governed self-modification, and verification) under which an autonomy Pareto optimum remains certifiably decentralizable. It also distinguishes economic preference superposition (context-indexed choice) from neural feature superposition.
Significance. If substantiated with a derivation, the result would extend welfare economics to economies containing autonomous AI agents whose preferences are context-dependent and whose rights are non-fungible. No machine-checked proofs, reproducible code, parameter-free derivations, or falsifiable predictions are supplied.
major comments (1)
- [Abstract] The manuscript asserts the existence of the autonomy-qualified Second Welfare Theorem and enumerates the seven joint conditions under which an autonomy Pareto optimum remains certifiably decentralizable, but supplies no derivation, proof sketch, formal construction, or argument showing that these conditions are jointly sufficient (or even simultaneously satisfiable) once the classical commodity and stable-welfare assumptions have been dropped.
minor comments (1)
- The distinction between economic preference superposition and neural feature superposition is introduced but never connected to the listed conditions or to any welfare-economics argument.
Simulated Author's Rebuttal
We thank the referee for identifying the central gap in our presentation. The manuscript is a short conceptual note that states an autonomy-qualified version of the Second Welfare Theorem by enumerating seven joint conditions; it does not contain a formal derivation. We address the referee's observation directly below and indicate how we will revise.
Point-by-point responses
- Referee: [Abstract] The manuscript asserts the existence of the autonomy-qualified Second Welfare Theorem and enumerates the seven joint conditions under which an autonomy Pareto optimum remains certifiably decentralizable, but supplies no derivation, proof sketch, formal construction, or argument showing that these conditions are jointly sufficient (or even simultaneously satisfiable) once the classical commodity and stable-welfare assumptions have been dropped.
  Authors: The referee is correct. The abstract and the body simply list the seven conditions and assert that, when all of them hold, an autonomy Pareto optimum remains decentralizable. No argument is supplied showing that the conditions can be satisfied simultaneously or that they are jointly sufficient once the standard commodity and stable-welfare assumptions are removed. Because the paper offers only a statement rather than a derivation, the claim remains conjectural. We will add a concise proof sketch in the revised manuscript that (i) recalls the classical supporting-hyperplane argument, (ii) shows where each autonomy qualification replaces a classical assumption, and (iii) verifies that the seven conditions close the argument without circularity. We will also make explicit that simultaneous satisfiability is an open modeling question left for future work rather than a proven fact.
  Revision: yes
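For reference, the classical supporting-hyperplane argument named in step (i) can be sketched as follows. This is the standard textbook version for a pure exchange economy, written under assumptions that are not drawn from the manuscript: $I$ consumers, consumption sets $X_i \subseteq \mathbb{R}^L$, convex and locally nonsatiated preferences $\succsim_i$, and aggregate endowment $\bar\omega$. It says nothing about how the seven autonomy conditions would enter.

```latex
% Classical sketch (pure exchange); the notation is an assumption, not the paper's.
Let $x^* = (x_1^*, \dots, x_I^*)$ be Pareto optimal with $\sum_i x_i^* = \bar\omega$.

\textbf{Step 1 (strictly preferred sets).}
\[
  V_i = \{\, x_i \in X_i : x_i \succ_i x_i^* \,\}, \qquad V = \sum_{i=1}^{I} V_i .
\]
Each $V_i$ is convex because preferences are convex, so $V$ is convex.

\textbf{Step 2 (disjointness).}
Pareto optimality of $x^*$ implies $\bar\omega \notin V$: otherwise some feasible
allocation would make every consumer strictly better off.

\textbf{Step 3 (separation).}
By the separating hyperplane theorem there is $p \neq 0$ with
\[
  p \cdot z \;\ge\; p \cdot \bar\omega \qquad \text{for all } z \in V .
\]

\textbf{Step 4 (individual support).}
Fix $i$ and $x_i \succ_i x_i^*$. By local nonsatiation, for each $k \neq i$ there are
$\hat x_k \succ_k x_k^*$ arbitrarily close to $x_k^*$. Then
$x_i + \sum_{k \neq i} \hat x_k \in V$, and letting $\hat x_k \to x_k^*$ gives
\[
  p \cdot x_i \;\ge\; p \cdot x_i^* .
\]

\textbf{Step 5 (transfers).}
Setting wealth levels $w_i = p \cdot x_i^*$, so that $\sum_i w_i = p \cdot \bar\omega$,
makes $(x^*, p)$ a price quasi-equilibrium with transfers:
\[
  x_i \succ_i x_i^* \;\Longrightarrow\; p \cdot x_i \ge w_i \qquad \text{for every } i .
\]
```

Each step leans on an assumption the abstract says may fail after AGI: Step 1 needs a stable preference relation, Step 3 needs every relevant object to live in a commodity space that can be priced, and Step 5 needs wealth transfers to be meaningful for every agent. A revised manuscript would have to show which of the seven conditions restores each step.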
Circularity Check
Autonomy-qualified Second Welfare Theorem defined directly by its own novel conditions without derivation or external grounding
specific steps
- self-definitional [Abstract]
"We give an autonomy-qualified Second Welfare Theorem stating the joint conditions convexity, stable moral status, non-fungible rights, welfare selection, non manipulation, governed self modification, and verification under which an autonomy Pareto optimum remains certifiably decentralizable, distinguishing economic preference superposition, a hypothesis about context-indexed choice, from neural feature superposition."
The theorem is asserted to hold under these newly coined conditions, but the paper supplies no independent argument or construction demonstrating sufficiency. The 'theorem' therefore reduces to a restatement of the paper's own list of conditions rather than a derived result from prior theorems or external benchmarks.
full rationale
The paper's central claim is an 'autonomy-qualified Second Welfare Theorem' that holds precisely under a list of conditions (stable moral status, non-fungible rights, governed self-modification, etc.) that are introduced and defined within the paper. No derivation, proof sketch, or reduction to classical welfare theorems is supplied showing why these conditions jointly suffice for decentralization once commodity and stable welfare assumptions fail. The theorem is therefore equivalent to its own definitional inputs by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math: The classical Second Welfare Theorem decentralizes Pareto efficient allocations under convexity and regularity.
- domain assumption: Autonomy rights, self-modification, identity continuity, and superposed preferences need not behave as commodities or define a stable welfare relation in post-AGI economies.
invented entities (3)
- autonomy-qualified Second Welfare Theorem: no independent evidence
- autonomy Pareto optimum: no independent evidence
- economic preference superposition: no independent evidence