AI Tokenomics: The Economics of Tokens, Computation, and Pricing in Foundation Models
Pith reviewed 2026-06-27 09:57 UTC · model grok-4.3
The pith
Token expenditure in foundation models does not equal economic value because productivity, workflow position, and downstream effects determine worth.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Tokens serve as the practical accounting unit that connects information processing, computation, memory, energy, pricing, and economic value in foundation model services. The framework shows token expenditure and economic value are distinct quantities: value depends on marginal productivity, workflow position, hidden reasoning activity, risk, and downstream propagation effects rather than raw token counts alone.
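As a rough illustration of that distinction, the sketch below bills two workflow steps by token count and attaches a separately estimated value to each. Everything in it is an assumption made up for the example and not taken from the paper: the `Step` fields, the `token_cost` and `value_per_token` helpers, the flat `price_per_1k` rate, and all of the numbers.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One workflow step; every field is a hypothetical illustration."""
    name: str
    visible_tokens: int  # tokens that appear in the response
    hidden_tokens: int   # reasoning tokens that are billed but not shown
    est_value: float     # estimated downstream value, in currency units


def token_cost(step: Step, price_per_1k: float) -> float:
    """Cost under an assumed flat per-token price, hidden tokens included."""
    return (step.visible_tokens + step.hidden_tokens) * price_per_1k / 1000


def value_per_token(step: Step) -> float:
    """Estimated value divided by all tokens the step consumed."""
    return step.est_value / (step.visible_tokens + step.hidden_tokens)


# Made-up numbers: a long, low-value step and a short, high-value step.
steps = [
    Step("draft_boilerplate", visible_tokens=4000, hidden_tokens=0, est_value=2.0),
    Step("plan_migration", visible_tokens=500, hidden_tokens=1500, est_value=40.0),
]

for s in steps:
    print(s.name, token_cost(s, price_per_1k=0.5), value_per_token(s))
```

With these invented figures the step that consumes fewer tokens carries the larger estimated value, which is the kind of divergence the core claim describes. A report based only on visible tokens would also miss most of the second step's consumption.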
What carries the argument
An AI tokenomics framework that treats tokens as the unit linking token-level technical costs to workflow-level production functions.
If this is right
- Pricing mechanisms for foundation model services should incorporate marginal productivity and workflow position rather than token volume alone.
- Enterprise resource allocation needs separate instrumentation for hidden reasoning activity and downstream propagation effects.
- Market design for token-based AI services must account for risk and uncertainty in value realization.
- Measurement methods must expand beyond visible token counts to capture hidden activity.
- Optimization of token use requires dynamic allocation rules tied to productivity estimates.
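The last implication, dynamic allocation tied to productivity estimates, can be sketched as a greedy rule that hands each chunk of a token budget to whichever task currently has the highest marginal gain. The paper specifies no such rule. The concave value curve `scale * log(1 + tokens / half)`, the task names, and the parameters below are all assumptions chosen only to make the idea concrete.

```python
from __future__ import annotations

import math


def value(scale: float, half: float, tokens: int) -> float:
    """Assumed concave value curve: diminishing returns in tokens."""
    return scale * math.log1p(tokens / half)


def marginal_gain(scale: float, half: float, tokens: int, chunk: int) -> float:
    """Extra value from adding `chunk` tokens to a task already at `tokens`."""
    return value(scale, half, tokens + chunk) - value(scale, half, tokens)


def allocate(tasks: dict[str, tuple[float, float]], budget: int,
             chunk: int = 100) -> dict[str, int]:
    """Greedily hand out `budget` tokens in `chunk`-sized pieces.

    `tasks` maps a task name to (scale, half), the hypothetical
    parameters of the assumed value curve above.
    """
    spent = {name: 0 for name in tasks}
    for _ in range(budget // chunk):
        # Give the next chunk to the task with the largest marginal gain.
        best = max(tasks, key=lambda n: marginal_gain(*tasks[n], spent[n], chunk))
        spent[best] += chunk
    return spent


# Hypothetical productivity estimates for two tasks.
tasks = {"summarize": (1.0, 500.0), "plan": (3.0, 200.0)}
spent = allocate(tasks, budget=2000)
print(spent)
```

In practice the productivity estimates would have to come from measurement, which is one of the open problems the paper lists. The greedy rule is only one possible allocation policy.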
Where Pith is reading between the lines
- Enterprises could test the framework by running identical prompts at different workflow stages and comparing output value metrics against token spend.
- If value depends on propagation effects, token markets might develop secondary trading for high-productivity positions in multi-agent workflows.
- Calibration of productivity functions could lead to new contracts where payment is based on verified downstream outcomes rather than upfront token budgets.
Load-bearing premise
Tokens can serve as a sufficient accounting unit that directly connects information processing, computation, energy, pricing, and economic value across AI systems.
What would settle it
An empirical study that measures value created by different tasks and finds it correlates almost perfectly with raw token expenditure regardless of workflow position or hidden reasoning steps.
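A minimal sketch of the statistic such a study might report is the Pearson correlation between per-task token spend and measured value. The `pearson` helper and the toy series below are illustrative assumptions. They are not data from the paper, and a real study would also need a defensible way to measure value.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Toy series only: value exactly proportional to tokens, the outcome
# that would count against the framework if it held in real data.
tokens = [100.0, 200.0, 300.0, 400.0]
value_proportional = [1.0, 2.0, 3.0, 4.0]
r = pearson(tokens, value_proportional)
print(r)
```

Computing the same statistic separately for each workflow position, and with hidden reasoning tokens included and excluded, would show whether the correlation holds regardless of those factors.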
Original abstract
Tokens have become the practical accounting unit for modern foundation model services, linking information processing, computation, memory use, energy expenditure, pricing, and economic value. This paper develops a framework for AI tokenomics: the study of how tokens are generated, consumed, priced, allocated, and optimized across AI systems. We connect token-level technical costs to workflow-level production functions, enterprise resource allocation, measurement and instrumentation methods, and emerging market-design questions. The framework shows that token expenditure and economic value are distinct: value depends on marginal productivity, workflow position, hidden reasoning activity, risk, and downstream propagation effects. The paper concludes by identifying open research directions in hidden-token measurement, empirical calibration, token productivity, dynamic allocation, and token-based markets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a conceptual framework for 'AI tokenomics' that positions tokens as the unifying accounting unit linking information processing, computation, memory, energy, pricing, and economic value in foundation models. It connects token-level technical costs to workflow-level production functions and enterprise allocation, and asserts that token expenditure and economic value are distinct, with value depending on marginal productivity, workflow position, hidden reasoning activity, risk, and downstream propagation effects. The paper concludes by listing open research directions in measurement, calibration, productivity, allocation, and markets.
Significance. If substantiated with formal models or data, the framework could help organize thinking about resource measurement and optimization in AI services by bridging technical costs and economic value. As presented, its primary contribution is conceptual mapping rather than new derivations, predictions, or measurements.
Major comments (1)
- [Abstract] Abstract: The central claim that 'token expenditure and economic value are distinct' (with value depending on marginal productivity, hidden reasoning, etc.) is asserted at a definitional level but is not supported by any production function, formal model, illustrative calculation, or empirical example showing how these factors cause value to diverge from token counts.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The manuscript is a conceptual framework paper whose primary contribution is organizing concepts at the intersection of technical costs and economic value in foundation models. We address the major comment below and will revise accordingly.
Point-by-point responses
- Referee: [Abstract] Abstract: The central claim that 'token expenditure and economic value are distinct' (with value depending on marginal productivity, hidden reasoning, etc.) is asserted at a definitional level but is not supported by any production function, formal model, illustrative calculation, or empirical example showing how these factors cause value to diverge from token counts.
Authors: We agree that the paper is conceptual and provides no formal derivations or data. The distinction is introduced as a definitional feature of the tokenomics framework and follows from the workflow-level production-function and allocation concepts developed in the body, which link token costs to marginal productivity, hidden activity, risk, and propagation. We acknowledge that no explicit production function, calculation, or example is currently supplied. In the revision we will add a short illustrative example to the abstract and main text that demonstrates the divergence (e.g., a workflow where hidden reasoning tokens yield higher downstream value than the raw token count would predict). Revision: yes.
Circularity Check
No significant circularity; definitional framework only
The paper advances a conceptual framework mapping token usage to economic ideas (production functions, marginal productivity, hidden reasoning) without any equations, fitted parameters, quantitative predictions, or theorems. The central distinction between token expenditure and economic value is asserted at the definitional level rather than derived. No self-citations, ansatzes, or uniqueness claims appear as load-bearing elements. The derivation chain is empty by inspection; the work is self-contained as an exploratory proposal with no internal reductions to its own inputs.