Data-Driven Automation
Pith reviewed 2026-06-27 13:56 UTC · model grok-4.3
The pith
Data-driven automation with endogenous capital accumulation generates explosive growth but stagnant long-run wages.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
With endogenous capital accumulation, data-driven automation generates explosive growth but stagnant long-run wages. The economy is generically inefficient.
What carries the argument
Dynamic model of endogenous data accumulation with task-specific heterogeneity and cross-task spillovers that simultaneously raises productivity and expands the automation frontier.
If this is right
- Tight conditions determine whether the economy is partially or fully automated in the long run.
- Full automation features slow long-run dynamics with labor share decaying as a power law in time.
- The decentralized economy fails to achieve the efficient direction of data accumulation.
- Data plays a dual role in augmenting existing automated tasks and pushing the frontier.
Where Pith is reading between the lines
- Regulations on data sharing could alter the rate at which the automation frontier expands.
- The combination of rapid growth and wage stagnation points to potential distributional conflicts not addressed by market forces.
- Extensions might incorporate heterogeneity in firm data access to study concentration effects.
Load-bearing premise
Data is heterogeneous and task-specific, accumulates endogenously as a byproduct of economic activity, and exhibits spillovers such that data generated by one task can augment the productivity of another.
What would settle it
Tracking the time path of labor's share of tasks in highly automated sectors to check for asymptotic power-law decay, or measuring whether output growth accelerates while wages stagnate under widespread data-driven automation.
Figures
read the original abstract
We build a dynamic model of data-driven automation in which data (i) is heterogeneous and task-specific; (ii) accumulates endogenously as a byproduct of economic activity; and (iii) exhibits spillovers such that data generated by one task can augment the productivity of another. Along the transition path of automation, data plays a dual role in simultaneously augmenting the productivity of already-automated tasks and expanding the automation frontier. We derive tight conditions for the economy to be partially versus fully automated in the long-run. In the latter case, automation exhibits rich short-run dynamics that depend on the pattern of data spillovers but is always slow in the long-run: the share of tasks produced by labor decays asymptotically as a power law in time. We show that the economy is generically inefficient and analyze how a planner optimally tilts the direction of data accumulation. With endogenous capital accumulation, data-driven automation generates explosive growth but stagnant long-run wages.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a dynamic model of data-driven automation in which data is heterogeneous and task-specific, accumulates endogenously as a byproduct of production, and features spillovers across tasks. Data augments productivity of automated tasks while expanding the automation frontier. The model derives conditions distinguishing partial from full long-run automation; under full automation, labor's task share decays as a power law asymptotically (with short-run dynamics depending on spillover patterns). The decentralized equilibrium is generically inefficient, and a planner can improve outcomes by tilting data accumulation. With endogenous capital accumulation, the economy exhibits explosive growth but stagnant long-run wages.
Significance. If the derivations hold, the framework supplies a microfounded mechanism for the decoupling of productivity growth from wage growth via endogenous data and spillovers, while delivering falsifiable predictions such as power-law labor-share decay and generic inefficiency. The dual role of data and the explicit treatment of directionality under a planner are strengths that could inform policy analysis of data-driven automation.
major comments (2)
- [Abstract] Abstract: the claim that 'tight conditions' for partial versus full automation are derived cannot be verified because the model equations, state variables, and functional forms governing data accumulation and spillovers are not provided; without these it is impossible to confirm whether the partial/full distinction follows from the three posited data properties or requires additional restrictions.
- [Abstract] Abstract: the statement that 'the share of tasks produced by labor decays asymptotically as a power law in time' under full automation is presented as following directly from the spillover structure, but no derivation, functional-form assumptions, or limiting argument is visible to assess whether the exponent is pinned down by primitives or depends on normalization choices.
Simulated Author's Rebuttal
We thank the referee for their comments. We address the two major comments on the abstract below. The abstract provides a summary of results that are fully derived in the manuscript.
read point-by-point responses
-
Referee: [Abstract] Abstract: the claim that 'tight conditions' for partial versus full automation cannot be verified because the model equations, state variables, and functional forms governing data accumulation and spillovers are not provided; without these it is impossible to confirm whether the partial/full distinction follows from the three posited data properties or requires additional restrictions.
Authors: The abstract is a concise summary and does not include technical details such as equations. The model is fully specified in the main text, with data accumulation governed by an endogenous law of motion that incorporates task-specific heterogeneity and spillovers. The tight conditions for partial versus full automation are derived directly from these three properties in Section 2 without additional restrictions, as formalized in the propositions there. The referee's observation is correct that the abstract alone does not allow verification, but the manuscript does. revision: no
-
Referee: [Abstract] Abstract: the statement that 'the share of tasks produced by labor decays asymptotically as a power law in time' under full automation is presented as following directly from the spillover structure, but no derivation, functional-form assumptions, or limiting argument is visible to assess whether the exponent is pinned down by primitives or depends on normalization choices.
Authors: The asymptotic power-law decay is established in Section 3 through analysis of the dynamic system under full automation. The functional form of spillovers is specified as a general spillover function, and the limiting behavior is derived using standard techniques for asymptotic analysis of the resulting ODE system. The exponent is determined by the spectral properties of the spillover structure, which are primitives of the model, and is independent of normalization choices. Short-run dynamics vary with the specific spillover pattern as stated. revision: no
Circularity Check
No significant circularity; derivations self-contained from model assumptions
full rationale
The paper constructs a dynamic model from explicit assumptions on data (heterogeneous/task-specific, endogenous accumulation as byproduct, spillovers across tasks) plus endogenous capital accumulation, then derives long-run automation conditions, power-law labor share decay, generic inefficiency, planner's optimal data direction, and explosive growth with stagnant wages. These follow from the posited primitives and transition dynamics without any quoted reduction of a 'prediction' or 'result' to a fitted parameter, self-citation chain, or definitional equivalence. No load-bearing step is shown to collapse by construction to its inputs.
Axiom & Free-Parameter Ledger
axioms (3)
- domain assumption Data is heterogeneous and task-specific.
- domain assumption Data accumulates endogenously as a byproduct of economic activity.
- domain assumption Data exhibits spillovers such that data generated by one task can augment the productivity of another.
Reference graph
Works this paper leans on
-
[1]
2006 , publisher=
Nonlinear dispersive equations: local and global analysis , author=. 2006 , publisher=
2006
-
[2]
Reuters , year =
Paul, Katie and Horwitz, Jeff , title =. Reuters , year =
-
[3]
2025 , institution=
Past Automation and Future AI: How Weak Links Tame the Growth Explosion , author=. 2025 , institution=
2025
-
[4]
1995 , publisher=
Monotone dynamical systems: an introduction to the theory of competitive and cooperative systems: an introduction to the theory of competitive and cooperative systems , author=. 1995 , publisher=
1995
-
[5]
arXiv preprint arXiv:2001.08361 , year=
Scaling laws for neural language models , author=. arXiv preprint arXiv:2001.08361 , year=
Pith/arXiv arXiv 2001
-
[6]
arXiv preprint arXiv:2211.04325 , year=
Will we run out of data? Limits of LLM scaling based on human-generated data , author=. arXiv preprint arXiv:2211.04325 , year=
-
[7]
Thatcher: Notes on trade in the presence of dynamic scale economies , author=
The narrow moving band, the Dutch disease, and the competitive consequences of Mrs. Thatcher: Notes on trade in the presence of dynamic scale economies , author=. Journal of development Economics , volume=. 1987 , publisher=
1987
-
[8]
Journal of political Economy , volume=
Learning-by-doing spillovers in the semiconductor industry , author=. Journal of political Economy , volume=. 1994 , publisher=
1994
-
[9]
The quarterly journal of economics , volume=
Learning by doing and the dynamic effects of international trade , author=. The quarterly journal of economics , volume=. 1991 , publisher=
1991
-
[10]
The Economic Impacts of Generative AI on the Structure of Work , author=
-
[11]
2025 , institution=
Workflows and Automation , author=. 2025 , institution=
2025
-
[12]
2024 , publisher=
Market power in artificial intelligence , author=. 2024 , publisher=
2024
-
[13]
2021 , institution=
A model of the data economy , author=. 2021 , institution=
2021
-
[14]
American Economic Review , volume=
Nonrivalry and the Economics of Data , author=. American Economic Review , volume=. 2020 , publisher=
2020
-
[15]
The Bell Journal of Economics , pages=
Informational economies of scale , author=. The Bell Journal of Economics , pages=. 1975 , publisher=
1975
-
[16]
NBER Handbook on the Economics of Transformative AI , pages=
An Economy of AI Agents , author=. NBER Handbook on the Economics of Transformative AI , pages=
-
[17]
Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=
Representation similarity analysis for efficient task taxonomy & transfer learning , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=
-
[18]
Advances in Mathematics , volume=
Convergent sequences of dense graphs I: Subgraph frequencies, metric properties and testing , author=. Advances in Mathematics , volume=. 2008 , publisher=
2008
-
[19]
2024 , institution=
Scenarios for the Transition to AGI , author=. 2024 , institution=
2024
-
[20]
Company Wants to Take Your Job , author=
This A.I. Company Wants to Take Your Job , author=. 2025 , url=
2025
-
[21]
2025 , url=
How Anthropic and OpenAI Are Developing AI ‘Co-Workers’ , author=. 2025 , url=
2025
-
[22]
2025 , month =
The Big Tech Oligarchs’ War Against Workers:. 2025 , month =
2025
-
[23]
2025 , journal=
Labor as Capital: AI and the Ownership of Expertise , author=. 2025 , journal=
2025
-
[24]
The Quarterly Journal of Economics , volume=
Workers, machines, and economic growth , author=. The Quarterly Journal of Economics , volume=. 1998 , publisher=
1998
-
[25]
The Review of Economic Studies , volume=
Contagion , author=. The Review of Economic Studies , volume=. 2000 , publisher=
2000
-
[26]
2025 , journal=
Economic growth under transformative AI , author=. 2025 , journal=
2025
-
[27]
arXiv preprint arXiv:2502.12309 , year=
Eigenvalues in microeconomics , author=. arXiv preprint arXiv:2502.12309 , year=
-
[28]
2025 , institution=
We Won’t be Missed: Work and Growth in the AGI World , author=. 2025 , institution=
2025
-
[29]
Business Cycle Theory, Part II Volume 5 , pages=
An essay in dynamic theory , author=. Business Cycle Theory, Part II Volume 5 , pages=. 2024 , publisher=
2024
-
[30]
Econometrica, Journal of the Econometric Society , pages=
Capital expansion, rate of growth, and employment , author=. Econometrica, Journal of the Econometric Society , pages=. 1946 , publisher=
1946
-
[31]
Advances in Neural Information Processing Systems , volume=
Consent in crisis: The rapid decline of the ai data commons , author=. Advances in Neural Information Processing Systems , volume=
-
[32]
The review of economic studies , volume=
Directed technical change , author=. The review of economic studies , volume=. 2002 , publisher=
2002
-
[33]
1932 , publisher=
The theory of wages , author=. 1932 , publisher=
1932
-
[34]
arXiv preprint arXiv:2503.14499 , year=
Measuring AI Ability to Complete Long Tasks , author=. arXiv preprint arXiv:2503.14499 , year=
-
[35]
arXiv preprint arXiv:2407.12998 , year=
Surgical robot transformer (srt): Imitation learning for surgical tasks , author=. arXiv preprint arXiv:2407.12998 , year=
-
[36]
arXiv preprint arXiv:2403.09162 , year=
Unveiling the generalization power of fine-tuned large language models , author=. arXiv preprint arXiv:2403.09162 , year=
-
[37]
arXiv preprint arXiv:2009.03300 , year=
Measuring massive multitask language understanding , author=. arXiv preprint arXiv:2009.03300 , year=
Pith/arXiv arXiv 2009
-
[38]
Nature , volume=
Health system-scale language models are all-purpose prediction engines , author=. Nature , volume=. 2023 , publisher=
2023
-
[39]
2024 , url=
Training compute of frontier AI models grows by 4-5x per year , author=. 2024 , url=
2024
-
[40]
Advances in Neural Information Processing Systems , volume=
Algorithmic progress in language models , author=. Advances in Neural Information Processing Systems , volume=
-
[41]
Journal of Economic Theory , volume=
Capability accumulation and conglomeratization in the information age , author=. Journal of Economic Theory , volume=. 2023 , publisher=
2023
-
[42]
Review of Economic Studies , volume=
Inefficient automation , author=. Review of Economic Studies , volume=. 2025 , publisher=
2025
-
[43]
The Review of Economic Studies , volume=
Robots, trade, and luddism: A sufficient statistic approach to optimal technology regulation , author=. The Review of Economic Studies , volume=. 2023 , publisher=
2023
-
[44]
The Review of Economic Studies , volume=
Should robots be taxed? , author=. The Review of Economic Studies , volume=. 2022 , publisher=
2022
-
[45]
Blog Post , year=
The Extreme Inefficiency of RL for Frontier Models , author=. Blog Post , year=
-
[46]
Knight First Amend
AI as normal technology , author=. Knight First Amend. Inst , year=
-
[47]
arXiv preprint arXiv:2404.10102 , year=
Chinchilla scaling: A replication attempt , author=. arXiv preprint arXiv:2404.10102 , year=
-
[48]
Blog Post , year=
RL is even more information inefficient than you thought , author=. Blog Post , year=
-
[49]
The review of economic studies , volume=
The economic implications of learning by doing , author=. The review of economic studies , volume=. 1962 , publisher=
1962
-
[50]
Journal of the European Economic Association , pages=
Expertise , author=. Journal of the European Economic Association , pages=. 2025 , publisher=
2025
-
[51]
Tweet thread , year=
Tweet thread , author=. Tweet thread , year=
-
[52]
arXiv preprint arXiv:2509.20328 , year=
Video models are zero-shot learners and reasoners , author=. arXiv preprint arXiv:2509.20328 , year=
-
[53]
CNBC , year=
Legal AI startup Harvey hits \ 100 million in annual recurring revenue , author=. CNBC , year=
-
[54]
arXiv preprint arXiv:2303.17564 , year=
Bloomberggpt: A large language model for finance , author=. arXiv preprint arXiv:2303.17564 , year=
-
[55]
2025 , month =
The Future of Jobs Report 2025 , institution =. 2025 , month =
2025
-
[56]
Econometrica , volume=
Graphon games: A statistical framework for network games and interventions , author=. Econometrica , volume=. 2023 , publisher=
2023
-
[57]
2025 , institution=
Transformative AI and Firms , author=. 2025 , institution=
2025
-
[58]
Journal of political economy , volume=
Increasing returns and long-run growth , author=. Journal of political economy , volume=. 1986 , publisher=
1986
-
[59]
Handbook of Labor Economics , volume=
Tasks at work: comparative advantage, technology and labor demand , author=. Handbook of Labor Economics , volume=. 2025 , publisher=
2025
-
[60]
The BE Journal of Macroeconomics , volume=
Baumol's diseases: a macroeconomic perspective , author=. The BE Journal of Macroeconomics , volume=. 2008 , publisher=
2008
-
[61]
Journal of the American Statistical association , volume=
Reaching a consensus , author=. Journal of the American Statistical association , volume=. 1974 , publisher=
1974
-
[62]
Review of Economic Studies , volume=
Dynamic opinion aggregation: long-run stability and disagreement , author=. Review of Economic Studies , volume=. 2024 , publisher=
2024
-
[63]
The review of Economics and Statistics , volume=
Technical change and the aggregate production function , author=. The review of Economics and Statistics , volume=. 1957 , publisher=
1957
-
[64]
The American economic review , volume=
Macroeconomics of unbalanced growth: the anatomy of urban crisis , author=. The American economic review , volume=. 1967 , publisher=
1967
-
[65]
Axios , note =
AI jobs danger: Sleepwalking into a white-collar bloodbath , author =. Axios , note =. 2025 , month =
2025
-
[66]
The Quarterly journal of economics , volume=
The skill content of recent technological change: An empirical exploration , author=. The Quarterly journal of economics , volume=. 2003 , publisher=
2003
-
[67]
American Economic Review , volume=
A framework for economic growth with capital-embodied technical change , author=. American Economic Review , volume=. 2024 , publisher=
2024
-
[68]
2025 , eprint=
Pre-training under infinite compute , author=. 2025 , eprint=
2025
-
[69]
arXiv preprint arXiv:2203.15556 , year=
Training compute-optimal large language models , author=. arXiv preprint arXiv:2203.15556 , year=
-
[70]
Journal of Economic Theory , volume=
Contagion in graphons , author=. Journal of Economic Theory , volume=. 2023 , publisher=
2023
-
[71]
Available at SSRN 4292559 , year=
Speed vs resilience in contagion , author=. Available at SSRN 4292559 , year=
-
[72]
2025 , publisher=
The Scaling Era: An Oral History of AI, 2019--2025 , author=. 2025 , publisher=
2019
-
[73]
American economic review , volume=
The race between man and machine: Implications of technology for growth, factor shares, and employment , author=. American economic review , volume=. 2018 , publisher=
2018
-
[74]
AEA papers and proceedings , volume=
The impact of big data on firm performance: An empirical investigation , author=. AEA papers and proceedings , volume=. 2019 , organization=
2019
-
[75]
2012 , publisher=
Large networks and graph limits , author=. 2012 , publisher=
2012
-
[76]
2017 , institution=
Artificial intelligence and economic growth , author=. 2017 , institution=
2017
-
[77]
Armstrong , title =
Thomas E. Armstrong , title =. Real Analysis Exchange , year =
-
[78]
American Economic Journal: Macroeconomics , volume=
Are we approaching an economic singularity? information technology and the future of economic growth , author=. American Economic Journal: Macroeconomics , volume=. 2021 , publisher=
2021
-
[79]
2011 , publisher=
Banach space theory: The basis for linear and nonlinear analysis , author=. 2011 , publisher=
2011
-
[80]
2025 , url=
The AI Startup Fueling ChatGPT’s Expertise Is Now Valued at \ 10 Billion , author=. 2025 , url=
2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.