Recognition: 2 theorem links
Locational Pricing for Generative-AI Services via Token-Flow Market Clearing
Pith reviewed 2026-05-12 01:58 UTC · model grok-4.3
The pith
A linear program for token flows produces locational prices that dispatch AI workloads across distributed compute and networks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a network-constrained token-flow market clears AI workloads by solving a linear program that co-optimizes routing and processing subject to compute-capacity and bandwidth constraints. Its dual variables supply location- and workload-specific marginal service prices. The transfer-aware extension prices data movement in physical units and separates bandwidth congestion rents. Results from a five-node network identify four saturated backbone links, show a 2.7 percent cost increase relative to the baseline, and demonstrate a 117 percent rise in one locational price when the latency limit drops from 100 ms to 15 ms; the twenty-node scale-up reproduces the same merit-order dispatch logic and becomes infeasible once demand exceeds aggregate capacity.
What carries the argument
The dual variables of the token-flow linear program, which serve as locational marginal prices reflecting local compute scarcity and network congestion.
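The mechanism is ordinary LP duality and can be sketched in a few lines. The instance below is hypothetical (numbers invented for illustration, not the paper's case study): node A serves its own demand at 2 per token with capacity 6, while node B serves it remotely at 1.5 per token (compute plus transfer) over a link with bandwidth 8. The dual of the demand-balance constraint acts as the locational price; the dual of the bandwidth limit is the congestion rent.

```python
# Minimal token-flow clearing LP using SciPy's HiGHS solver.
# All parameters are hypothetical, not taken from the paper.
from scipy.optimize import linprog

c = [2.0, 1.5]                  # unit cost of serving A's demand locally / remotely
A_eq, b_eq = [[1, 1]], [10.0]   # demand balance at node A: 10 tokens/s
A_ub, b_ub = [[0, 1]], [8.0]    # link bandwidth limit on the remote flow
bounds = [(0, 6), (0, None)]    # node A compute capacity

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=bounds, method="highs")

price_A = res.eqlin.marginals[0]   # dual of demand balance: locational price at A
rent = -res.ineqlin.marginals[0]   # dual of bandwidth limit: congestion rent
print(res.x, res.fun)              # dispatch [2, 8] at total cost 16
print(price_A, rent)               # price 2.0 (A's marginal cost), rent 0.5
```

Because the cheap remote path saturates its link, the marginal token must be served locally, so node A's price equals the local unit cost while the link earns a rent of 0.5 per token.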
If this is right
- Workloads are routed and processed at the combination of nodes and links that minimizes total operating cost given the stated constraints.
- Bandwidth congestion produces identifiable rents that the transfer-aware model isolates from compute charges.
- Tightening latency requirements raises prices sharply at locations that must use congested or distant resources.
- Once demand exceeds aggregate capacity the optimization becomes infeasible, indicating where capacity additions are required.
- The same price signals could later support competitive bidding among providers of AI compute and connectivity.
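The infeasibility point in the list above is directly observable from the solver status. In a hypothetical two-node instance (local capacity 6, link bandwidth 8, both numbers invented), any demand above 14 tokens/s leaves no feasible dispatch:

```python
# Sketch of the capacity-adequacy signal; all numbers are hypothetical.
from scipy.optimize import linprog

def clear(demand):
    """Solve the toy token-flow clearing LP for a given demand level."""
    return linprog(c=[2.0, 1.5],               # local / remote unit costs
                   A_ub=[[0, 1]], b_ub=[8.0],  # link bandwidth limit
                   A_eq=[[1, 1]], b_eq=[demand],
                   bounds=[(0, 6), (0, None)], method="highs")

print(clear(14.0).success)   # True: demand exactly matches aggregate capacity
print(clear(20.0).status)    # 2: scipy's status code for an infeasible LP
```

In the paper's framing this is the planning signal: the first constraint set that renders the LP infeasible marks where capacity additions are required.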
Where Pith is reading between the lines
- The framework could be linked to electricity locational marginal pricing in regions where AI data centers draw large power loads.
- Real-time versions might adjust prices continuously as user demand and network state change.
- Global AI service economics would then show large regional cost differences driven by both compute density and interconnect quality.
- Network planners could test proposed backbone upgrades by re-running the model and measuring the reduction in congestion rents.
Load-bearing premise
Real generative AI workloads and infrastructure can be represented accurately enough as linear token flows with fixed capacities and bandwidth limits for the resulting dual prices to serve as meaningful operational signals.
What would settle it
Deploy the model on an actual five-node regional network and check whether the computed locational prices align with the nodes and links operators actually choose and with the measured costs and latencies of live service; systematic misalignment would falsify the pricing claim.
read the original abstract
GenAI services are in an early yet fast expanding phase. Providers compete on model capability and service quality, while the underlying infrastructure remains expensive and heterogeneous across regions, workloads, and compute assets. If these services diffuse into routine daily use, the relevant engineering problem becomes not only better models but also efficient dispatch on a geographically distributed AI service infrastructure. To address this, we formulate a network-constrained token-flow market that clears AI workloads across compute nodes and communication links. The baseline model is a linear program that co-optimizes routing and processing subject to compute-capacity and bandwidth constraints; its dual variables define location- and workload-specific marginal service prices. We further introduce a transfer-aware extension that prices data movement in physical units and isolates bandwidth congestion rents. In a 5-node U.S. case study, the transfer-aware model uncovers four saturated backbone links and raises total operating cost by 2.7% relative to the token-equivalent baseline, while tightening the chatbot latency limit from 100 ms to 15 ms increases one locational price by 117%. A 20-node scale-up exhibits the same merit-order dispatch logic and becomes infeasible once demand exceeds aggregate capacity. These results suggest that locational pricing is a useful organizing principle for operating an emerging AI service infrastructure and, over time, for designing competitive markets around it.
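The latency sensitivity reported in the abstract can be mimicked in a toy model (all numbers hypothetical, not the paper's 5-node instance): once a tight latency bound rules out the cheap remote path, the locational price at the demand node jumps to the local marginal cost.

```python
# Hypothetical sketch of how a latency limit moves a locational price.
from scipy.optimize import linprog

def price_at_A(remote_allowed):
    bw = 8.0 if remote_allowed else 0.0   # a tight latency bound disables the remote path
    res = linprog(c=[2.0, 1.5],           # local cost 2, remote cost 1.5 incl. transfer
                  A_ub=[[0, 1]], b_ub=[bw],
                  A_eq=[[1, 1]], b_eq=[5.0],   # 5 tokens/s of demand at node A
                  bounds=[(0, 6), (0, None)], method="highs")
    return res.eqlin.marginals[0]

print(price_at_A(True), price_at_A(False))   # 1.5 -> 2.0 as the bound tightens
```

The magnitude of the jump is entirely an artifact of the invented costs; the point is only the mechanism by which a tighter service constraint raises a dual price.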
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper formulates a network-constrained token-flow market as a linear program that co-optimizes routing and processing of generative-AI workloads subject to compute-capacity and bandwidth constraints; dual variables supply location- and workload-specific marginal prices. A transfer-aware extension prices physical data movement and isolates bandwidth congestion rents. Numerical results from a 5-node U.S. case study show four saturated links, a 2.7% operating-cost increase, and a 117% locational-price jump under a tighter 15 ms latency bound; a 20-node scale-up reproduces the same merit-order logic and becomes infeasible beyond aggregate capacity.
Significance. If the linear token-flow abstraction adequately captures dispatch, the framework supplies a transparent, parameter-free method for extracting marginal prices that could support efficient operation and eventual market design for distributed GenAI infrastructure. The direct use of LP duality for pricing is a methodological strength, as prices emerge endogenously from the stated constraints rather than from external fitting.
major comments (2)
- [5-node U.S. case study] 5-node U.S. case study: the reported 2.7% cost uplift and 117% price increase are presented as direct LP outputs, yet the manuscript supplies no comparison against real GenAI workload traces, batch-level latency measurements, or higher-fidelity non-linear simulators; without such grounding the dual prices' operational relevance for the central claim remains untested.
- [Baseline model] Baseline LP formulation: the model treats token flows as linear with fixed capacities and deterministic bandwidth limits, but real inference exhibits batch-dependent latency, stochastic token rates, and GPU memory contention; this modeling choice is load-bearing for the interpretability of the derived locational prices.
minor comments (1)
- [Abstract] The abstract states that the transfer-aware model 'uncovers four saturated backbone links' but does not identify the links or supply the underlying network topology, limiting reproducibility of the congestion-rent results.
Simulated Author's Rebuttal
We thank the referee for the thoughtful review and constructive feedback on our manuscript. We address each major comment below, clarifying the scope of our modeling choices and the illustrative purpose of the case studies while acknowledging their limitations.
read point-by-point responses
-
Referee: [5-node U.S. case study] 5-node U.S. case study: the reported 2.7% cost uplift and 117% price increase are presented as direct LP outputs, yet the manuscript supplies no comparison against real GenAI workload traces, batch-level latency measurements, or higher-fidelity non-linear simulators; without such grounding the dual prices' operational relevance for the central claim remains untested.
Authors: We agree that the 5-node results rely on stylized network parameters and synthetic demand rather than real GenAI workload traces, empirical batch latency data, or non-linear simulators. The reported 2.7% cost increase and 117% locational price jump are exact dual outputs from the LP under the stated assumptions, intended to demonstrate how congestion and latency bounds endogenously shape prices. We do not claim these figures represent operational forecasts for deployed systems. In the revised manuscript we will add an explicit limitations subsection noting the illustrative character of the case study and outlining pathways for future validation against production traces and higher-fidelity models. revision: partial
-
Referee: [Baseline model] Baseline LP formulation: the model treats token flows as linear with fixed capacities and deterministic bandwidth limits, but real inference exhibits batch-dependent latency, stochastic token rates, and GPU memory contention; this modeling choice is load-bearing for the interpretability of the derived locational prices.
Authors: The linear, deterministic token-flow abstraction is a deliberate modeling decision that enables direct application of LP duality to obtain transparent, constraint-derived marginal prices, following the precedent of linearized network models in electricity markets. While we recognize that actual inference involves batch effects, stochasticity, and memory contention that could alter realized latencies and dispatch, the framework isolates the pricing implications of capacity and bandwidth limits under average-flow assumptions. This choice keeps prices endogenous and parameter-free. We will insert a dedicated limitations paragraph discussing these simplifications and their consequences for price interpretability, without altering the core formulation. revision: yes
Circularity Check
No circularity: prices are standard duals of the formulated LP
full rationale
The paper's core derivation consists of stating a linear program that co-optimizes token routing and processing under capacity and bandwidth constraints, then extracting locational prices directly as the dual variables of those constraints. This is a standard, non-circular extraction of marginal costs from any linear program; the duals are defined by the primal constraints by construction of duality theory, not by fitting, self-definition, or self-citation. The 5-node and 20-node numerical examples simply solve the stated model and report the resulting dual values; no quantity is renamed as a prediction after being fitted to the same data, and no load-bearing premise rests on prior self-citations. The derivation chain is therefore self-contained and independent of its own outputs.
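The non-circularity point can be made concrete: by strong duality, valuing demand at the dual prices and subtracting the congestion rents reproduces the primal operating cost exactly, with nothing fitted. A hypothetical two-node instance (invented numbers, same structure as the stated LP):

```python
# Strong-duality check on a toy clearing LP; all numbers are hypothetical.
from scipy.optimize import linprog

res = linprog(c=[2.0, 1.5],                  # local / remote unit costs
              A_ub=[[0, 1]], b_ub=[8.0],     # link bandwidth limit
              A_eq=[[1, 1]], b_eq=[10.0],    # demand balance at node A
              bounds=[(0, 6), (0, None)], method="highs")

payment = 10.0 * res.eqlin.marginals[0]   # demand valued at its locational price
rent = -8.0 * res.ineqlin.marginals[0]    # congestion rent collected by the link
# Dual objective = payments minus rents; equals the primal cost by LP duality.
print(res.fun, payment - rent)
```

The identity holds by construction of duality theory, which is exactly why the extracted prices cannot be circular: they are determined by the primal constraint data, not by the reported outputs.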
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: AI workloads can be represented as token flows subject to node compute capacities and link bandwidth limits
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear — "The baseline clearing problem is formulated as the following linear program: min … s.t. A f_k + x_k = d_k … (1b) … dual variables define location- and workload-specific marginal service prices."
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_injective · unclear — "For any active arc … π_i,k − π_j,k = c_ij,k + η_ij (7); … π_j,k = g_j,k + μ_j,k (8)"
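The quoted arc-pricing identity (equation 7 in the excerpt) can be checked numerically: across a saturated arc, the locational price difference equals the per-unit transfer cost plus the bandwidth congestion dual. The instance below is hypothetical, not the paper's network.

```python
# Numerical check of pi_A - pi_B = c_BA + eta_BA on a toy two-node instance.
from scipy.optimize import linprog

# Variables: x_A (A serves A), x_BA (B serves A over the link), x_BB (B serves B)
c = [2.0, 1.5, 1.0]             # x_BA cost = 1.0 compute at B + 0.5 transfer
A_eq = [[1, 1, 0], [0, 0, 1]]   # demand balance at A (10 tokens/s) and B (5 tokens/s)
b_eq = [10.0, 5.0]
A_ub = [[0, 1, 0], [0, 1, 1]]   # link bandwidth 8; node B compute capacity 20
b_ub = [8.0, 20.0]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, 6), (0, None), (0, None)], method="highs")

pi_A, pi_B = res.eqlin.marginals   # locational prices at A and B
eta = -res.ineqlin.marginals[0]    # congestion dual on the saturated B->A link
print(pi_A - pi_B, 0.5 + eta)      # both sides of the identity evaluate to 1.0
```

Here the link saturates at 8 tokens/s, so the price gap between A and B (2.0 vs 1.0) decomposes into the 0.5 transfer cost plus a 0.5 congestion component, matching the quoted equation.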
Reference graph
Works this paper leans on
-
[1]
Ravindra K. Ahuja, Thomas L. Magnanti, and James B. Orlin. 2001. Minimum Cost Flow Problem. In Encyclopedia of Optimization. Springer US, 1382–1392. doi:10.1007/0-306-48332-7_283
-
[2]
Ross Baldick. 2018. Locational Marginal Pricing. Course notes for EE394V: Restructured Electricity Markets: Locational Marginal Pricing. https://users.ece.utexas.edu/~baldick/classes/394V/Locational.pdf Accessed 2026-03-16
-
[3]
Cynthia Barnhart, Niranjan Krishnan, and Pamela H. Vance. 2008. Multicommodity Flow Problems. In Encyclopedia of Optimization. Springer US, 2354–2362. doi:10.1007/978-0-387-74759-0_407
-
[4]
Stephen P. Bradley, Arnoldo C. Hax, and Thomas L. Magnanti. 1977. Network Models. In Applied Mathematical Programming. Addison-Wesley, Chapter 8. https://web.mit.edu/15.053/www/AMP-Chapter-08.pdf MIT-hosted PDF version, accessed 2026-03-16
-
[5]
California Independent System Operator. 2010. Appendix C: Locational Marginal Price. https://www.caiso.com/documents/appendicesc-f-fifthreplacementcaisotariff_15-dec-10.pdf Fifth replacement electronic tariff, effective December 15, 2010
-
[6]
Data Center Map. 2026. Data Center Map: USA. https://www.datacentermap.com/usa/ Manually curated; accessed 2026-03-27
-
[7]
Deloitte. 2026. The State of AI in the Enterprise - 2026 AI Report. https://www.deloitte.com/cz-sk/en/issues/generative-ai/state-of-ai-in-enterprise.html Accessed 2026-04-05
-
[8]
Anish Devasia. 2025. Data Center Operating Costs: Complete Guide (2026). The Network Installers. https://thenetworkinstallers.com/blog/data-center-operating-costs/ Accessed 2026-03-16
-
[9]
Digital China Summit. 2024. National Supercomputing Internet Platform Builds a "Highway" for Digital China. https://www.digitalchina.gov.cn/magazine/4/25/technology/1867.html Translated title; article on the Digital China Summit website, accessed 2026-03-16
-
[10]
Stephen Frank and Steffen Rebennack. 2016. An Introduction to Optimal Power Flow: Theory, Formulation, and Examples. IIE Transactions 48, 12 (2016), 1172–. doi:10.1080/0740817X.2016.1189626
-
[12]
IAEI Magazine. 2026. How Much Electricity Does a Data Center Use? Complete 2025 Analysis. https://iaeimagazine.org/electrical-fundamentals/how-much-electricity-does-a-data-center-use-complete-2025-analysis/ Page updated 2026-01-01; accessed 2026-03-16
-
[13]
F. P. Kelly, A. K. Maulloo, and D. K. H. Tan. 1998. Rate Control for Communication Networks: Shadow Prices, Proportional Fairness and Stability. Journal of the Operational Research Society 49, 3 (1998), 237–252. doi:10.1057/palgrave.jors.2600523
-
[14]
Steven H. Low and David E. Lapsley. 1999. Optimization Flow Control—I: Basic Algorithm and Convergence. IEEE/ACM Transactions on Networking 7, 6 (1999), 861–874. doi:10.1109/90.811451
-
[15]
Ministry of Industry and Information Technology of the People's Republic of China. 2025. Notice on Issuing the Action Plan for Computing Power Interconnection. https://www.miit.gov.cn/zwgk/zcwj/wjfb/tz/art/2025/art_1bbbd7e75dcc4fd8b75d6d2109f2e9ab.html Document No. Gong Xin Bu Xin Guan [2025] 119, accessed 2026-03-16
-
[16]
Ministry of Industry and Information Technology of the PRC. 2026. MIIT Advances the "1+M+N" National Computing Power Interconnection Node System. https://www.miit.gov.cn/jgsj/xgj/gzdt/art/2026/art_0df3c73645a64d7dab09f8246acf7049.html Accessed 2026-03-16
-
[17]
Daniel P. Palomar and Mung Chiang. 2006. A Tutorial on Decomposition Methods for Network Utility Maximization. IEEE Journal on Selected Areas in Communications 24, 8 (2006), 1439–1451. doi:10.1109/JSAC.2006.879350
-
[18]
Fred C. Schweppe, Michael C. Caramanis, Richard D. Tabors, and Roger E. Bohn. 1988. Spot Pricing of Electricity. Springer US. https://books.google.com/books/about/Spot_Pricing_of_Electricity.html?id=Sg5zRPWrZ_gC
-
[19]
Yosef Sheffi. 1984. Urban Transportation Networks: Equilibrium Analysis with Mathematical Programming Methods. Prentice-Hall, Englewood Cliffs, NJ. https://books.google.com/books/about/Urban_Transportation_Networks.html?id=zx1PAAAAMAAJ
-
[20]
Stanford Institute for Human-Centered Artificial Intelligence. 2025. Economy | The 2025 AI Index Report. https://hai.stanford.edu/ai-index/2025-ai-index-report/economy Accessed 2026-04-05
-
[21]
Steven Stoft. 2002. Power System Economics: Designing Markets for Electricity. Wiley. https://books.google.com/books/about/Power_System_Economics.html?id=DrTEsqJRKrYC
-
[22]
U.S. Energy Information Administration. 2026. Electricity Open Data Browser: Retail Sales by State and Sector. https://www.eia.gov/opendata/index.php/browser/electricity/retail-sales Accessed 2026-03-27