
arxiv: 2606.10451 · v1 · pith:44YHKOXWnew · submitted 2026-06-09 · 💻 cs.GT

Arbitrage-free Data Pricing

Pith reviewed 2026-06-27 11:15 UTC · model grok-4.3

classification 💻 cs.GT
keywords arbitrage-free pricing · data markets · information pricing · Blackwell dominance · query pricing · model pricing · threshold utilities · Bayesian decision making

The pith

Under threshold utilities, arbitrage-freeness in information pricing is characterized exactly by Blackwell dominance, unifying the conditions for query and model pricing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the seller's problem of pricing replicable data products such as query access and noisy model releases when buyers derive value from Bayesian decision problems. It first formulates a general arbitrage-free information selling problem and establishes its computational hardness, supplying a branch-and-bound algorithm. For the restricted case of threshold utilities, in which a buyer assigns positive value to an experiment if and only if the experiment meets a minimum informativeness level, the paper proves that arbitrage-freeness holds precisely when the offered products are ordered by Blackwell dominance. This single characterization recovers the known no-arbitrage conditions previously derived separately for query pricing and for model pricing, and it yields revenue-maximizing prices for restricted menus of queries and models.

Core claim

When buyers have threshold utilities, arbitrage-freeness of a menu of information products is equivalent to the menu being totally ordered by Blackwell dominance; this equivalence unifies the arbitrage-free conditions for query pricing and model pricing as special cases and permits explicit characterization of revenue-maximizing prices under restricted query and model menus.

What carries the argument

Blackwell dominance, the partial order on experiments that ranks one experiment as more informative than another for every possible decision problem, used here to enforce that no combination of cheaper products can replicate a more expensive one.
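
As a rough illustration of the order described above, the following sketch represents an experiment as a matrix of signal probabilities per state and garbles it with a second stochastic matrix. By Blackwell's theorem the garbled experiment is dominated by the original, so its value in a Bayesian decision problem cannot be higher. The function names `garble` and `value`, the two-state prior, and all numbers are illustrative assumptions, not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]

def garble(experiment: Matrix, garbling: Matrix) -> Matrix:
    """Compose P(s | state) with G(s' | s) to obtain P'(s' | state)."""
    n_out = len(garbling[0])
    return [
        [sum(row[s] * garbling[s][t] for s in range(len(row))) for t in range(n_out)]
        for row in experiment
    ]

def value(experiment: Matrix, prior: List[float], payoff: Matrix) -> float:
    """Expected payoff of a Bayesian buyer who best-responds to each signal.

    payoff[a][w] is the payoff of action a in state w (hypothetical numbers).
    """
    n_signals = len(experiment[0])
    total = 0.0
    for s in range(n_signals):
        # For each signal, pick the action with the highest joint-weighted payoff.
        total += max(
            sum(prior[w] * experiment[w][s] * payoff[a][w] for w in range(len(prior)))
            for a in range(len(payoff))
        )
    return total

# Two states, two signals: rows are states, columns are signals (assumed numbers).
E = [[0.9, 0.1],
     [0.2, 0.8]]
# A garbling that flips the signal a quarter of the time.
G = [[0.75, 0.25],
     [0.25, 0.75]]
E_garbled = garble(E, G)

prior = [0.5, 0.5]
payoff = [[1.0, 0.0],   # action 0 pays off in state 0
          [0.0, 1.0]]   # action 1 pays off in state 1

v_full = value(E, prior, payoff)
v_garbled = value(E_garbled, prior, payoff)
print(v_full, v_garbled)
```

The comparison holds for this one decision problem only; Blackwell dominance requires it for every decision problem, which is why a garbling certificate is the usual way to establish it.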

If this is right

  • The separate arbitrage-free conditions previously obtained for query pricing and for model pricing become instances of one common Blackwell-order requirement.
  • Revenue-maximizing prices can be characterized explicitly once menus are restricted to queries or to models.
  • A branch-and-bound procedure based on McCormick relaxations solves the general arbitrage-free information pricing problem to optimality.
  • Sellers can version data products by offering experiments that are comparable under Blackwell dominance without creating arbitrage opportunities.
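
The McCormick relaxation mentioned in the list above replaces a bilinear term w = x·y with four linear inequalities that are valid whenever x and y stay inside known bounds. The sketch below shows the standard envelope for a single bilinear term and how halving a variable's range (the branching step) tightens it. Which variables the paper actually relaxes is not stated in this summary, so `x`, `y`, and the bounds here are placeholders.

```python
from typing import Tuple

def mccormick_bounds(x: float, y: float,
                     x_lo: float, x_hi: float,
                     y_lo: float, y_hi: float) -> Tuple[float, float]:
    """Lower and upper McCormick estimates of x * y on the box [x_lo, x_hi] x [y_lo, y_hi]."""
    lower = max(x_lo * y + x * y_lo - x_lo * y_lo,
                x_hi * y + x * y_hi - x_hi * y_hi)
    upper = min(x_hi * y + x * y_lo - x_hi * y_lo,
                x_lo * y + x * y_hi - x_lo * y_hi)
    return lower, upper

def envelope_is_valid(x_lo: float, x_hi: float, y_lo: float, y_hi: float,
                      steps: int = 10, tol: float = 1e-12) -> bool:
    """Check on a grid that the envelope always brackets the true product."""
    for i in range(steps + 1):
        for j in range(steps + 1):
            x = x_lo + (x_hi - x_lo) * i / steps
            y = y_lo + (y_hi - y_lo) * j / steps
            lower, upper = mccormick_bounds(x, y, x_lo, x_hi, y_lo, y_hi)
            if not (lower - tol <= x * y <= upper + tol):
                return False
    return True

# Root box: the relaxation at (0.5, 0.5) is loose.
root = mccormick_bounds(0.5, 0.5, 0.0, 1.0, 0.0, 1.0)
# After branching on x into [0, 0.5], the same point sits on the boundary
# of the child box and the envelope becomes tight there.
child = mccormick_bounds(0.5, 0.5, 0.0, 0.5, 0.0, 1.0)
print(root, child)
```

A branch-and-bound solver would use the lower estimate as a bound for pruning and keep splitting boxes until the gap between the two estimates falls below a tolerance.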

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The Blackwell characterization may supply a template for checking arbitrage-freeness in other information markets once buyer utilities are known to be threshold type.
  • Menus designed via Blackwell ordering could be tested directly in laboratory experiments that present subjects with threshold-value decision problems.
  • The unification suggests that pricing algorithms developed for one data product type can be ported to the other without re-deriving no-arbitrage constraints.

Load-bearing premise

Buyers derive value from data solely through Bayesian decision making and assign positive value to an experiment if and only if that experiment is sufficiently informative according to a fixed threshold.

What would settle it

A concrete pricing menu of two or more experiments that satisfies arbitrage-freeness yet violates total Blackwell ordering (or the converse) when buyers are restricted to threshold utilities.

original abstract

Driven by the rising value of data in applications such as advertising, finance, and machine learning, markets for data products have become increasingly important. Data markets mainly sell two kinds of products: datasets and machine learning models. Since these products can be replicated at negligible marginal cost, sellers naturally version them through query access and noisy model releases. Versioning immediately raises an arbitrage problem: a buyer may combine cheaper purchases and recover a more informative product at a lower total price. Existing work on query and model pricing studies arbitrage-freeness when buyer values are treated as exogenous, whereas the literature on selling information derives value from the buyer's decision problem but ignores arbitrage-freeness. Accordingly, we study the seller's optimal data pricing problem where buyers value data through Bayesian decision making and we impose arbitrage-freeness constraints. We first interpret query and model pricing as special cases of information pricing, and formulate the general arbitrage-free information selling problem, show the computational hardness and give a branch-and-bound algorithm based on McCormick relaxations. We then consider threshold utilities where buyers have a positive value if and only if the experiment is sufficiently informative. Under this condition, we find that the arbitrage-freeness can be characterized by Blackwell dominance, which in turn unifies the arbitrage-free conditions for query pricing \cite{deep2017design} and model pricing \cite{chen2019towards}. Finally, we characterize the revenue-maximizing pricing under restricted query and model menus.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper studies arbitrage-free pricing of information products (query access to datasets and noisy releases of ML models) where buyers derive value from Bayesian decision problems rather than exogenous valuations. It formulates the general arbitrage-free information-selling problem, establishes its computational hardness, and supplies a branch-and-bound algorithm based on McCormick relaxations. Under the restriction to threshold utilities (positive value iff the experiment meets an informativeness threshold), it shows that arbitrage-freeness is characterized by Blackwell dominance; this unifies the conditions previously derived separately for query pricing and model pricing. The paper concludes by characterizing revenue-maximizing prices for restricted query and model menus.

Significance. If the Blackwell characterization and unification hold, the work supplies a decision-theoretic foundation for arbitrage-free data markets that bridges the exogenous-value literature on query/model pricing with the information-design literature. The algorithmic treatment of the general case, despite hardness, and the clean reduction to Blackwell order under a natural utility class are the primary contributions. These results could inform mechanism design for data marketplaces in advertising, finance, and ML.

major comments (2)
  1. [Abstract / general formulation] The abstract states that the general problem is computationally hard and that a branch-and-bound algorithm is supplied, but no explicit hardness reduction or complexity class is referenced in the provided text; the load-bearing claim that the algorithm solves the general case therefore cannot be verified without the full derivation.
  2. [Abstract / threshold utilities paragraph] The unification claim rests on the threshold-utility restriction being invoked precisely when moving from the general formulation to the Blackwell characterization; if the paper later relaxes this restriction without additional conditions, the unification would no longer hold for arbitrary utilities.
minor comments (1)
  1. [Abstract] The abstract cites deep2017design and chen2019towards but does not indicate whether the Blackwell characterization recovers their exact arbitrage-free conditions as special cases or only qualitatively similar ones.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful summary and major comments. We address each point below, clarifying the structure and scope of the results in the manuscript.

point-by-point responses
  1. Referee: [Abstract / general formulation] The abstract states that the general problem is computationally hard and that a branch-and-bound algorithm is supplied, but no explicit hardness reduction or complexity class is referenced in the provided text; the load-bearing claim that the algorithm solves the general case therefore cannot be verified without the full derivation.

    Authors: The explicit hardness reduction (establishing NP-hardness of the general arbitrage-free information pricing problem) appears in Section 3 of the full manuscript. The branch-and-bound algorithm, which applies to the general case via McCormick relaxations on the bilinear arbitrage constraints, is developed and analyzed in Section 4. The abstract summarizes these contributions at a high level, consistent with standard practice; the derivations are in the body text. revision: no

  2. Referee: [Abstract / threshold utilities paragraph] The unification claim rests on the threshold-utility restriction being invoked precisely when moving from the general formulation to the Blackwell characterization; if the paper later relaxes this restriction without additional conditions, the unification would no longer hold for arbitrary utilities.

    Authors: The Blackwell dominance characterization and unification of the query and model pricing conditions are derived only under the threshold utilities restriction, as stated in the abstract and proven in Section 5. The manuscript does not relax this restriction when stating the unification result. The general (non-threshold) case is handled separately by the hardness result and branch-and-bound algorithm in Sections 3 and 4. revision: no

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper explicitly scopes the main result to threshold utilities, under which it derives that arbitrage-freeness is characterized by Blackwell dominance; this characterization is presented as following from the Bayesian decision-making model and the arbitrage constraint rather than being presupposed or fitted. The unification with prior query and model pricing conditions is a consequence of the derived equivalence, not an input. No self-citations appear as load-bearing steps, no parameters are fitted and then relabeled as predictions, and the argument does not reduce any claimed result to its own definition or to a self-referential ansatz. The general formulation is acknowledged as hard, with the threshold case serving as a tractable restriction that yields an independent characterization.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard concepts from information economics and optimization theory; no new free parameters, invented entities, or ad-hoc axioms are introduced in the abstract.

axioms (2)
  • domain assumption Buyers value information according to Bayesian decision making
    Explicitly stated in the abstract as the foundation for how buyer values are derived.
  • standard math Standard mathematical assumptions for optimization and information ordering (Blackwell dominance)
    Invoked when characterizing arbitrage-freeness and when applying McCormick relaxations.

pith-pipeline@v0.9.1-grok · 5797 in / 1363 out tokens · 19903 ms · 2026-06-27T11:15:27.654975+00:00 · methodology


Reference graph

Works this paper leans on

34 extracted references · 3 canonical work pages

  1. [1]

    Optimal mechanisms for selling information

    Moshe Babaioff, Robert Kleinberg, and Renato Paes Leme. Optimal mechanisms for selling information. In Proceedings of the 13th ACM Conference on Electronic Commerce, pages 92–109, 2012

  2. [2]

    Data markets in the cloud: An opportunity for the database community

    Magdalena Balazinska, Bill Howe, and Dan Suciu. Data markets in the cloud: An opportunity for the database community. Proceedings of the VLDB Endowment, 4(12):1482–1485, 2011

  3. [3]

    Markets for information: An introduction

    Dirk Bergemann and Alessandro Bonatti. Markets for information: An introduction. Annual Review of Economics, 11(1):85–107, 2019

  4. [4]

    Information markets and nonmarkets

    Dirk Bergemann and Marco Ottaviani. Information markets and nonmarkets. In Handbook of industrial organization, volume 4, pages 593–672. Elsevier, 2021

  5. [5]

    The design and price of information

    Dirk Bergemann, Alessandro Bonatti, and Alex Smolin. The design and price of information. American economic review, 108(1):1–48, 2018

  6. [6]

    Is selling complete information (approximately) optimal?

    Dirk Bergemann, Yang Cai, Grigoris Velegkas, and Mingfei Zhao. Is selling complete information (approximately) optimal? In Proceedings of the 23rd ACM Conference on Economics and Computation, pages 608–663, 2022

  7. [7]

    When data pricing meets non-cooperative game theory

    Yuran Bi, Yihang Wu, Jinfei Liu, Kui Ren, and Li Xiong. When data pricing meets non-cooperative game theory. In 2024 IEEE 40th International Conference on Data Engineering (ICDE), pages 5548–5559, 2024. doi: 10.1109/ICDE60146.2024.00443

  8. [8]

    Equivalent comparisons of experiments

    David Blackwell. Equivalent comparisons of experiments. The Annals of Mathematical Statistics, 24(2):265–272, 1953. doi: 10.1214/aoms/1177729032

  9. [9]

    How to sell information optimally: An algorithmic study

    Yang Cai and Grigoris Velegkas. How to sell information optimally: An algorithmic study. arXiv preprint arXiv:2011.14570, 2020

  10. [10]

    Revenue maximization for query pricing

    Shuchi Chawla, Shaleen Deep, Paraschos Koutris, and Yifeng Teng. Revenue maximization for query pricing. Proceedings of the VLDB Endowment, 13(1):1–14, 2019

  11. [11]

    Towards model-based pricing for machine learning in a data marketplace

    Lingjiao Chen, Paraschos Koutris, and Arun Kumar. Towards model-based pricing for machine learning in a data marketplace. In Proceedings of the 2019 international conference on management of data, pages 1535–1552, 2019

  12. [12]

    Selling information through consulting

    Yiling Chen, Haifeng Xu, and Shuran Zheng. Selling information through consulting. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2412–2431. SIAM, 2020

  13. [13]

    Data pricing in machine learning pipelines

    Zicun Cong, Xuan Luo, Jian Pei, Feida Zhu, and Yong Zhang. Data pricing in machine learning pipelines. Knowledge and Information Systems, 64(6):1417–1455, 2022

  14. [14]

    The design of arbitrage-free data pricing schemes

    Shaleen Deep and Paraschos Koutris. The design of arbitrage-free data pricing schemes. In 20th International Conference on Database Theory, 2017

  15. [15]

    Qirana: A framework for scalable query pricing

    Shaleen Deep and Paraschos Koutris. Qirana: A framework for scalable query pricing. In Proceedings of the 2017 ACM International Conference on Management of Data, pages 699–713, 2017

  16. [16]

    Algorithmic bayesian persuasion

    Shaddin Dughmi and Haifeng Xu. Algorithmic bayesian persuasion. SIAM Journal on Computing, 46(1):335–368, 2017

  17. [17]

    Quantifying information and uncertainty

    Alexander Frankel and Emir Kamenica. Quantifying information and uncertainty. American Economic Review, 109(10):3650–3680, 2019

  18. [18]

    Bayesian persuasion

    Emir Kamenica and Matthew Gentzkow. Bayesian persuasion. American Economic Review, 101(6):2590–2615, 2011

  19. [19]

    Toward practical query pricing with querymarket

    Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, Bill Howe, and Dan Suciu. Toward practical query pricing with querymarket. In Proceedings of the 2013 ACM SIGMOD international conference on management of data, pages 613–624, 2013

  20. [20]

    Query-based data pricing

    Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, Bill Howe, and Dan Suciu. Query-based data pricing. Journal of the ACM (JACM), 62(5):1–44, 2015

  21. [21]

    An automatic method of solving discrete programming problems

    AH Land and AG Doig. An automatic method of solving discrete programming problems. Econometrica, 28(3):497–520, 1960

  22. [22]

    A theory of pricing private data

    Chao Li, Daniel Yang Li, Gerome Miklau, and Dan Suciu. A theory of pricing private data. ACM Transactions on Database Systems (TODS), 39(4):1–28, 2014

  23. [23]

    Selling data to an agent with endogenous information

    Yingkai Li. Selling data to an agent with endogenous information. In Proceedings of the 23rd ACM Conference on Economics and Computation, pages 664–665, 2022

  24. [24]

    On arbitrage-free pricing for general data queries

    Bing-Rong Lin and Daniel Kifer. On arbitrage-free pricing for general data queries. Proceedings of the VLDB Endowment, 7(9):757–768, 2014

  25. [25]

    Dealer: An end-to-end model marketplace with differential privacy

    Jinfei Liu, Jian Lou, Junxu Liu, Li Xiong, Jian Pei, and Jimeng Sun. Dealer: An end-to-end model marketplace with differential privacy. Proceedings of the VLDB Endowment, 14(6), 2021

  26. [26]

    Optimal pricing of information

    Shuze Liu, Weiran Shen, and Haifeng Xu. Optimal pricing of information. In Proceedings of the 22nd ACM Conference on Economics and Computation, pages 693–693, 2021

  27. [27]

    Computability of global solutions to factorable nonconvex programs: Part I - convex underestimating problems

    Garth P. McCormick. Computability of global solutions to factorable nonconvex programs: Part I - convex underestimating problems. Mathematical Programming, 10(1):147–175, 1976. doi: 10.1007/BF01580665

  28. [28]

    A survey on data pricing: from economics to data science

    Jian Pei. A survey on data pricing: from economics to data science. IEEE Transactions on knowledge and Data Engineering, 34(10):4586–4608, 2020

  29. [29]

    On scalable query pricing in data marketplaces

    Huanhuan Peng, Xiaoye Miao, Yicheng Fu, Jinshan Zhang, Shuiguang Deng, and Jianwei Yin. On scalable query pricing in data marketplaces. In 2025 IEEE 41st International Conference on Data Engineering (ICDE), pages 3140–3152. IEEE Computer Society, 2025

  30. [30]

    Versioning: the smart way to sell information

    Carl Shapiro and Hal R Varian. Versioning: the smart way to sell information. Harvard business review, 107(6):107, 1998

  31. [31]

    A survey on data markets

    Jiayao Zhang, Yuran Bi, Mengye Cheng, Jinfei Liu, Kui Ren, Qiheng Sun, Yihang Wu, Yang Cao, Raul Castro Fernandez, Haifeng Xu, Ruoxi Jia, Yongchan Kwon, Jian Pei, Jiachen T. Wang, Haocheng Xia, Li Xiong, Xiaohui Yu, and James Zou. A survey on data markets. URL https://arxiv.org/abs/2411.07267

  32. [32]

    A Omitted Proofs in Section 2. Proof of Proposition 2.2

    A Omitted Proofs in Section 2. Proof of Proposition 2.2. Let the original menu be $M = \{(E_j, t_j)\}_{j=1}^{N}$. For each type $\theta$, let $B_\theta = (i^\theta_1, \ldots, i^\theta_{\ell_\theta})$ be the bundle selected by type $\theta$ from $M$. Thus $B_\theta \in \arg\max_B \{V_\theta(E_B) - t_B\}$. If several bundles maximize the buyer's utility, fix the same tie-breaking rule as in the origin...

  33. [33]

    among the first m copies there are exactly r correct signals, and the two new copies are both correct

  34. [34]

    ⊥ versus U

    among the first $m$ copies there are exactly $r + 1$ correct signals, and the two new copies are both incorrect. Therefore

    $$q_{m+2} - q_m = \binom{2r+1}{r}(1-\varepsilon)^{r}\varepsilon^{r+1}(1-\varepsilon)^{2} - \binom{2r+1}{r+1}(1-\varepsilon)^{r+1}\varepsilon^{r}\varepsilon^{2} = \binom{2r+1}{r}(1-\varepsilon)^{r+1}\varepsilon^{r+1}(1-2\varepsilon) > 0.$$

    Hence majority on $m + 2$ copies achieves strictly higher expected utility than majority on $m$ copies. On the other hand, $E^{\otimes(m+2)} \succeq_B E^{\otimes m}$, sinc...
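
The identity in the proof fragment above can be checked numerically. The sketch below assumes, as the binomial coefficients suggest but the fragment does not state outright, that m = 2r + 1 and that ε is the probability that a single copy is wrong, with ε < 1/2. It computes the accuracy of a majority vote over m independent copies and compares the gain from two extra copies with the closed form binom(2r+1, r)·((1−ε)ε)^(r+1)·(1−2ε). The function names are illustrative.

```python
from math import comb

def majority_accuracy(m: int, eps: float) -> float:
    """Probability that a strict majority of m i.i.d. copies is correct (m odd)."""
    if m % 2 == 0:
        raise ValueError("m must be odd so that the majority is well defined")
    return sum(
        comb(m, k) * (1 - eps) ** k * eps ** (m - k)
        for k in range(m // 2 + 1, m + 1)
    )

def closed_form_gain(r: int, eps: float) -> float:
    """Claimed value of q_{m+2} - q_m for m = 2r + 1 (assumed reading of the fragment)."""
    return comb(2 * r + 1, r) * ((1 - eps) * eps) ** (r + 1) * (1 - 2 * eps)

# Largest discrepancy between the direct computation and the closed form
# over a small grid of r and eps values.
max_error = max(
    abs(majority_accuracy(2 * r + 3, eps) - majority_accuracy(2 * r + 1, eps)
        - closed_form_gain(r, eps))
    for r in range(6)
    for eps in (0.05, 0.2, 0.35, 0.49)
)
print(max_error)
```

Because the gain is strictly positive for ε < 1/2, buying more copies of a cheap noisy product keeps improving accuracy, which is the replication channel that an arbitrage-free menu has to price out.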