
arxiv: 2606.30423 · v1 · pith:T4PGJBBSnew · submitted 2026-06-29 · 💻 cs.LG · cs.CR

Proofs of Ownership for Machine Learning Models

Pith reviewed 2026-06-30 06:47 UTC · model grok-4.3

classification 💻 cs.LG cs.CR
keywords proof of ownership · machine learning models · concept classes · self-correctability · black-box access · classifiers · cryptographic assumptions

The pith

Model ownership can be proven for a concept class if and only if the class is not self-correctable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets up a three-party game in which an owner perturbs a model and attaches a proof, a thief makes minimal changes to hide the origin, and a judge must decide whether the submitted model comes from the owner. It shows that under standard cryptographic assumptions this proof succeeds exactly when the underlying concept class cannot self-correct. The result applies to classifiers in the black-box setting and is constructive, so explicit ownership mechanisms follow from the non-self-correctability condition. A reader would care because the dichotomy tells practitioners which model families can receive cryptographic ownership protection and which cannot.

Core claim

The paper proves a dichotomy: in the black-box setting for classifiers, under standard cryptographic assumptions, ownership proofs exist for a concept class if and only if the class is not self-correctable. The construction is explicit and the same separation extends, with variations, to several related model settings.

What carries the argument

The three-party ownership game in which the owner perturbs the model with a proof, the thief applies minimal modifications, and the judge decides origin based on the submitted model and proof.
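
The shape of the game can be sketched in a few lines of Python. This is a toy illustration only, not the paper's construction: the lookup-table models, the names `owner`, `thief`, `judge` and `distance`, the label-flipping mark, and the 0.8 acceptance threshold are all assumptions made for the example. The sketch also omits the cryptography a real scheme would need to stop someone forging a proof.

```python
import random
from typing import Dict, List, Tuple

# Toy classifier (assumption): a lookup table from integer inputs to 0/1 labels.
Model = Dict[int, int]
Proof = List[Tuple[int, int]]

def distance(a: Model, b: Model) -> float:
    """Fraction of inputs on which two toy models disagree."""
    return sum(a[x] != b[x] for x in a) / len(a)

def owner(model: Model, k: int, rng: random.Random) -> Tuple[Model, Proof]:
    """Perturb the model on k secret points; the proof records those points
    together with their new labels."""
    marked = dict(model)
    points = rng.sample(sorted(model), k)
    for x in points:
        marked[x] = 1 - marked[x]
    return marked, [(x, marked[x]) for x in points]

def thief(model: Model, budget: int, rng: random.Random) -> Model:
    """Flip exactly `budget` randomly chosen labels, hoping to erase the mark
    while staying close to the stolen model."""
    stolen = dict(model)
    for x in rng.sample(sorted(model), budget):
        stolen[x] = 1 - stolen[x]
    return stolen

def judge(model: Model, proof: Proof, threshold: float = 0.8) -> bool:
    """Accept the ownership claim if the model agrees with enough marked labels."""
    hits = sum(model[x] == y for x, y in proof)
    return hits / len(proof) >= threshold
```

For example, calling `owner` on a 1000-point table with `k=20` yields a marked model at distance 0.02 from the original. `judge` accepts the marked model with its proof and rejects the unmarked original, which matches none of the flipped labels. A thief with a budget smaller than `k` cannot undo every marked point by random flipping, which is why the interesting thieves are those that exploit structure in the concept class.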

If this is right

  • Explicit ownership mechanisms can be built for every concept class that is not self-correctable.
  • No such mechanism exists for self-correctable classes under the stated cryptographic assumptions.
  • The same separation holds, with adjustments, across multiple model ownership variants.
  • The result applies directly to black-box classifier access.
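
To make the self-correctable side concrete, here is a minimal Python sketch of the classical Blum-Luby-Rubinfeld self-corrector for linear functions over GF(2). It illustrates the classical notion only; the paper uses a definition that is merely "close to" it, and the names `parity` and `self_correct`, the integer bit-vector encoding, and the majority vote over `trials` samples are assumptions of this example.

```python
import random
from typing import Callable

def parity(a: int, x: int) -> int:
    """Linear function over GF(2): inner product of the bits of a and x, mod 2."""
    return bin(a & x).count("1") % 2

def self_correct(program: Callable[[int], int], x: int, n_bits: int,
                 trials: int, rng: random.Random) -> int:
    """Recover f(x) for a linear f from a program that agrees with f on most inputs.

    Linearity gives f(x) = f(x XOR r) XOR f(r) for every r, and both query
    points are uniformly distributed when r is uniform, so each vote is wrong
    only if the program errs on one of two random inputs. A majority vote
    over independent r drives the error down.
    """
    votes = 0
    for _ in range(trials):
        r = rng.getrandbits(n_bits)
        votes += program(x ^ r) ^ program(r)
    return int(2 * votes > trials)
```

If a program equals `parity(a, ·)` everywhere except on a small fraction of inputs, `self_correct` returns the true value with high probability even at the altered inputs, using only black-box queries. A thief with such a corrector can wash out a small perturbation while keeping the model useful, which is the intuition behind the impossibility direction of the dichotomy.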

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • For common neural-network classes that turn out to be self-correctable, designers would need non-cryptographic ownership methods.
  • The game could be instantiated on concrete classes such as linear separators to test the boundary.
  • Similar games might separate provable ownership from non-provable cases in other learning settings like regression or generative models.

Load-bearing premise

The three-party game accurately captures how model theft and verification occur in practice, and self-correctability is the property that separates cases where proofs exist from cases where they do not.

What would settle it

A concrete example of a non-self-correctable class for which no judge can reliably distinguish an owned model from an independently trained one, or a self-correctable class for which an explicit ownership proof works against any minimal thief modification.

Original abstract

With the increasing adoption of Machine Learning, protecting model ownership has become an essential challenge. We initiate a formal study of Proof of Ownership for machine learning models: under what conditions can one prove that a stolen model originated from a particular creator? We model proofs of ownership as a game among three parties: a model owner, a thief, and a judge. The owner transforms the original model into a slightly perturbed model together with a proof of ownership. The thief then obtains the transformed model and attempts to minimally modify it so that it remains useful but escapes detection as owned by the model owner. Finally, the judge receives a model and a proof of ownership, and must decide whether the given model is a modified version of some model created by the model owner, or else the given model was developed independently. Our main result is a dichotomy for classifiers in the black-box setting: Under standard cryptographic assumptions, ownership of models for some concept class can be proven in the above sense if and only if the concept class is not self-correctable, in a sense close to that of Blum, Luby and Rubinfeld, STOC'90. The result is constructive and extends, with some variations, to a number of related settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper initiates a formal cryptographic study of proofs of ownership for machine learning models, formalized as a three-party game (owner perturbs model with proof; thief minimally modifies to evade detection while preserving usefulness; judge verifies origin). The central claim is a dichotomy theorem for black-box classifiers: under standard cryptographic assumptions, such proofs exist if and only if the underlying concept class is not self-correctable (in a notion close to Blum-Luby-Rubinfeld, STOC 1990). The result is stated to be constructive and to extend, with variations, to related settings.

Significance. If the dichotomy and its reductions hold, the work supplies a clean theoretical separation between concept classes for which ownership can be provably asserted and those for which it cannot, directly linking model-protection questions to property-testing notions. The constructive nature of the positive direction and the explicit extensions to related settings would be genuine strengths, offering both impossibility results and concrete constructions that could inform practical watermarking or verification protocols.

major comments (2)
  1. [Abstract] Abstract (main theorem statement): The 'only if' direction asserts that self-correctability allows a thief to escape detection while keeping the model useful under the judge's distribution in the three-party game. Because the paper's self-correctability notion is only 'close to' the original BLR definition, an explicit lemma is required showing that the output of the adapted self-corrector meets the exact usefulness predicate used in the game definition; without it the reduction does not go through.
  2. [Abstract] Game definition and reduction (main theorem): The three-party game quantifies 'minimally modify' and 'remains useful' via specific distance and accuracy thresholds on the judge's distribution. It is not immediate that the error bounds or query distribution in the paper's self-correctability definition coincide with these thresholds; a concrete comparison or simulation argument between the two metrics is needed to establish that the thief's output remains useful.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for highlighting points where the link between the adapted self-correctability notion and the three-party game could be made fully explicit. We will revise the manuscript to add the requested lemma and comparison argument, thereby closing the identified gaps in the 'only if' direction.

Point-by-point responses
  1. Referee: [Abstract] Abstract (main theorem statement): The 'only if' direction asserts that self-correctability allows a thief to escape detection while keeping the model useful under the judge's distribution in the three-party game. Because the paper's self-correctability notion is only 'close to' the original BLR definition, an explicit lemma is required showing that the output of the adapted self-corrector meets the exact usefulness predicate used in the game definition; without it the reduction does not go through.

    Authors: We agree that an explicit lemma is needed to make the reduction rigorous. In the revised manuscript we will insert a dedicated lemma (placed immediately before the main theorem) that shows the output of our adapted self-corrector satisfies the precise usefulness predicate appearing in the game definition. This will eliminate any ambiguity arising from the 'close to' phrasing.
    revision: yes

  2. Referee: [Abstract] Game definition and reduction (main theorem): The three-party game quantifies 'minimally modify' and 'remains useful' via specific distance and accuracy thresholds on the judge's distribution. It is not immediate that the error bounds or query distribution in the paper's self-correctability definition coincide with these thresholds; a concrete comparison or simulation argument between the two metrics is needed to establish that the thief's output remains useful.

    Authors: The parameters of the self-correctability definition were selected to match the game's distance and accuracy thresholds. We will augment the proof of the main theorem with an explicit comparison of the error bounds together with a simulation argument that shows the query distributions are compatible and that any model produced by the self-corrector remains useful under the judge's distribution.
    revision: yes

Circularity Check

0 steps flagged

No circularity: dichotomy reduces to external BLR self-correctability notion with no self-referential definitions or fitted quantities.

Full rationale

The paper states a constructive if-and-only-if result under cryptographic assumptions, separating provable ownership from non-provable cases exactly when the concept class is not self-correctable 'in a sense close to' the external Blum-Luby-Rubinfeld STOC'90 definition. No equations, parameters, or self-citations appear in the provided text; the central claim is a reduction to an independent property-testing result rather than a self-definition or renaming of an input. The three-party game is presented as the modeling choice, not derived from the result itself. This matches the default expectation of a non-circular paper whose derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The result rests on standard cryptographic assumptions (mentioned explicitly) and on the external definition of self-correctability from Blum-Luby-Rubinfeld; no free parameters or new invented entities are introduced in the abstract.

axioms (2)
  • domain assumption: standard cryptographic assumptions
    The dichotomy holds under these assumptions as stated in the abstract.
  • domain assumption: self-correctability is defined in a sense close to Blum, Luby and Rubinfeld STOC'90
    The separation between provable and non-provable classes is stated relative to this prior notion.

pith-pipeline@v0.9.1-grok · 5746 in / 1369 out tokens · 29324 ms · 2026-06-30T06:47:30.173774+00:00 · methodology


Reference graph

Works this paper leans on

23 extracted references · 4 canonical work pages · 1 internal anchor

  1. [1]

    Turning your weakness into a strength: Watermarking deep neural networks by backdooring

    Yossi Adi, Carsten Baum, Moustapha Cisse, Benny Pinkas, and Joseph Keshet. Turning your weakness into a strength: Watermarking deep neural networks by backdooring. In 27th USENIX security symposium (USENIX Security 18), pages 1615–1631, 2018. 5, 6

  2. [2]

    Differing-inputs obfuscation and applications

    Prabhanjan Ananth, Dan Boneh, Sanjam Garg, Amit Sahai, and Mark Zhandry. Differing-inputs obfuscation and applications. Cryptology ePrint Archive, Paper 2013/689, 2013. 18

  3. [3]

    How to go beyond the black-box simulation barrier

    Boaz Barak. How to go beyond the black-box simulation barrier. In 42nd Annual Symposium on Foundations of Computer Science, FOCS 2001, 14-17 October 2001, Las Vegas, Nevada, USA, pages 106–115, 2001. 5

  4. [4]

    On the (im)possibility of obfuscating programs

    Boaz Barak, Oded Goldreich, Russell Impagliazzo, Steven Rudich, Amit Sahai, Salil Vadhan, and Ke Yang. On the (im)possibility of obfuscating programs. In Advances in Cryptology - CRYPTO ’01, volume 2139 of Lecture Notes in Computer Science, pages 1–18, 2001. 3, 6, 18

  5. [5]

    Self-testing/correcting with applications to numerical problems

    Manuel Blum, Michael Luby, and Ronitt Rubinfeld. Self-testing/correcting with applications to numerical problems. J. Comput. Syst. Sci., 47(3):549–595, 1993. 4, 7, 9

  6. [6]

    On extractability (a.k.a. differing-inputs) obfuscation

    Elette Boyle, Kai-Min Chung, and Rafael Pass. On extractability (a.k.a. differing-inputs) obfuscation. Cryptology ePrint Archive, Paper 2013/650, 2013. 5, 18

  7. [7]

    NIZK from LPN and trapdoor hash via correlation intractability for approximable relations

    Zvika Brakerski, Venkata Koppula, and Tamer Mour. NIZK from LPN and trapdoor hash via correlation intractability for approximable relations. In Daniele Micciancio and Thomas Ristenpart, editors, Advances in Cryptology - CRYPTO 2020 - 40th Annual International Cryptology Conference, CRYPTO 2020, Santa Barbara, CA, USA, August 17-21, 2020, Proceedings, Pa...

  8. [8]

    Fiat-Shamir: from practice to theory

    Ran Canetti, Yilei Chen, Justin Holmgren, Alex Lombardi, Guy N. Rothblum, Ron D. Rothblum, and Daniel Wichs. Fiat-Shamir: from practice to theory. In STOC, pages 1082–1090, 2019. 5, 15

  9. [9]

    The random oracle methodology, revisited

    Ran Canetti, Oded Goldreich, and Shai Halevi. The random oracle methodology, revisited. J. ACM, 51(4):557–594, 2004. 5, 15

  10. [10]

    Undetectable watermarks for language models

    Miranda Christ, Sam Gunn, and Or Zamir. Undetectable watermarks for language models. In Shipra Agrawal and Aaron Roth, editors, The Thirty Seventh Annual Conference on Learning Theory, June 30 - July 3, 2023, Edmonton, Canada, volume 247 of Proceedings of Machine Learning Research, pages 1125–1139. PMLR, 2024. 7

  11. [11]

    Watermarking cryptographic capabilities

    Aloni Cohen, Justin Holmgren, Ryo Nishimaki, Vinod Vaikuntanathan, and Daniel Wichs. Watermarking cryptographic capabilities. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, pages 1115–1127, 2016. 5, 6

  12. [12]

    A cryptographic perspective on mitigation vs. detection in machine learning

    Greg Gluch and Shafi Goldwasser. A cryptographic perspective on mitigation vs. detection in machine learning. arXiv preprint arXiv:2504.20310, 2025. 7

  13. [13]

    Planting undetectable backdoors in machine learning models

    Shafi Goldwasser, Michael P. Kim, Vinod Vaikuntanathan, and Or Zamir. Planting undetectable backdoors in machine learning models: [extended abstract]. In 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022, Denver, CO, USA, October 31 - November 3, 2022, pages 931–942. IEEE, 2022. 6

  14. [14]

    Oblivious defense in ml models: Backdoor removal without detection

    Shafi Goldwasser, Jonathan Shafer, Neekon Vafa, and Vinod Vaikuntanathan. Oblivious defense in ml models: Backdoor removal without detection. In Proceedings of the 57th Annual ACM Symposium on Theory of Computing, pages 1785–1794, 2025. 4, 6, 7

  15. [15]

    Distilling the Knowledge in a Neural Network

    Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015. 5

  16. [16]

    Watermarking cryptographic functionalities from standard lattice assumptions

    Sam Kim and David J Wu. Watermarking cryptographic functionalities from standard lattice assumptions. In Annual International Cryptology Conference, pages 503–536. Springer, 2017. 6

  17. [17]

    A watermark for large language models

    John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models. In International Conference on Machine Learning, pages 17061–17084. PMLR, 2023. 7

  18. [18]

    Model stealing for any low-rank language model

    Allen Liu and Ankur Moitra. Model stealing for any low-rank language model. In Proceedings of the 57th Annual ACM Symposium on Theory of Computing, pages 1755–1761, 2025. 5

  19. [19]

    Noninteractive zero knowledge for NP from (plain) learning with errors

    Chris Peikert and Sina Shiehian. Noninteractive zero knowledge for NP from (plain) learning with errors. In Alexandra Boldyreva and Daniele Micciancio, editors, Advances in Cryptology - CRYPTO 2019 - 39th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 18-22, 2019, Proceedings, Part I, Lecture Notes in Computer Science, pages 89–...

  20. [20]

    Stealing machine learning models via prediction APIs

    Florian Tramèr, Fan Zhang, Ari Juels, Michael K Reiter, and Thomas Ristenpart. Stealing machine learning models via prediction APIs. In 25th USENIX security symposium (USENIX Security 16), pages 601–618, 2016. 5

  21. [21]

    The good, the bad and the ugly: watermarks, transferable attacks and adversarial defenses

    Berkant Turan, Sai Ganesh Nagarajan, Sebastian Pokutta, et al. The good, the bad and the ugly: watermarks, transferable attacks and adversarial defenses. arXiv preprint arXiv:2410.08864, 2024. 7

  22. [22]

    Embedding watermarks into deep neural networks

    Yusuke Uchida, Yuki Nagai, Shigeyuki Sakazawa, and Shin’ichi Satoh. Embedding watermarks into deep neural networks. In Proceedings of the 2017 ACM on international conference on multimedia retrieval, pages 269–277, 2017. 6

  23. [23]

    A recipe for watermarking diffusion models

    Yunqing Zhao, Tianyu Pang, Chao Du, Xiao Yang, Ngai-Man Cheung, and Min Lin. A recipe for watermarking diffusion models. arXiv preprint arXiv:2303.10137, 2023. 7