Position: No Retroactive Cure for Infringement during Training
Pith reviewed 2026-05-10 05:08 UTC · model grok-4.3
The pith
Post-hoc mitigation cannot retroactively cure liability for unlawful data acquisition and training in generative AI.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Unauthorized copying or ingestion during training constitutes a legally complete act, and model weights function as fixed copies that retain training-derived expressive value; therefore post-hoc methods such as unlearning cannot retroactively establish compliance, contract and tort rules can restrict use independently of copyright defenses, and remedies including unjust enrichment may require stripping gains or reaching the model.
What carries the argument
Model weights as fixed copies that retain training-derived expressive value, making the timing of compliance checks hinge on data lineage rather than later outputs.
If this is right
- Liability for infringement attaches at the moment of unauthorized ingestion and survives any subsequent filtering or unlearning.
- Contractual licenses and unfair-competition principles can limit use even when copyright defenses such as fair use would otherwise apply.
- Remedies for unjust enrichment may require disgorgement of gains traceable to protected inputs and, in some cases, restrictions on the model itself.
- Development practices must prioritize verifiable lawful data sourcing and process documentation over reliance on post-training fixes.
Where Pith is reading between the lines
- This position would encourage courts and regulators to treat the training phase as a distinct point of legal risk rather than focusing only on deployed outputs.
- It could accelerate requirements for auditable data provenance records in large-scale model development.
- The argument implies that mitigation efforts might still be useful for reducing ongoing harm but cannot serve as a complete defense to past acquisition violations.
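The kind of post-hoc mitigation at issue can be made concrete. The following is a minimal numpy sketch, not the paper's method: a toy regression model is trained on a dataset that includes a contested slice, and "approximate unlearning" is then attempted via gradient ascent on that slice. All names and hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for "training on a dataset that includes a contested slice"
X = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.01 * rng.normal(size=100)

forget_X, forget_y = X[:20], y[:20]  # the slice a rights-holder contests

def mse(w, X, y):
    return 0.5 * np.mean((X @ w - y) ** 2)

def grad(w, X, y):
    # Gradient of the mean-squared-error loss above
    return X.T @ (X @ w - y) / len(y)

# Train on everything, contested slice included
w = np.zeros(5)
for _ in range(500):
    w -= 0.1 * grad(w, X, y)

loss_before = mse(w, forget_X, forget_y)

# Approximate unlearning: gradient *ascent* on the forget slice,
# attempting to undo its learning signal after the fact
w_u = w.copy()
for _ in range(50):
    w_u += 0.05 * grad(w_u, forget_X, forget_y)

loss_after = mse(w_u, forget_X, forget_y)
```

The model's fit to the forget slice degrades (`loss_after > loss_before`), which is the technical sense in which unlearning "works"; the paper's point is that the ingestion and training already occurred, so the edit arrives after the legally relevant act.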
Load-bearing premise
Model weights operate as fixed copies that retain training-derived expressive value, rendering later filtering beside the point for infringement.
What would settle it
A demonstration, whether technical or judicial, that complete removal of influence from specific unauthorized training data eliminates all legal liability attached to the original ingestion and training steps.
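The relevant technical benchmark is the counterfactual-clean model: one retrained from scratch without the contested data. A minimal least-squares sketch, with all names and the setup assumed for illustration, shows what an "exact removal of influence" claim would have to match.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: a model trained WITH a contested slice vs. one
# retrained from scratch WITHOUT it (the counterfactual-clean model)
X = rng.normal(size=(60, 3))
w_star = np.array([1.0, -2.0, 0.5])
y = X @ w_star + 0.1 * rng.normal(size=60)

def fit(X, y):
    # Closed-form least squares stands in for "training"
    return np.linalg.lstsq(X, y, rcond=None)[0]

w_full = fit(X, y)             # trained on everything
w_clean = fit(X[10:], y[10:])  # never saw the contested first 10 rows

# An unlearning method claiming complete removal of influence would
# have to map w_full onto w_clean; in general the two differ.
gap = np.linalg.norm(w_full - w_clean)
```

Even in this convex toy case the two weight vectors differ (`gap > 0`); and, per the paper's argument, even a method that achieved `gap == 0` would address only the model's current state, not the liability attached to the original ingestion.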
Original abstract
As generative AI faces intensifying legal challenges, the machine learning community has increasingly relied on post-hoc mitigation -- especially machine unlearning and inference-time guardrails -- to argue for compliance. This paper argues that such post-hoc mitigation methods cannot retroactively cure liability from unlawful acquisition and training, because compliance hinges on data lineage, not the outputs. Our argument has three parts. First, unauthorized copying/ingestion can be a legally completed act, and model weights may operate as fixed copies that retain training-derived expressive value, making later filtering beside the point for infringement. Second, contract and tort/unfair-competition rules -- via licenses, terms of service, and anti-free-riding principles -- can independently restrict access and use, often bypassing copyright defenses (e.g., fair use or TDM exceptions). Third, since value from protected inputs can persist in weights, remedies such as unjust enrichment and disgorgement may require stripping gains and, in some cases, reaching the model itself. We therefore argue for a shift from Post-Hoc Sanitization to verifiable Ex-Ante Process Compliance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript argues that post-hoc mitigation methods such as machine unlearning and inference-time guardrails cannot retroactively cure legal liability arising from unlawful acquisition and training of data for generative AI models. Compliance hinges on data lineage rather than outputs. The three-part argument is: (1) unauthorized copying/ingestion is a completed act, with model weights operating as fixed copies that retain training-derived expressive value, rendering later filtering irrelevant to infringement; (2) contract and tort/unfair-competition rules (licenses, terms of service, anti-free-riding) can independently restrict use and often bypass copyright defenses such as fair use or TDM exceptions; (3) since value from protected inputs can persist in weights, remedies including unjust enrichment and disgorgement may require stripping gains and, in some cases, reaching the model itself. The paper advocates shifting from post-hoc sanitization to verifiable ex-ante process compliance.
Significance. If the interpretive premises hold, the position would have substantial implications for AI development by prioritizing lawful data sourcing and ex-ante compliance over technical fixes after training. It offers a clear normative framework linking technical processes to legal doctrines on copying, contracts, and remedies, which could inform policy and research priorities at the ML-law intersection. The paper's strength lies in its logical structure, which avoids internal circularity and unsupported factual premises by drawing on external legal principles. As a position paper without empirical data or formal derivations, its significance depends on acceptance of the doctrinal claims rather than technical novelty.
major comments (2)
- [First part of the argument] First part (unauthorized copying/ingestion): The claim that 'model weights may operate as fixed copies that retain training-derived expressive value' is load-bearing for the conclusion that post-hoc filtering is 'beside the point for infringement.' This interpretive premise about the nature of weights and persistence of expressive value requires additional grounding in technical literature on information encoding in neural networks or specific case precedents on derivative works and fixation in software, as contestability here directly affects the first pillar of the argument.
- [Third part of the argument] Third part (remedies): The discussion of unjust enrichment, disgorgement, and potential reach to the model itself is central to arguing that value persists beyond outputs. This would be strengthened by citing concrete examples or doctrines where similar remedies have been applied to trained models or intangible assets derived from protected inputs, to support the claim that post-hoc changes cannot cure the underlying liability.
minor comments (2)
- [Abstract] The abstract and introduction could clarify the jurisdictional scope (e.g., US copyright law focus) since doctrines like fair use and TDM exceptions vary significantly across jurisdictions, affecting the generality of the bypass claim.
- [Overall structure] Consider adding a brief table or structured summary comparing post-hoc methods (unlearning, guardrails) against the three legal pillars to improve readability for a mixed technical-legal audience.
Simulated Author's Rebuttal
We thank the referee for the constructive review and for recognizing the paper's logical structure and potential implications for AI development and policy. We address each major comment below and will incorporate revisions to strengthen the manuscript as suggested.
Point-by-point responses
-
Referee: [First part of the argument] First part (unauthorized copying/ingestion): The claim that 'model weights may operate as fixed copies that retain training-derived expressive value' is load-bearing for the conclusion that post-hoc filtering is 'beside the point for infringement.' This interpretive premise about the nature of weights and persistence of expressive value requires additional grounding in technical literature on information encoding in neural networks or specific case precedents on derivative works and fixation in software, as contestability here directly affects the first pillar of the argument.
Authors: We agree that additional grounding would make the first pillar more robust. The manuscript, as a position paper, rests on copyright law's fixation doctrine, under which a work is fixed when embodied in a tangible medium permitting perception, reproduction, or communication. Model weights qualify as such a medium because they persistently encode and retain expressive elements from training data, enabling reproduction of similar outputs. To address the comment, we will revise the relevant section to cite technical literature on information encoding and retention in neural networks, including studies on memorization, data extraction attacks, and membership inference that demonstrate how training data influences and persists in weights. We will also reference legal precedents on derivative works and fixation as applied to software and databases. These additions will clarify the premise without altering the core argument that unauthorized ingestion is a completed act. revision: yes
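The memorization and membership-inference literature the authors propose to cite can be illustrated with a minimal numpy sketch of the loss-threshold membership signal. The overparameterized linear model below is an assumed stand-in for a large network, not anything from the paper: with more parameters than samples it interpolates even random labels, so training members sit at near-zero loss while fresh non-members do not.

```python
import numpy as np

rng = np.random.default_rng(2)

# Overparameterized toy model (more features than samples) memorizes
# even pure-noise labels: a stand-in for memorization in large networks
n, p = 50, 100
X_mem = rng.normal(size=(n, p))
y_mem = rng.normal(size=n)  # labels carry no signal; fitting = memorizing
w = np.linalg.lstsq(X_mem, y_mem, rcond=None)[0]  # min-norm interpolator

# Fresh non-member data drawn from the same distribution
X_non = rng.normal(size=(n, p))
y_non = rng.normal(size=n)

mem_loss = np.mean((X_mem @ w - y_mem) ** 2)  # ~0: memorized exactly
non_loss = np.mean((X_non @ w - y_non) ** 2)  # far from 0

# A simple loss threshold between the two separates members from
# non-members: the training data leaves a detectable trace in w.
```

The gap between `mem_loss` and `non_loss` is the persistence-in-weights phenomenon the rebuttal invokes: the fitted parameters measurably retain information about what they were trained on.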
-
Referee: [Third part of the argument] Third part (remedies): The discussion of unjust enrichment, disgorgement, and potential reach to the model itself is central to arguing that value persists beyond outputs. This would be strengthened by citing concrete examples or doctrines where similar remedies have been applied to trained models or intangible assets derived from protected inputs, to support the claim that post-hoc changes cannot cure the underlying liability.
Authors: We concur that concrete examples and doctrinal support would strengthen the remedies analysis. The paper invokes general principles of unjust enrichment and disgorgement, which require stripping benefits derived from unauthorized use of protected inputs, including in intellectual property contexts. To respond, we will revise the third part to include references to analogous doctrines and cases involving intangible assets, such as disgorgement of profits in trade secret misappropriation where derived products or knowledge are at issue, and copyright cases where remedies reach works incorporating protected expression. While direct precedents on trained generative models remain limited and emerging, these citations will better illustrate how value retention in weights can trigger remedies that post-hoc mitigation cannot retroactively eliminate. The revision will maintain the position paper's normative focus on ex-ante compliance. revision: yes
Circularity Check
No significant circularity detected
full rationale
The paper is a normative legal position paper with no equations, fitted parameters, technical derivations, or empirical predictions. Its three-part argument rests on external legal doctrines (copyright completion, contract/tort restrictions, and disgorgement remedies) and interpretive claims about model weights as fixed copies, all drawn from cited external principles rather than internal self-definitions or self-citation chains. No load-bearing step reduces by construction to the paper's own inputs; the central claim is self-contained against external legal benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Unauthorized copying or ingestion of protected material constitutes a completed legal act even if later outputs are filtered.
- domain assumption Contract and tort rules can restrict use independently of copyright defenses such as fair use.
Reference graph
Works this paper leans on
- [1] Integrated Cash Management Services, Inc. v. Digital Transactions, Inc., 920 F.2d 171 (2d Cir. 1990).
- [2] Sega Enterprises Ltd. v. Accolade, Inc., 977 F.2d 1510 (9th Cir. 1992).
- [3] MAI Systems Corp. v. Peak Computer, Inc., 991 F.2d 511 (9th Cir. 1993).
- [4] ProCD, Inc. v. Zeidenberg, 86 F.3d 1447 (7th Cir. 1996).
- [5] Tsubasa System Co., Ltd. v. Toppan Printing Co., Ltd., 1780 Hanrei Jiho 25 (Tokyo Dist. Ct. 2001).
- [6] Authors Guild v. Google, Inc., 804 F.3d 202 (2d Cir. 2015).
- [7] Ryanair Ltd v PR Aviation BV, Case C-30/14 (CJEU 2015).
- [8] Capitol Records, LLC v. ReDigi Inc., 910 F.3d 649 (2d Cir. 2018).
- [9] CV-Online Latvia v Melons, Case C-762/19 (CJEU 2021).
- [10] Getty Images (US), Inc. v. Stability AI, Ltd., No. 1:23-cv-00135 (D. Del. filed Feb. 3, 2023).
- [11] The New York Times Co. v. Microsoft Corp. et al., No. 1:23-cv-11195 (S.D.N.Y. filed Dec. 27, 2023).
- [12] Thomson Reuters Enter. Ctr. GmbH v. Ross Intelligence Inc., No. 1:20-cv-00613 (D. Del. Sept. 25, 2023).
- [13] Bartz et al. v. Anthropic PBC, Order on Fair Use, No. C 24-05417 WHA (N.D. Cal. June 23, 2025).
- [14] Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., et al. Constitutional AI: Harmlessness from AI Feedback. arXiv preprint arXiv:2212.08073, 2022.
- [15] Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., et al. On the Opportunities and Risks of Foundation Models. arXiv preprint arXiv:2108.07258, 2021.
- [16] Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., et al. Language Models are Few-Shot Learners. arXiv preprint arXiv:2005.14165, 2020.
- [17] Eldan, R. and Russinovich, M. Who's Harry Potter? Approximate Unlearning in LLMs. arXiv preprint arXiv:2310.02238, 2023.
- [18] Inan, H., Upasani, K., Chi, J., Rungta, R., Iyer, K., Mao, Y., Tontchev, M., Hu, Q., Fuller, B., Testuggine, D., et al. Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations. arXiv preprint arXiv:2312.06674, 2023.