Pith · machine review for the scientific record

arxiv: 2605.01425 · v1 · submitted 2026-05-02 · 💻 cs.LG

Recognition: unknown

Barriers to Counterfactual Credit Attribution for Autoregressive Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 15:09 UTC · model grok-4.3

classification 💻 cs.LG
keywords counterfactual credit attribution · autoregressive models · differential privacy · generative models · retrofitting · query complexity · credit attribution · RAG

The pith

Counterfactual credit attribution fails to compose under autoregressive generation and requires exponential queries to retrofit.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies how autoregressive generative models can satisfy counterfactual credit attribution (CCA) when drawing from a deployment-time dataset such as a retrieval database. It demonstrates that requiring the next-token predictor to obey CCA does not make the full autoregressive model obey CCA, because the property does not survive chaining of predictions the way differential privacy does. The authors also establish a lower bound showing that retrofitting an existing model to achieve CCA, even under a weak optimality condition, demands a number of black-box queries that grows exponentially with output length.
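The non-composition barrier concerns models assembled by chaining a next-token predictor. A minimal sketch of that chaining, with a toy smoothed count predictor standing in for the real thing (none of this is the paper's construction):

```python
import random

# Toy autoregressive model over binary sequences: the full model is the
# composition of a next-token predictor P(x_t | x_<t) over t = 1..ell.
# The predictor here is a hypothetical Laplace-smoothed count model.

def next_token_probs(prefix):
    """P(next token | prefix): probability of 1 grows with the 1s seen."""
    p_one = (1 + prefix.count(1)) / (2 + len(prefix))
    return {0: 1.0 - p_one, 1: p_one}

def sequence_prob(seq):
    """Chained model's probability of a full sequence: the product of the
    per-step next-token probabilities (this multiplicative structure is
    what 'composition' refers to)."""
    p = 1.0
    for t, tok in enumerate(seq):
        p *= next_token_probs(seq[:t])[tok]
    return p

def sample(ell, rng):
    """Draw one length-ell sequence token by token."""
    seq = []
    for _ in range(ell):
        if rng.random() < next_token_probs(seq)[1]:
            seq.append(1)
        else:
            seq.append(0)
    return seq

print(sequence_prob([1, 0, 1, 1]))  # 0.5 * (1/3) * 0.5 * 0.6 ≈ 0.05
```

The paper's point is that even if every call to the per-step predictor satisfied CCA with respect to a retrieved document, the product structure in `sequence_prob` gives no such guarantee for the sequence as a whole, whereas DP guarantees degrade gracefully under exactly this kind of composition.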

Core claim

Imposing CCA on the underlying next-token predictor does not guarantee that the model is CCA: CCA does not compose autoregressively (unlike DP). We prove a lower bound for CCA retrofitting under a weak optimality requirement. Given black-box access to the starting model, retrofitting requires query complexity exponential in the length of the model's outputs.

What carries the argument

Counterfactual credit attribution (CCA), a relaxation of differential privacy that requires a model to attribute credit to any training or deployment data on which its output depends in a significant way.
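In the DP-flavored spirit of that definition, the counterfactual test can be sketched as a likelihood-ratio check: if removing a document from the deployment dataset shifts the probability of an output by more than an e^ε factor, the output depends on the document significantly and credit is owed. This is a schematic reading, not the paper's formal condition:

```python
import math

def must_credit(p_with, p_without, eps):
    """Hedged sketch of a CCA-style trigger: credit document d for output y
    when Pr[y | dataset with d] and Pr[y | dataset without d] are not
    e^eps-indistinguishable (the DP-style closeness that CCA relaxes)."""
    if p_without == 0.0:
        return p_with > 0.0  # y impossible without d: clearly owes credit
    ratio = p_with / p_without
    return ratio > math.exp(eps) or ratio < math.exp(-eps)

print(must_credit(0.30, 0.29, eps=0.1))  # small shift -> False, no credit owed
print(must_credit(0.30, 0.05, eps=0.1))  # output leans on d -> True
```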

Load-bearing premise

The lower bound on retrofitting assumes only a weak optimality requirement on the retrofitted model.

What would settle it

A black-box retrofitting procedure that produces a CCA-compliant model using only polynomially many queries in output length, while meeting the weak optimality condition, would falsify the lower bound.
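For a sense of scale, the gap such a procedure would have to close (the base and polynomial degree below are hypothetical; the paper establishes only that the required query count grows exponentially in output length):

```python
# Illustrative scaling comparison only: the lower bound says black-box
# retrofitting needs exponentially many queries in the output length ell;
# base 2 and degree 3 are arbitrary stand-ins for the real constants.

def exp_queries(ell, base=2):
    return base ** ell

def poly_queries(ell, degree=3):
    return ell ** degree

for ell in (10, 50, 100):
    print(f"ell={ell}: poly budget {poly_queries(ell):.1e}, "
          f"exponential demand {exp_queries(ell):.1e}")
```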

Figures

Figures reproduced from arXiv: 2605.01425 by Aloni Cohen, Chenhao Zhang.

Figure 1
Figure 1. Example pair of generation trees with ℓ = 2 and z = 10. All edges have probability 1/2 except the dashed blue edges, which have probability 1/2 ± (1 − e^(−ε)(1 − γ))/2.
read the original abstract

Generative AI disrupts the practice of giving credit to work that came before. Ideally, a generative model would give credit to any work on which its output depends in a significant way. \emph{Counterfactual credit attribution} (CCA) is a technical condition formalizing this goal--a relaxation of differential privacy--recently introduced by Livni, Moran, Nissim, and Pabbaraju [2024] who studied it in the PAC learning setting. We initiate the study of CCA generative models. Specifically, we consider autoregressive models giving credit to a deployment-time dataset (e.g., a RAG database). We uncover barriers to two natural approaches to CCA autoregressive models. First, we show that imposing CCA on the underlying next-token predictor does not guarantee that the model is CCA: CCA does not compose autoregressively (unlike DP). Second, we consider a different approach to building CCA models which we call \emph{retrofitting}. Retrofitting takes a model that does not attribute credit, and adds credit onto it. We prove a lower bound for CCA retrofitting under a weak optimality requirement. Given black-box access to the starting model, retrofitting requires query complexity exponential in the length of the model's outputs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper initiates the study of counterfactual credit attribution (CCA) for autoregressive generative models that must attribute credit to a deployment-time dataset. It establishes two barriers: (1) CCA imposed on the next-token predictor does not imply CCA for the full autoregressive model, because CCA fails to compose under autoregressive chaining (unlike differential privacy); (2) retrofitting a non-CCA model to satisfy CCA, under a weak optimality requirement on the retrofitted model, requires query complexity exponential in output length given only black-box access to the original model.

Significance. If the results hold, they are significant for theoretical machine learning and trustworthy generative AI. The non-composition result cleanly separates CCA from DP and rules out a natural compositional construction. The exponential lower bound under the stated weak optimality condition demonstrates that black-box retrofitting is impractical for realistic output lengths, which should steer future work toward alternative model designs or relaxed credit notions. The formal negative results are a useful contribution to the emerging CCA literature.

major comments (2)
  1. [§3] §3: The non-composition counterexample must be checked against the precise CCA definition for full sequences; the construction should explicitly verify that the next-token CCA condition holds while the chained model violates the sequence-level CCA condition.
  2. [§4] §4, lower-bound theorem: The weak optimality requirement is load-bearing for the exponential query complexity; the proof should state the requirement formally (including any dependence on output length or dataset size) and confirm that the bound is information-theoretic rather than an artifact of the black-box query model.
minor comments (2)
  1. [Abstract, §1] Abstract and §1: The citation to Livni et al. (2024) would benefit from a one-sentence summary of their PAC-learning CCA results to orient readers unfamiliar with the prior work.
  2. [Preliminaries] Notation: Ensure that the symbols for the next-token predictor, the full autoregressive model, and the deployment-time dataset are introduced once and used consistently across sections.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and positive recommendation. The comments help clarify the presentation of our negative results. We address each major comment below and will incorporate the requested clarifications via minor revisions.

read point-by-point responses
  1. Referee: [§3] §3: The non-composition counterexample must be checked against the precise CCA definition for full sequences; the construction should explicitly verify that the next-token CCA condition holds while the chained model violates the sequence-level CCA condition.

    Authors: We agree that an explicit verification strengthens the counterexample. In the revised manuscript we will expand §3 to restate the precise CCA definitions for both next-token and full-sequence settings, then walk through the construction to confirm that the next-token predictor satisfies CCA while the autoregressive chaining violates the sequence-level condition. This will reference the definitions from Livni et al. directly and include the necessary calculations for the specific example. revision: yes

  2. Referee: [§4] §4, lower-bound theorem: The weak optimality requirement is load-bearing for the exponential query complexity; the proof should state the requirement formally (including any dependence on output length or dataset size) and confirm that the bound is information-theoretic rather than an artifact of the black-box query model.

    Authors: We thank the referee for this observation. The weak optimality condition is indeed essential. In the revision we will formally include the weak optimality requirement in the theorem statement, explicitly noting its dependence on output length n and deployment dataset size. We will also add a paragraph in the proof clarifying that the exponential lower bound is information-theoretic, derived from the number of queries needed to distinguish credit-attributing behaviors, and is not an artifact of the black-box access model. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper derives two negative results directly from the external definition of CCA (counterfactual credit attribution) introduced in the cited Livni et al. 2024 work: (1) CCA on the next-token predictor does not imply CCA for the full autoregressive model, shown via explicit counterexample or direct expansion of the definition under autoregressive chaining; and (2) an exponential-query lower bound for black-box retrofitting under an explicitly stated weak optimality condition. Neither result reduces to a fitted parameter, self-referential equation, or load-bearing self-citation. The cited definition is used only as an external starting point, and all subsequent steps are standard proof techniques (non-composition via counterexample, query-complexity lower bound via an information or adversary argument) that remain independent of the paper's own outputs. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work relies on the CCA definition introduced in prior work and introduces the retrofitting approach together with a weak optimality assumption needed for the lower bound.

axioms (2)
  • domain assumption CCA definition from Livni, Moran, Nissim, and Pabbaraju 2024
    Central technical condition is taken from the cited PAC-learning paper.
  • ad hoc to paper Weak optimality requirement for the retrofitted model
    Invoked to obtain the exponential query lower bound.

pith-pipeline@v0.9.0 · 5515 in / 1242 out tokens · 37830 ms · 2026-05-09T15:09:48.289141+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

25 extracted references · 4 canonical work pages

  1. [1] Credit Attribution and Stable Compression. Advances in Neural Information Processing Systems.
  2. [2] The composition theorem for differential privacy. International Conference on Machine Learning, 2015.
  3. [3] On provable copyright protection for generative models. International Conference on Machine Learning, 2023.
  4. [4] Privacy-preserving prediction. Conference on Learning Theory, 2018.
  5. [5] Differentially private decoding in large language models. arXiv preprint arXiv:2205.13621.
  6. [6] Private prediction for large-scale synthetic text generation. arXiv preprint arXiv:2407.12108.
  7. [7] Source attribution in retrieval-augmented generation. arXiv preprint arXiv:2507.04480.
  8. [8] Deep learning with differential privacy. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security.
  9. [9] Probabilistic computations: Toward a unified measure of complexity. 18th Annual Symposium on Foundations of Computer Science (SFCS 1977), 1977.
  10. [10] Concentrated differential privacy. arXiv preprint arXiv:1603.01887.
  11. [11] Concentrated differential privacy: Simplifications, extensions, and lower bounds. Theory of Cryptography Conference, 2016.
  12. [12] Mironov, Ilya. 2017.
  13. [13] Gaussian differential privacy. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2022.
  14. [14] A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems.
  15. [15] Differentially private contextual linear bandits. Advances in Neural Information Processing Systems.
  16. [16] Model-agnostic private learning. Advances in Neural Information Processing Systems.
  17. [17] Private reinforcement learning with PAC and regret guarantees. International Conference on Machine Learning, 2020.
  18. [18] Private everlasting prediction. Advances in Neural Information Processing Systems.
  19. [19] Black-box differential privacy for interactive ML. Advances in Neural Information Processing Systems.
  20. [20] Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation. The Twelfth International Conference on Learning Representations.
  21. [21] Can Copyright Be Reduced to Privacy? 5th Symposium on Foundations of Responsible Computing.
  22. [22] Private Retrieval Augmented Generation with Random Projection. ICLR 2025 Workshop on Building Trust in Language Models and Applications.
  23. [23] RAG with differential privacy. 2025 IEEE Conference on Artificial Intelligence (CAI), 2025.
  24. [24] Data Shapley: Equitable valuation of data for machine learning. International Conference on Machine Learning, 2019.
  25. [25] A Survey of Data Attribution: Methods, Applications, and Evaluation in the Era of Generative AI.