Barriers to Counterfactual Credit Attribution for Autoregressive Models
Pith reviewed 2026-05-09 15:09 UTC · model grok-4.3
The pith
Counterfactual credit attribution fails to compose under autoregressive generation and requires exponential queries to retrofit.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Imposing CCA on the underlying next-token predictor does not guarantee that the full autoregressive model is CCA: CCA does not compose autoregressively, unlike differential privacy (DP). We also prove a lower bound for CCA retrofitting under a weak optimality requirement: given black-box access to the starting model, retrofitting requires query complexity exponential in the length of the model's outputs.
What carries the argument
Counterfactual credit attribution (CCA), a relaxation of differential privacy that requires a model to attribute credit to any training or deployment data on which its output depends in a significant way.
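A schematic rendering of the contrast (our paraphrase; the precise quantifiers and the formal credit mechanism are in Livni, Moran, Nissim, and Pabbaraju 2024): DP demands insensitivity to every datapoint, while CCA permits sensitivity provided credit is attributed.

```latex
% Schematic paraphrase only; see Livni et al. (2024) for the precise definition.
% M_S: the model deployed with dataset S; x: a prompt; E: any output event.
\[
\text{DP:}\quad \forall z \in S:\ \Pr[M_S(x)\in E] \le e^{\varepsilon}\,\Pr[M_{S\setminus\{z\}}(x)\in E] + \delta
\]
\[
\text{CCA (informal):}\quad \forall z \in S:\ \underbrace{z \in \mathrm{credit}(M_S(x))}_{\text{credit attributed}}
\ \lor\ \Pr[M_S(x)\in E] \le e^{\varepsilon}\,\Pr[M_{S\setminus\{z\}}(x)\in E] + \delta
\]
```

The disjunction is what makes CCA a relaxation of DP: significant dependence without attribution is forbidden, but attributed dependence is allowed.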
Load-bearing premise
The lower bound on retrofitting assumes only a weak optimality requirement on the retrofitted model.
What would settle it
A black-box retrofitting procedure that produces a CCA-compliant model using only polynomially many queries in output length, while meeting the weak optimality condition, would falsify the lower bound.
Original abstract
Generative AI disrupts the practice of giving credit to work that came before. Ideally, a generative model would give credit to any work on which its output depends in a significant way. Counterfactual credit attribution (CCA) is a technical condition formalizing this goal, a relaxation of differential privacy, recently introduced by Livni, Moran, Nissim, and Pabbaraju [2024] who studied it in the PAC learning setting. We initiate the study of CCA generative models. Specifically, we consider autoregressive models giving credit to a deployment-time dataset (e.g., a RAG database). We uncover barriers to two natural approaches to CCA autoregressive models. First, we show that imposing CCA on the underlying next-token predictor does not guarantee that the model is CCA: CCA does not compose autoregressively (unlike DP). Second, we consider a different approach to building CCA models which we call retrofitting. Retrofitting takes a model that does not attribute credit, and adds credit onto it. We prove a lower bound for CCA retrofitting under a weak optimality requirement. Given black-box access to the starting model, retrofitting requires query complexity exponential in the length of the model's outputs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper initiates the study of counterfactual credit attribution (CCA) for autoregressive generative models that must attribute credit to a deployment-time dataset. It establishes two barriers: (1) CCA imposed on the next-token predictor does not imply CCA for the full autoregressive model, because CCA fails to compose under autoregressive chaining (unlike differential privacy); (2) retrofitting a non-CCA model to satisfy CCA, under a weak optimality requirement on the retrofitted model, requires query complexity exponential in output length given only black-box access to the original model.
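One possible intuition for the non-composition barrier (a toy numeric sketch of ours, not the paper's construction): suppose credit is owed whenever a datapoint shifts an output's likelihood ratio above a threshold. Each token's ratio can sit below the threshold, so no single step owes credit, while the product of ratios over the sequence exceeds it. All constants below (`PER_TOKEN_RATIO`, `THRESHOLD`, `SEQ_LEN`) are illustrative assumptions.

```python
# Toy illustration (not the paper's counterexample): a per-step
# "significance" threshold need not compose across an autoregressive chain.
# Assume removing a datapoint z changes each next-token probability by at
# most a factor PER_TOKEN_RATIO, below the credit threshold.

PER_TOKEN_RATIO = 1.1  # per-token likelihood-ratio impact of z (assumed)
THRESHOLD = 1.5        # ratio above which credit must be attributed (assumed)
SEQ_LEN = 20           # length of the generated sequence

def sequence_ratio(per_token: float, n: int) -> float:
    """Worst-case likelihood ratio of a full length-n sequence:
    autoregressive probabilities multiply across steps."""
    return per_token ** n

per_token_owes_credit = PER_TOKEN_RATIO > THRESHOLD
sequence_owes_credit = sequence_ratio(PER_TOKEN_RATIO, SEQ_LEN) > THRESHOLD

print(per_token_owes_credit)  # False: each step looks insignificant
print(sequence_owes_credit)   # True: the full sequence depends on z
```

This mirrors how DP guarantees degrade gracefully under composition; the difference is that a thresholded credit condition can flip from "no credit owed anywhere" per step to "credit owed" at the sequence level.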
Significance. If the results hold, they are significant for theoretical machine learning and trustworthy generative AI. The non-composition result cleanly separates CCA from DP and rules out a natural compositional construction. The exponential lower bound under the stated weak optimality condition demonstrates that black-box retrofitting is impractical for realistic output lengths, which should steer future work toward alternative model designs or relaxed credit notions. The formal negative results are a useful contribution to the emerging CCA literature.
Major comments (2)
- [§3] §3: The non-composition counterexample must be checked against the precise CCA definition for full sequences; the construction should explicitly verify that the next-token CCA condition holds while the chained model violates the sequence-level CCA condition.
- [§4] §4, lower-bound theorem: The weak optimality requirement is load-bearing for the exponential query complexity; the proof should state the requirement formally (including any dependence on output length or dataset size) and confirm that the bound is information-theoretic rather than an artifact of the black-box query model.
Minor comments (2)
- [Abstract, §1] Abstract and §1: The citation to Livni et al. (2024) would benefit from a one-sentence summary of their PAC-learning CCA results to orient readers unfamiliar with the prior work.
- [Preliminaries] Notation: Ensure that the symbols for the next-token predictor, the full autoregressive model, and the deployment-time dataset are introduced once and used consistently across sections.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and positive recommendation. The comments help clarify the presentation of our negative results. We address each major comment below and will incorporate the requested clarifications via minor revisions.
Point-by-point responses
Referee: [§3] §3: The non-composition counterexample must be checked against the precise CCA definition for full sequences; the construction should explicitly verify that the next-token CCA condition holds while the chained model violates the sequence-level CCA condition.
Authors: We agree that an explicit verification strengthens the counterexample. In the revised manuscript we will expand §3 to restate the precise CCA definitions for both next-token and full-sequence settings, then walk through the construction to confirm that the next-token predictor satisfies CCA while the autoregressive chaining violates the sequence-level condition. This will reference the definitions from Livni et al. directly and include the necessary calculations for the specific example. revision: yes
Referee: [§4] §4, lower-bound theorem: The weak optimality requirement is load-bearing for the exponential query complexity; the proof should state the requirement formally (including any dependence on output length or dataset size) and confirm that the bound is information-theoretic rather than an artifact of the black-box query model.
Authors: We thank the referee for this observation. The weak optimality condition is indeed essential. In the revision we will formally include the weak optimality requirement in the theorem statement, explicitly noting its dependence on output length n and deployment dataset size. We will also add a paragraph in the proof clarifying that the exponential lower bound is information-theoretic, derived from the number of queries needed to distinguish credit-attributing behaviors, and is not an artifact of the black-box access model. revision: yes
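The information-theoretic flavor of such a bound can be illustrated with a standard needle-in-a-haystack argument (a generic sketch, not the paper's proof): if the base model's dependence on the dataset is hidden at a single unknown length-n output among the 2^n candidates, a black-box retrofitter probing one candidate per query pays exponentially many queries in the worst case. The function name `worst_case_queries` is our own.

```python
def worst_case_queries(n_bits: int) -> int:
    """Adversary-style count: the single 'credited' output is placed last
    in the probing order, so a sequential black-box search over all
    2**n_bits length-n_bits outputs pays 2**n_bits queries."""
    num_candidates = 2 ** n_bits
    needle = num_candidates - 1  # adversarial placement of the dependence
    queries = 0
    for candidate in range(num_candidates):  # black-box probing, one per query
        queries += 1
        if candidate == needle:
            break
    return queries

print(worst_case_queries(10))  # 1024: exponential in output length n = 10
```

The real lower bound is of course more delicate (it must hold against arbitrary, possibly randomized retrofitters satisfying the weak optimality condition), but the exponential scaling has the same counting character.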
Circularity Check
No significant circularity
Full rationale
The paper derives two negative results directly from the external definition of CCA (counterfactual credit attribution) introduced in the cited Livni et al. 2024 work: (1) that CCA on the next-token predictor does not imply CCA for the full autoregressive model, shown via explicit counterexample or direct expansion of the definition under autoregressive chaining, and (2) an exponential-query lower bound for black-box retrofitting under an explicitly stated weak optimality condition. Neither result reduces to a fitted parameter, self-referential equation, or load-bearing self-citation; the cited definition is used only as an external starting point, and all subsequent steps are standard proof techniques (non-composition via counterexample, query-complexity lower bound via information or adversary argument) that remain independent of the paper's own outputs. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: the CCA definition from Livni, Moran, Nissim, and Pabbaraju (2024).
- Ad hoc to this paper: the weak optimality requirement on the retrofitted model.
Reference graph
Works this paper leans on
- [1] Credit Attribution and Stable Compression. Advances in Neural Information Processing Systems, 2024.
- [2] The composition theorem for differential privacy. International Conference on Machine Learning, 2015.
- [3] On provable copyright protection for generative models. International Conference on Machine Learning, 2023.
- [4] Privacy-preserving prediction. Conference on Learning Theory, 2018.
- [5] Differentially private decoding in large language models. arXiv preprint arXiv:2205.13621, 2022.
- [6] Private prediction for large-scale synthetic text generation. arXiv preprint arXiv:2407.12108, 2024.
- [7] Source attribution in retrieval-augmented generation. arXiv preprint arXiv:2507.04480, 2025.
- [8] Deep learning with differential privacy. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 2016.
- [9] Probabilistic computations: Toward a unified measure of complexity. 18th Annual Symposium on Foundations of Computer Science (SFCS 1977), 1977.
- [10] Concentrated differential privacy. arXiv preprint arXiv:1603.01887, 2016.
- [11] Concentrated differential privacy: Simplifications, extensions, and lower bounds. Theory of Cryptography Conference, 2016.
- [12] Rényi differential privacy. IEEE Computer Security Foundations Symposium, 2017.
- [13] Gaussian differential privacy. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2022.
- [14] A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 2017.
- [15] Differentially private contextual linear bandits. Advances in Neural Information Processing Systems, 2018.
- [16] Model-agnostic private learning. Advances in Neural Information Processing Systems, 2018.
- [17] Private reinforcement learning with PAC and regret guarantees. International Conference on Machine Learning, 2020.
- [18] Private everlasting prediction. Advances in Neural Information Processing Systems, 2023.
- [19] Black-box differential privacy for interactive ML. Advances in Neural Information Processing Systems, 2023.
- [20] Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation. The Twelfth International Conference on Learning Representations, 2024.
- [21] Can Copyright Be Reduced to Privacy? 5th Symposium on Foundations of Responsible Computing, 2024.
- [22] Private Retrieval Augmented Generation with Random Projection. ICLR 2025 Workshop on Building Trust in Language Models and Applications, 2025.
- [23] RAG with differential privacy. 2025 IEEE Conference on Artificial Intelligence (CAI), 2025.
- [24] Data Shapley: Equitable valuation of data for machine learning. International Conference on Machine Learning, 2019.
- [25] A Survey of Data Attribution: Methods, Applications, and Evaluation in the Era of Generative AI.