Training data influence analysis and estimation: a survey. Machine Learning, 113(5):2351–2403, March 2024

Zayd Hammoudeh, Daniel Lowd · 2024 · DOI 10.1007/s10994-023-06495-7

3 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Constraint-Data-Value-Maximization: Utilizing Data Attribution for Effective Data Pruning in Low-Data Environments

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

CDVM formulates data pruning as maximizing total data influence while constraining excessive contributions to any single test point, yielding robust performance on the OpenDataVal benchmark in low-data regimes.

Prototype Language Models

cs.LG · 2026-07-01 · unverdicted · novelty 6.0

PRISM forms predictions as sparse mixtures of learned prototypes trained with clustering objectives, matching dense model accuracy while enabling ~500x faster data attribution and behavior editing without finetuning.

Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation

cs.LG · 2026-04-17 · unverdicted · novelty 6.0

RISE applies CountSketch to dual lexical and semantic channels derived from output-layer gradient outer products, cutting data attribution storage by up to 112x and enabling retrospective and prospective influence analysis on LLMs up to 32B parameters.

citing papers explorer

Constraint-Data-Value-Maximization: Utilizing Data Attribution for Effective Data Pruning in Low-Data Environments

cs.AI · 2026-05-11 · unverdicted · none · ref 22

Prototype Language Models

cs.LG · 2026-07-01 · unverdicted · none · ref 156

Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation

cs.LG · 2026-04-17 · unverdicted · none · ref 18
