Training data influence analysis and estimation: a survey. Machine Learning, 113(5):2351–2403, March 2024
3 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (3)
Verdicts: unverdicted (3)
Roles: background (2)
Polarities: background (2)
citing papers explorer
- Constraint-Data-Value-Maximization: Utilizing Data Attribution for Effective Data Pruning in Low-Data Environments
CDVM formulates data pruning as maximizing total data influence while constraining excessive contributions to any single test point, yielding robust performance on the OpenDataVal benchmark in low-data regimes.
- Prototype Language Models
PRISM forms predictions as sparse mixtures of learned prototypes trained with clustering objectives, matching dense model accuracy while enabling ~500x faster data attribution and behavior editing without finetuning.
- Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation
RISE applies CountSketch to dual lexical and semantic channels derived from output-layer gradient outer products, cutting data attribution storage by up to 112x and enabling retrospective and prospective influence analysis on LLMs up to 32B parameters.