More than a Feeling: Accuracy and Application of Sentiment Analysis
2 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: 2 unverdicted
citing papers explorer
- Steering Language Models With Activation Engineering
  Activation Addition steers language models by adding contrastive activation vectors from prompt pairs to control high-level properties like sentiment and toxicity at inference time without training.
- All Public Voices Are Equal, But Are Some More Equal Than Others to LLMs?
  LLMs produce lower-fidelity summaries of identical public comments when attributed to lower-status occupations like street vendors versus financial analysts, with inconsistent race effects and no gender effects.