Gender bias and factual gender knowledge are severely entangled in language model circuits and neurons, making neuron ablation an unreliable method for debiasing.
ICML 2024 Workshop on Mechanistic Interpretability , year=
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CL 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
Language model circuits show high within-task consistency and necessity but substantial overlap across tasks, making them less specific than assumed.
citing papers explorer
-
GKnow: Measuring the Entanglement of Gender Bias and Factual Gender
Gender bias and factual gender knowledge are severely entangled in language model circuits and neurons, making neuron ablation an unreliable method for debiasing.
-
How Much Do Circuits Tell Us? Measuring the Consistency and Specificity of Language Model Circuits
Language model circuits show high within-task consistency and necessity but substantial overlap across tasks, making them less specific than assumed.