Gender bias and factual gender knowledge are severely entangled in language model circuits and neurons, making neuron ablation an unreliable method for debiasing.
arXiv preprint arXiv:2501.14457
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Representative citing papers
MechaRule localizes agonist neurons in LLMs via contrastive hierarchical ablation to ground rule extraction in circuitry, recalling 96.8% of high-effect neurons and reducing task performance when suppressed.
LLMs and VLMs encode viewpoint information in hidden states but fail to bind it to corresponding observations, resulting in hallucinations in final layers on text-only viewpoint rotation tasks.
citing papers explorer
- GKnow: Measuring the Entanglement of Gender Bias and Factual Gender
Gender bias and factual gender knowledge are severely entangled in language model circuits and neurons, making neuron ablation an unreliable method for debiasing.
- Neuron-Anchored Rule Extraction for Large Language Models via Contrastive Hierarchical Ablation
MechaRule localizes agonist neurons in LLMs via contrastive hierarchical ablation to ground rule extraction in circuitry, recalling 96.8% of high-effect neurons and reducing task performance when suppressed.
- How Do LLMs and VLMs Understand Viewpoint Rotation Without Vision? An Interpretability Study
LLMs and VLMs encode viewpoint information in hidden states but fail to bind it to corresponding observations, resulting in hallucinations in final layers on text-only viewpoint rotation tasks.