Gender bias and factual gender knowledge are severely entangled in language model circuits and neurons, making neuron ablation an unreliable method for debiasing.
arXiv preprint arXiv:2501.14457
3 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: background 1
Citation-polarity summary
Years: 2026 (3)
Roles: background 1
Polarities: background 1
Representative citing papers
LLMs and VLMs encode viewpoint information in hidden states but fail to bind it to corresponding observations, resulting in hallucinations in final layers on text-only viewpoint rotation tasks.