arXiv preprint arXiv:2501.14457

Understanding and Mitigating Gender Bias in LLMs via Interpretable Neuron Editing · 2024 · arXiv 2501.14457

3 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary: background (1)

citation-polarity summary: background (1)

representative citing papers

GKnow: Measuring the Entanglement of Gender Bias and Factual Gender

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

Gender bias and factual gender knowledge are severely entangled in language model circuits and neurons, making neuron ablation an unreliable method for debiasing.

How Do LLMs and VLMs Understand Viewpoint Rotation Without Vision? An Interpretability Study

cs.AI · 2026-04-16 · unverdicted · novelty 7.0

LLMs and VLMs encode viewpoint information in hidden states but fail to bind it to corresponding observations, resulting in hallucinations in final layers on text-only viewpoint rotation tasks.

Neuron-Anchored Rule Extraction for Large Language Models via Contrastive Hierarchical Ablation

cs.LG · 2026-05-04

citing papers explorer

Neuron-Anchored Rule Extraction for Large Language Models via Contrastive Hierarchical Ablation cs.LG · 2026-05-04 · unreviewed · ref 56