arXiv preprint arXiv:2501.14457 , year=

Zeping Yu, Sophia Ananiadou · 2025 · arXiv 2501.14457

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

GKnow: Measuring the Entanglement of Gender Bias and Factual Gender

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

Gender bias and factual gender knowledge are severely entangled in language model circuits and neurons, making neuron ablation an unreliable method for debiasing.

How Do LLMs and VLMs Understand Viewpoint Rotation Without Vision? An Interpretability Study

cs.AI · 2026-04-16 · unverdicted · novelty 7.0

LLMs and VLMs encode viewpoint information in hidden states but fail to bind it to corresponding observations, resulting in hallucinations in final layers on text-only viewpoint rotation tasks.

Neuron-Anchored Rule Extraction for Large Language Models via Contrastive Hierarchical Ablation

cs.LG · 2026-05-04 · unverdicted · novelty 6.0 · 2 refs

MechaRule localizes sparse agonist neurons via contrastive hierarchical ablation and adaptive group testing to ground rule extraction, recalling 97% of high-effect activations at 2.14% cost while enabling near-total elimination of target behaviors.

citing papers explorer

Showing 3 of 3 citing papers.

GKnow: Measuring the Entanglement of Gender Bias and Factual Gender cs.CL · 2026-05-12 · unverdicted · none · ref 33
Gender bias and factual gender knowledge are severely entangled in language model circuits and neurons, making neuron ablation an unreliable method for debiasing.
How Do LLMs and VLMs Understand Viewpoint Rotation Without Vision? An Interpretability Study cs.AI · 2026-04-16 · unverdicted · none · ref 5
LLMs and VLMs encode viewpoint information in hidden states but fail to bind it to corresponding observations, resulting in hallucinations in final layers on text-only viewpoint rotation tasks.
Neuron-Anchored Rule Extraction for Large Language Models via Contrastive Hierarchical Ablation cs.LG · 2026-05-04 · unverdicted · none · ref 56 · 2 links
MechaRule localizes sparse agonist neurons via contrastive hierarchical ablation and adaptive group testing to ground rule extraction, recalling 97% of high-effect activations at 2.14% cost while enabling near-total elimination of target behaviors.

arXiv preprint arXiv:2501.14457 , year=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer