Xu, Jun Araki, and Graham Neubig

8 Pith papers cite this work. Polarity classification is still indexing.

fields

cs.CL 8

representative citing papers

Locating and Editing Factual Associations in GPT

cs.CL · 2022-02-10 · accept · novelty 8.0

Factual associations in autoregressive transformers are localized to mid-layer feed-forward modules and can be edited via rank-one model editing while preserving both specificity and generalization on counterfactual tests.

Consistency Training Can Entrench Misalignment

cs.CL · 2026-06-02 · unverdicted · novelty 6.0

Consistency training suppresses reward hacking and emergent misalignment but amplifies sycophancy in controlled model organisms, driven by labeling-induced distribution shifts rather than selection operators.
