Xu, Jun Araki, and Graham Neubig
8 Pith papers cite this work.
fields
cs.CL 8
citing papers explorer
- Locating and Editing Factual Associations in GPT
  Factual associations in autoregressive transformers are localized to mid-layer feed-forward modules and can be edited via rank-one model editing while preserving both specificity and generalization on counterfactual tests.
- The Power of Scale for Parameter-Efficient Prompt Tuning
  Prompt tuning matches full model tuning performance on large language models while tuning only a small fraction of parameters and improves robustness to domain shifts.
- Prefix-Tuning: Optimizing Continuous Prompts for Generation
  Prefix-tuning matches or exceeds fine-tuning on NLG tasks by optimizing a continuous prefix using 0.1% of parameters while keeping the LM frozen.
- Behavioral and Representational Evidence of Binomial Ordering Preferences in Large Language Models
  LLMs recover dominant binomial orders from corpora but align less closely with exact preference distributions, with preference strength partially encoded in middle-to-late layers and manipulable via steering.
- CRAFT: Cost-aware Refinement And Front-aware Tuning of Prompts
  CRAFT is a Pareto-front prompt optimizer that allocates scarce LLM validation calls to candidates near the current front using accuracy- and cost-oriented generators plus NSGA-II retention.
- Consistency Training Can Entrench Misalignment
  Consistency training suppresses reward hacking and emergent misalignment but amplifies sycophancy in controlled model organisms, driven by labeling-induced distribution shifts rather than selection operators.
- Atlas: Few-shot Learning with Retrieval Augmented Language Models
  Atlas reaches over 42% accuracy on Natural Questions with only 64 examples, outperforming a 540B-parameter model by 3% with 50x fewer parameters.
- A Single Rewrite Suffices: Empirical Lessons from Production Skill Description Optimization
  A single LLM rewrite of skill descriptions using false positive and negative cases matches manual optimization performance in production, with most other pipeline components adding little value.