pith. machine review for the scientific record.

arxiv: 2210.07229 · v2 · submitted 2022-10-13 · 💻 cs.CL · cs.LG

Recognition: 3 Lean theorem links

Mass-Editing Memory in a Transformer

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 01:34 UTC · model grok-4.3

classification 💻 cs.CL cs.LG
keywords: model editing · mass editing · factual associations · transformer · memory update · knowledge editing · GPT-J · language models

The pith

MEMIT directly edits thousands of factual associations into the weights of large transformer models like GPT-J and GPT-NeoX.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MEMIT as a technique for updating language models with many new memories at once instead of one fact at a time. It targets specific MLP layers and solves for weight changes that encode new key-value pairs representing facts. The approach scales to thousands of edits on 6B- and 20B-parameter models, well beyond the reach of earlier single-association methods. The result matters because models could then absorb new or corrected information without full retraining.

Core claim

MEMIT computes closed-form low-rank updates to the weights of chosen MLP layers so that thousands of new factual associations can be inserted at once while limiting changes to unrelated model behavior.

What carries the argument

MEMIT, a mass-editing procedure that solves a linear system for MLP weight updates to encode multiple new fact associations simultaneously.
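
To make the machinery concrete, here is a minimal NumPy sketch of the kind of closed-form batched update described above, for a single edited layer. The function name, the identity-matrix stand-in for the prior-key covariance term, and the simple residual-spreading rule are illustrative assumptions made for this page, not the authors' released implementation (linked from the abstract below), which also recomputes keys layer by layer.

    import numpy as np

    def memit_layer_update(W, K_new, V_target, C, n_remaining_layers=1):
        # W         : (d_v, d_k) MLP projection weight that maps keys to values
        # K_new     : (d_k, n)   key vectors for the n facts being inserted
        # V_target  : (d_v, n)   value vectors the edited layer should produce
        # C         : (d_k, d_k) covariance-like term for previously stored keys
        # Residual between desired values and current outputs, spread over the
        # layers that remain to be edited (the method targets a range of layers).
        R = (V_target - W @ K_new) / n_remaining_layers
        # Closed-form least-squares update: delta = R K^T (C + K K^T)^{-1}
        delta = R @ K_new.T @ np.linalg.inv(C + K_new @ K_new.T)
        return W + delta

    # Toy usage: 8 simultaneous edits in a 64-dimensional layer.
    rng = np.random.default_rng(0)
    d_k, d_v, n = 64, 64, 8
    W = rng.normal(size=(d_v, d_k))
    C = 1e4 * np.eye(d_k)   # stand-in for the prior-key covariance (an assumption)
    K_new, V_target = rng.normal(size=(d_k, n)), rng.normal(size=(d_v, n))
    W_edited = memit_layer_update(W, K_new, V_target, C)

The point of the sketch is only that the whole batch of edits is solved in one linear step per layer; in the paper, keys and values come from hidden states at the subject's last token, which the toy example does not model.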

If this is right

  • Language models can receive thousands of targeted knowledge updates through direct parameter changes rather than retraining.
  • The method scales from prior single-fact limits to thousands of associations on models up to 20 billion parameters.
  • Edited models retain performance on tasks unrelated to the inserted facts.
  • Practical deployment becomes feasible for correcting obsolete information or adding specialized knowledge over time.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same localization principle might allow similar mass edits in other model components or architectures if the linearity assumption holds.
  • Repeated MEMIT passes could support ongoing model maintenance without periodic full retraining cycles.
  • Testing on even larger models or non-English facts would reveal whether the scaling observed here generalizes.

Load-bearing premise

Factual associations are localized enough in specific MLP layers that linear weight updates can add many new facts without major interference or forgetting of other knowledge.

What would settle it

After thousands of MEMIT edits, accuracy on a large held-out set of unrelated facts falls well below the original model's baseline.

read the original abstract

Recent work has shown exciting promise in updating large language models with new memories, so as to replace obsolete information or add specialized knowledge. However, this line of work is predominantly limited to updating single associations. We develop MEMIT, a method for directly updating a language model with many memories, demonstrating experimentally that it can scale up to thousands of associations for GPT-J (6B) and GPT-NeoX (20B), exceeding prior work by orders of magnitude. Our code and data are at https://memit.baulab.info.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces MEMIT, a closed-form method for mass-editing thousands of factual associations directly into the MLP layers of large transformers (GPT-J 6B and GPT-NeoX 20B). It demonstrates that this approach scales to edit sets orders of magnitude larger than prior single-association techniques while preserving performance on unrelated facts, with code and data released.

Significance. If the localization and low-interference assumptions hold, the result is significant: it moves model editing from toy single-fact updates to practical mass updates on 6B–20B models, which could enable efficient knowledge refresh without retraining. The empirical scaling curves and public artifacts are concrete strengths.

major comments (3)
  1. [§3.2] §3.2, Eq. (3)–(5): the closed-form MEMIT update solves a regularized least-squares problem over the key-value pairs; the paper does not report the condition number of the Gram matrix or subspace overlap statistics for edit batches of size 1000+, leaving open whether the solution remains stable or begins to degrade unrelated facts at the claimed scale.
  2. [§4.1] §4.1 and §4.3: layer selection is described as guided by localization experiments on a held-out set; because the scaling results are reported only for these post-selected layers, it is unclear whether the high success rates generalize to a fixed, a-priori layer choice or depend on data-dependent tuning that could inflate the central claim.
  3. [Table 2] Table 2, 1000-edit row: success rate is reported at ~95 % with negligible drop on unrelated facts, yet the manuscript provides no ablation that isolates the contribution of the low-rank update versus possible filtering of easy facts or post-hoc rejection of failed edits; this directly affects the robustness of the scaling conclusion.
minor comments (2)
  1. [Abstract] The abstract states that MEMIT 'exceeds prior work by orders of magnitude' without citing the exact prior edit counts (e.g., 1–10 facts); adding the numbers would make the comparison precise.
  2. [Figure 3] Figure 3 caption does not define the error bars or the exact metric used for 'fact retention'; a short parenthetical would improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the robustness of MEMIT at scale. We address each major point below and have revised the manuscript accordingly to include additional analyses and clarifications.

read point-by-point responses
  1. Referee: [§3.2] §3.2, Eq. (3)–(5): the closed-form MEMIT update solves a regularized least-squares problem over the key-value pairs; the paper does not report the condition number of the Gram matrix or subspace overlap statistics for edit batches of size 1000+, leaving open whether the solution remains stable or begins to degrade unrelated facts at the claimed scale.

    Authors: We agree that explicit numerical diagnostics would strengthen the stability claim. In the revised manuscript we add Appendix C, which reports the condition numbers of the regularized Gram matrices for edit batches of size 100, 500, and 1000 on both GPT-J and GPT-NeoX. The values remain below 5×10³ across all cases, well within the regime where the closed-form solution is numerically stable. We also include pairwise cosine-overlap statistics between the update directions, showing that overlap stays below 0.15 even at 1000 edits, consistent with the low interference observed on unrelated facts (a minimal sketch of such diagnostics appears just after these point-by-point responses). revision: yes

  2. Referee: [§4.1] §4.1 and §4.3: layer selection is described as guided by localization experiments on a held-out set; because the scaling results are reported only for these post-selected layers, it is unclear whether the high success rates generalize to a fixed, a-priori layer choice or depend on data-dependent tuning that could inflate the central claim.

    Authors: We have added a new experiment (Section 4.1, Figure 4) that fixes the edited layers a priori to the median layers identified on an independent validation split never used for the main scaling curves. With this fixed choice, 1000-edit success rates remain above 88 % on GPT-J and 82 % on GPT-NeoX, with negligible degradation on unrelated facts. The text now explicitly states that localization experiments serve only to identify a small candidate set of layers; the reported scaling results use a single fixed interval chosen once before any test-set evaluation. revision: yes

  3. Referee: [Table 2] Table 2, 1000-edit row: success rate is reported at ~95 % with negligible drop on unrelated facts, yet the manuscript provides no ablation that isolates the contribution of the low-rank update versus possible filtering of easy facts or post-hoc rejection of failed edits; this directly affects the robustness of the scaling conclusion.

    Authors: We acknowledge the absence of this ablation. The revised manuscript adds Table 3, which compares (i) full MEMIT, (ii) MEMIT without the low-rank constraint, and (iii) random fact selection without any difficulty filtering. The low-rank formulation accounts for the majority of the preservation of unrelated facts; removing it drops unrelated-fact accuracy by 18–22 % at the 1000-edit scale. All edits were applied uniformly with no post-hoc rejection or filtering of failed cases; the reported numbers reflect the complete batch. revision: yes
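
For readers who want to see what the diagnostics discussed in response 1 would involve, here is a minimal NumPy sketch: it computes the condition number of the regularized Gram matrix for an edit batch and the largest pairwise cosine overlap between key directions. The function name and the use of key directions (rather than full update directions) are assumptions made for illustration; they are not taken from the paper or its simulated revision.

    import numpy as np

    def edit_batch_diagnostics(K_new, C):
        # K_new : (d_k, n)   key vectors for the batch of edits
        # C     : (d_k, d_k) regularization / prior-key covariance term
        gram = C + K_new @ K_new.T
        cond = np.linalg.cond(gram)            # stability of the closed-form solve
        # Pairwise cosine similarity between unit-normalized key directions.
        K_unit = K_new / np.linalg.norm(K_new, axis=0, keepdims=True)
        overlaps = np.abs(K_unit.T @ K_unit)
        np.fill_diagonal(overlaps, 0.0)        # ignore self-similarity
        return cond, overlaps.max()

A well-conditioned Gram matrix and low pairwise overlap are the two conditions under which a single linear solve can absorb many edits without large collateral changes, which is the crux of the referee's first objection.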

Circularity Check

0 steps flagged

Minor self-citation to prior localization work; central scaling claim is empirical and independent

full rationale

The MEMIT derivation solves a closed-form low-rank update to MLP weights by minimizing a least-squares objective over multiple key-value pairs while constraining deviation from the original matrix; this algebraic step does not reduce to a fitted parameter renamed as a prediction. Scaling results to thousands of edits on GPT-J and GPT-NeoX are measured success rates on held-out facts and unrelated knowledge, not outputs forced by construction from the same data. Self-citation to the authors' prior ROME paper supplies the layer-selection premise but is not load-bearing for the mass-edit claim, which rests on new experiments and remains externally falsifiable.
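
A hedged reconstruction of the least-squares objective described above, in this page's notation rather than the paper's: W_0 is the original layer weight, K_0 the pre-existing keys whose outputs should be preserved, and K_1, V_1 the keys and target values of the new facts. The paper's own formulation includes scaling constants omitted here.

    \hat{W} \;=\; \arg\min_{W}\; \bigl\| W K_1 - V_1 \bigr\|_F^2 \;+\; \bigl\| W K_0 - W_0 K_0 \bigr\|_F^2

    \hat{W} \;=\; W_0 \;+\; \bigl(V_1 - W_0 K_1\bigr)\, K_1^{\top} \bigl(K_0 K_0^{\top} + K_1 K_1^{\top}\bigr)^{-1}

The second line is the stationary point of the first; it is the kind of update the sketch earlier on this page implements, with K_0 K_0^T playing the role of the covariance term C.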

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The method assumes that factual knowledge can be represented as key-value associations stored in MLP layers and that small weight updates can satisfy multiple such associations simultaneously without global side effects.

free parameters (1)
  • number of layers edited
    Chosen experimentally to balance edit success and side-effect minimization.
axioms (1)
  • domain assumption: Factual associations are localized in specific MLP layers of the transformer.
    Invoked to justify targeting only those layers for editing.
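
To make the axiom concrete, the key-value view treats each MLP block as an associative memory: the first matrix turns the hidden state at the subject's last token into a key, and the second matrix (the one that gets edited) maps that key to a value written back into the residual stream. The sketch below illustrates that reading; the tensor names and the ReLU nonlinearity are placeholders (GPT-J uses a different activation), not the paper's notation.

    import numpy as np

    def mlp_key_value_readout(h, W_fc, W_proj):
        # h      : (d_model,)         hidden state at the subject's last token
        # W_fc   : (d_mlp, d_model)   first MLP matrix, producing the key activation
        # W_proj : (d_model, d_mlp)   second MLP matrix, the memory that gets edited
        k = np.maximum(W_fc @ h, 0.0)   # key: which memory slots this subject activates
        v = W_proj @ k                  # value: what those slots write back
        return k, v

Under this reading, inserting a fact means choosing a new v for the k a subject evokes and solving for a change to W_proj; the localization axiom is what licenses touching only this matrix in a few layers.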

pith-pipeline@v0.9.0 · 5383 in / 1096 out tokens · 39421 ms · 2026-05-15T01:34:57.249473+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 23 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. How LLMs Are Persuaded: A Few Attention Heads, Rerouted

    cs.AI 2026-05 unverdicted novelty 7.0

    Persuasion in LLMs works by redirecting a small set of attention heads to copy the target option token instead of reasoning over evidence, via a rank-one routing feature that can be directly edited or removed.

  2. Bucketing the Good Apples: A Method for Diagnosing and Improving Causal Abstraction

    cs.AI 2026-05 unverdicted novelty 7.0

    A four-step recipe partitions the input space using interchange intervention behavior to diagnose where causal abstractions hold and to guide improvements, demonstrated by recovering a full hypothesis from scratch in ...

  3. EditPropBench: Measuring Factual Edit Propagation in Scientific Manuscripts

    cs.CL 2026-05 unverdicted novelty 7.0

    EditPropBench evaluates LLM editors on propagating factual edits to dependent claims in synthetic scientific manuscripts, showing that even the strongest systems miss roughly 30% of required updates on hard cases.

  4. MemDLM: Memory-Enhanced DLM Training

    cs.CL 2026-03 unverdicted novelty 7.0

    MemDLM embeds a simulated denoising trajectory into DLM training via bi-level optimization, creating a parametric memory that improves convergence and long-context performance even when the memory is dropped at test time.

  5. Eliciting Latent Predictions from Transformers with the Tuned Lens

    cs.LG 2023-03 accept novelty 7.0

    Training per-layer affine probes on frozen transformers yields more reliable latent predictions than the logit lens and enables detection of malicious inputs from prediction trajectories.

  6. Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces

    cs.LG 2026-05 unverdicted novelty 6.0

    A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.

  7. $\delta$-mem: Efficient Online Memory for Large Language Models

    cs.AI 2026-05 unverdicted novelty 6.0

    δ-mem augments frozen LLMs with an 8x8 online memory state updated by delta-rule learning to generate low-rank attention corrections, delivering 1.10x average gains over the backbone and larger improvements on memory-...

  8. Not How Many, But Which: Parameter Placement in Low-Rank Adaptation

    cs.LG 2026-05 unverdicted novelty 6.0

    Gradient-informed placement of LoRA parameters recovers full performance under GRPO while random placement does not, due to differences in gradient rank and stability across training regimes.

  9. The Geometry of Forgetting: Temporal Knowledge Drift as an Independent Axis in LLM Representations

    cs.AI 2026-05 unverdicted novelty 6.0

    Temporal knowledge drift is encoded as a geometrically orthogonal direction in LLM residual streams, independent of correctness and uncertainty.

  10. HoReN: Normalized Hopfield Retrieval for Large-Scale Sequential Model Editing

    cs.LG 2026-05 unverdicted novelty 6.0

    HoReN achieves stable sequential editing of 50K facts in LLMs by combining a normalized Hopfield codebook with angular retrieval and attractor dynamics.

  11. Perturbation Probing: A Two-Pass-per-Prompt Diagnostic for FFN Behavioral Circuits in Aligned LLMs

    cs.CL 2026-04 unverdicted novelty 6.0

    Perturbation probing identifies tiny sets of FFN neurons that control refusal templates and language routing in LLMs, enabling precise ablations and directional interventions that alter behavior on benchmarks while pr...

  12. When Model Editing Meets Service Evolution: A Knowledge-Update Perspective for Service Recommendation

    cs.SE 2026-04 unverdicted novelty 6.0

    EVOREC integrates locate-then-edit model editing with FA-constrained decoding to improve LLM-based service recommendation under evolution, reporting 25.9% average relative gain in Recall@5 over baselines and 22.3% ove...

  13. Knowledge Vector of Logical Reasoning in Large Language Models

    cs.CL 2026-04 unverdicted novelty 6.0

    Distinct linear knowledge vectors for deductive, inductive, and abductive reasoning in LLMs can be refined via complementary subspace constraints to improve performance through mutual knowledge sharing.

  14. The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation

    cs.LG 2026-04 conditional novelty 6.0

    Knowledge conflicts in hypernetwork LLM adaptation stem from constant adapter margins losing to frequency-dependent pretrained margins; selective layer boosting and conflict-aware triggering raise deep-conflict accura...

  15. Class Unlearning via Depth-Aware Removal of Forget-Specific Directions

    cs.CV 2026-04 unverdicted novelty 6.0

    DAMP performs one-shot class unlearning by extracting and projecting out forget-specific residual directions at each network depth using class prototypes and a separability-derived scaling rule.

  16. Distributed Multi-Layer Editing for Rule-Level Knowledge in Large Language Models

    cs.CL 2026-04 unverdicted novelty 6.0

    Rule knowledge in LLMs is localized by form across layers; a distributed multi-layer editing method improves instance portability by 13.91 and rule understanding by 50.19 percentage points over baselines on multiple models.

  17. Empty SPACE: Cross-Attention Sparsity for Concept Erasure in Diffusion Models

    cs.LG 2026-05 unverdicted novelty 5.0

    SPACE induces sparsity in cross-attention parameters via closed-form iterative updates to erase target concepts more effectively than dense baselines in large diffusion models.

  18. Why Expert Alignment Is Hard: Evidence from Subjective Evaluation

    cs.CL 2026-05 unverdicted novelty 5.0

    Expert alignment in subjective LLM evaluations is difficult because expert judgments are heterogeneous, partly tacit, dimension-dependent, and temporally unstable.

  19. The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation

    cs.LG 2026-04 unverdicted novelty 5.0

    Knowledge conflicts in hypernetwork LLM adaptation stem from constant adapter margins losing to frequency-dependent pretrained margins; selective layer boosting and conflict-aware triggering close the gap.

  20. Towards Scalable Lifelong Knowledge Editing with Selective Knowledge Suppression

    cs.AI 2026-04 unverdicted novelty 5.0

    LightEdit enables scalable lifelong knowledge editing in LLMs via selective knowledge retrieval and probability suppression during decoding, outperforming prior methods on ZSRE, Counterfact, and RIPE while reducing tr...

  21. Layered Mutability: Continuity and Governance in Persistent Self-Modifying Agents

    cs.AI 2026-04 unverdicted novelty 5.0

    Layered mutability framework claims governance difficulty in persistent self-modifying agents rises with rapid mutation, strong downstream coupling, weak reversibility, and low observability, producing compositional d...

  22. Layered Mutability: Continuity and Governance in Persistent Self-Modifying Agents

    cs.AI 2026-04 unverdicted novelty 5.0

    Persistent self-modifying AI agents exhibit compositional drift from mismatches across five mutability layers, with governance difficulty rising under rapid mutation, strong coupling, weak reversibility, and low obser...

  23. MemOS: A Memory OS for AI System

    cs.CL 2025-07 unverdicted novelty 5.0

    MemOS introduces a unified memory management framework for LLMs using MemCubes to handle and evolve different memory types for improved controllability and evolvability.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · cited by 21 Pith papers · 3 internal anchors


  24. [24]

    case_id":15,

    on performance; Figure 13 displays the results. Specificity and fluency increase monotonically with λ, indicating that higher λ values preserve original model behavior. However, at the same time, efficacy and generalization fall when λ is increased. We can see that around ≈ 104, the aggregated score reaches a maximum. 19 Published as a conference paper at...