Persuasion in LLMs works by redirecting a small set of attention heads to copy the target option token instead of reasoning over evidence, via a rank-one routing feature that can be directly edited or removed.
Mass-Editing Memory in a Transformer
19 Pith papers cite this work.
representative citing papers
A four-step recipe partitions the input space using interchange intervention behavior to diagnose where causal abstractions hold and to guide improvements, demonstrated by recovering a full hypothesis from scratch in a toy logic task.
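The interchange-intervention primitive this recipe builds on can be pictured with a toy network; everything below (the two-layer net, shapes, names) is an invented illustration, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network standing in for a model on a logic task.
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 1))

def hidden(x):
    """Hidden representation we will intervene on."""
    return np.maximum(x @ W1, 0.0)

def forward(x, h_override=None):
    """Full forward pass, optionally splicing in a foreign hidden state."""
    h = hidden(x) if h_override is None else h_override
    return h @ W2

# Interchange intervention: run the "base" input, but replace its hidden
# state with the one computed from a "source" input.
base = rng.normal(size=4)
source = rng.normal(size=4)
out_patched = forward(base, h_override=hidden(source))
```

Swapping the whole hidden layer makes the patched run behave exactly like the source run, because the output depends only on that layer; partial swaps of sub-vectors are what diagnose where a causal abstraction does and does not hold.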
EditPropBench evaluates LLM editors on propagating factual edits to dependent claims in synthetic scientific manuscripts, showing that even the strongest systems miss roughly 30% of required updates on hard cases.
Training per-layer affine probes on frozen transformers yields more reliable latent predictions than the logit lens and enables detection of malicious inputs from prediction trajectories.
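A minimal sketch of fitting an affine probe on frozen activations, with synthetic data standing in for real hidden states (the dimensions, the final-hidden-state target, and the closed-form least-squares fit are illustrative assumptions, not the paper's training recipe):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical activations from one frozen layer (d_model=16) and the
# later-layer states the probe should predict.
d_model, n_samples = 16, 200
H_layer = rng.normal(size=(n_samples, d_model))
W_true = rng.normal(size=(d_model, d_model)) * 0.3
H_final = H_layer @ W_true + 0.1 * rng.normal(size=(n_samples, d_model))

# Affine probe: solve min over (W, b) of ||[H, 1] @ [W; b] - H_final||^2
# in closed form via least squares.
X = np.hstack([H_layer, np.ones((n_samples, 1))])
theta, *_ = np.linalg.lstsq(X, H_final, rcond=None)
W_probe, b_probe = theta[:-1], theta[-1]

pred = H_layer @ W_probe + b_probe
mse = float(np.mean((pred - H_final) ** 2))
```

Unlike the logit lens, which reuses the unembedding unchanged, the learned affine map absorbs per-layer basis drift, which is what makes the latent predictions more reliable.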
A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.
δ-mem augments frozen LLMs with an 8x8 online memory state updated by delta-rule learning to generate low-rank attention corrections, delivering 1.10x average gains over the backbone and larger improvements on memory-heavy tasks.
Gradient-informed placement of LoRA parameters recovers full performance under GRPO while random placement does not, due to differences in gradient rank and stability across training regimes.
Temporal knowledge drift is encoded as a geometrically orthogonal direction in LLM residual streams, independent of correctness and uncertainty.
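One way to picture a drift direction that is geometrically orthogonal to correctness is a difference-of-means probe on simulated activations; nothing below comes from a real model, and the planted directions are assumptions of the sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 32

# Plant a drift direction and a correctness direction orthogonal to it.
drift_dir = rng.normal(size=d)
drift_dir /= np.linalg.norm(drift_dir)
corr_dir = rng.normal(size=d)
corr_dir -= (corr_dir @ drift_dir) * drift_dir   # Gram-Schmidt step
corr_dir /= np.linalg.norm(corr_dir)

# Simulated residual streams: stale-fact prompts shifted one way along
# the drift axis, up-to-date prompts the other way.
acts_stale = rng.normal(size=(100, d)) * 0.1 + 2.0 * drift_dir
acts_fresh = rng.normal(size=(100, d)) * 0.1 - 2.0 * drift_dir

# Difference of class means recovers the drift direction, and the
# recovered direction stays orthogonal to the correctness axis.
est = acts_stale.mean(axis=0) - acts_fresh.mean(axis=0)
est /= np.linalg.norm(est)

cos_drift = abs(float(est @ drift_dir))
cos_corr = abs(float(est @ corr_dir))
```

The point of the orthogonality claim is exactly this last pair of numbers: a probe for temporal drift carries almost no signal about correctness or uncertainty.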
HoReN achieves stable sequential editing of 50K facts in LLMs by combining a normalized Hopfield codebook with angular retrieval and attractor dynamics.
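Angular retrieval with attractor dynamics can be sketched as a modern-Hopfield-style update over a normalized codebook; the inverse temperature, step count, and data are assumptions for illustration, not HoReN's configuration:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_facts = 64, 100

# Normalized codebook of stored fact embeddings (unit vectors).
codebook = rng.normal(size=(n_facts, d))
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)

def angular_retrieve(query, codebook, beta=20.0, steps=3):
    """Softmax attention over cosine similarity, iterated so the state
    falls into the attractor of the nearest stored pattern."""
    x = query / np.linalg.norm(query)
    for _ in range(steps):
        attn = np.exp(beta * (codebook @ x))
        attn /= attn.sum()
        x = attn @ codebook
        x /= np.linalg.norm(x)
    return x

# A noisy cue for fact 7 converges back to the stored pattern.
cue = codebook[7] + 0.1 * rng.normal(size=d)
out = angular_retrieve(cue, codebook)
sim = float(out @ codebook[7])
```

Normalizing both the codebook and the state is what makes retrieval depend only on angle, which keeps attractors stable as more facts are written sequentially.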
Perturbation probing identifies tiny sets of FFN neurons that control refusal templates and language routing in LLMs, enabling precise ablations and directional interventions that alter behavior on benchmarks while preserving safety.
EVOREC integrates locate-then-edit model editing with FA-constrained decoding to improve LLM-based service recommendation under evolution, reporting a 25.9% average relative gain in Recall@5 over baselines and a 22.3% gain over fine-tuning in dynamic scenarios.
Distinct linear knowledge vectors for deductive, inductive, and abductive reasoning in LLMs can be refined via complementary subspace constraints to improve performance through mutual knowledge sharing.
Knowledge conflicts in hypernetwork LLM adaptation stem from constant adapter margins losing to frequency-dependent pretrained margins; selective layer boosting and conflict-aware triggering raise deep-conflict accuracy to 71-72.5% on Gemma-2B and Mistral-7B.
DAMP ("Class Unlearning via Depth-Aware Removal of Forget-Specific Directions") performs one-shot class unlearning by extracting and projecting out forget-specific residual directions at each network depth, using class prototypes and a separability-derived scaling rule.
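Projecting out a prototype-derived direction can be sketched on simulated features; the planted class shift and the fixed scaling (in place of the paper's separability-derived rule) are assumptions of this toy:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 24

# Simulated per-depth activations: the forget class is shifted along a
# fixed axis; the retain class is not.
forget_acts = rng.normal(size=(200, d)) + 3.0 * np.eye(d)[0]
retain_acts = rng.normal(size=(200, d))

# Forget-specific direction from class prototypes (mean difference).
proto_f, proto_r = forget_acts.mean(axis=0), retain_acts.mean(axis=0)
u = proto_f - proto_r
u /= np.linalg.norm(u)

def project_out(X, u, alpha=1.0):
    """Remove the component along u. DAMP derives alpha from class
    separability; we substitute a fixed alpha=1.0 here."""
    return X - alpha * np.outer(X @ u, u)

edited = project_out(forget_acts, u)
sep_before = abs(float((proto_f - proto_r) @ u))             # class separation
sep_after = abs(float((edited.mean(axis=0) - proto_r) @ u))  # after removal
```

Because the projection is rank-one per depth, the edit is one-shot and leaves directions orthogonal to `u` (and hence the retain class) untouched.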
Rule knowledge in LLMs is localized by form across layers; a distributed multi-layer editing method improves instance portability by 13.91 and rule understanding by 50.19 percentage points over baselines on multiple models.
SPACE induces sparsity in cross-attention parameters via closed-form iterative updates to erase target concepts more effectively than dense baselines in large diffusion models.
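A closed-form iterative sparsification step is typically a proximal (soft-thresholding) update; the sketch below applies one to a dense matrix as a loose stand-in for cross-attention weights, with the threshold and iteration count chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(5)

# Stand-in for a dense cross-attention weight matrix to be sparsified.
W = rng.normal(size=(16, 16))

def soft_threshold(W, lam):
    """Closed-form proximal step for an L1 penalty: shrink toward zero,
    zeroing any entry whose magnitude falls below lam."""
    return np.sign(W) * np.maximum(np.abs(W) - lam, 0.0)

# Iterative shrinkage: each step is a closed-form update that zeroes
# small entries while keeping large ones (shrunken) intact.
W_sparse = W.copy()
for _ in range(5):
    W_sparse = soft_threshold(W_sparse, 0.2)

sparsity = float(np.mean(W_sparse == 0.0))
```

The appeal of such updates is that each iteration has an exact solution, so no inner optimization loop is needed to decide which parameters to zero.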
Expert alignment in subjective LLM evaluations is difficult because expert judgments are heterogeneous, partly tacit, dimension-dependent, and temporally unstable.
LightEdit enables scalable lifelong knowledge editing in LLMs via selective knowledge retrieval and probability suppression during decoding, outperforming prior methods on ZSRE, Counterfact, and RIPE while reducing training costs.
Persistent self-modifying AI agents exhibit compositional drift from mismatches across five mutability layers, with governance difficulty rising under rapid mutation, strong coupling, weak reversibility, and low observability, as indicated by a 0.68 identity hysteresis ratio in a preliminary ratchet.