arxiv: 2212.04089 · v3 · submitted 2022-12-08 · cs.LG · cs.CL · cs.CV

Editing Models with Task Arithmetic

Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi

Pith reviewed 2026-05-13 08:05 UTC · model grok-4.3

classification cs.LG · cs.CL · cs.CV

keywords task vectors · model editing · task arithmetic · weight space · pre-trained models · fine-tuning · analogy tasks · model steering

The pith

Task vectors steer pre-trained models by adding, subtracting, and combining directions in weight space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces task vectors as the difference between a model's fine-tuned weights and its original pre-trained weights. These vectors act as directional adjustments that support arithmetic: negation lowers performance on one task, addition raises performance across several tasks at once, and combinations based on task analogies improve a fourth task even when no examples from that task are available during training. The approach is tested across models, modalities, and tasks, showing that simple vector operations can edit model behavior without full retraining.

Core claim

A task vector is obtained by subtracting the weights of a pre-trained model from the weights of the same model after fine-tuning on a given task. Arithmetic on these vectors steers behavior: negation reduces accuracy on the target task, addition improves accuracy on multiple tasks simultaneously, and when tasks satisfy an analogy of the form A is to B as C is to D, the combination of three vectors raises performance on the fourth task without any training data from it.
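
The definition and the two simplest operations can be sketched in a few lines. This is a minimal illustration, not the paper's code: it assumes weights are held as a plain dictionary from parameter name to a list of floats (real checkpoints store tensors), and the names `Weights`, `task_vector`, `scale`, `add_vectors`, and `apply_vector` are hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # hypothetical stand-in for a model checkpoint


def task_vector(pretrained: Weights, finetuned: Weights) -> Weights:
    """Element-wise difference: fine-tuned weights minus pre-trained weights."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def scale(vector: Weights, coeff: float) -> Weights:
    """Multiply every entry by coeff; coeff = -1.0 negates the vector."""
    return {name: [coeff * v for v in values] for name, values in vector.items()}


def add_vectors(*vectors: Weights) -> Weights:
    """Element-wise sum of task vectors that share the same parameter names."""
    first = vectors[0]
    return {
        name: [sum(vals) for vals in zip(*(v[name] for v in vectors))]
        for name in first
    }


def apply_vector(pretrained: Weights, vector: Weights, coeff: float = 1.0) -> Weights:
    """Return edited weights: pre-trained weights plus coeff times the vector."""
    return {
        name: [p + coeff * v for p, v in zip(pretrained[name], vector[name])]
        for name in pretrained
    }


# Toy weights, chosen so the arithmetic is exact in floating point.
theta_pre = {"layer.weight": [1.0, 2.0], "layer.bias": [0.5]}
theta_task_a = {"layer.weight": [1.5, 2.0], "layer.bias": [0.75]}
theta_task_b = {"layer.weight": [1.0, 3.0], "layer.bias": [0.5]}

tau_a = task_vector(theta_pre, theta_task_a)
tau_b = task_vector(theta_pre, theta_task_b)

forget_a = apply_vector(theta_pre, scale(tau_a, -1.0))            # negation
multi_task = apply_vector(theta_pre, add_vectors(tau_a, tau_b))   # addition
```

Whether the edited weights actually behave as described depends on the model and tasks; the sketch only shows the bookkeeping.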

What carries the argument

Task vectors, defined as the weight difference between a fine-tuned model and its pre-trained base, which function as linear directions in parameter space that combine via addition and negation to alter task performance.
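
Written out, the operations take the following form. The base notation follows the referee's suggested equation (τ = θ_fine − θ_pre); the task subscripts and the symbol λ for the scaling coefficient are notational assumptions added here for readability.

```latex
\begin{align*}
\tau_t &= \theta_{\text{fine}}^{(t)} - \theta_{\text{pre}}
  && \text{(task vector for task } t\text{)} \\
\theta_{\text{new}} &= \theta_{\text{pre}} + \lambda\,\tau_{\text{new}}
  && \text{(applying an edit with scaling coefficient } \lambda\text{)} \\
\tau_{\text{new}} &= -\tau_t
  && \text{(negation)} \\
\tau_{\text{new}} &= \sum_i \tau_i
  && \text{(addition)} \\
\tau_{\text{new}} &= \tau_C + (\tau_B - \tau_A)
  && \text{(analogy: } A \text{ is to } B \text{ as } C \text{ is to } D\text{)}
\end{align*}
```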

If this is right

Negating a task vector lowers performance on its associated task while leaving performance on unrelated tasks largely unchanged.
Adding several task vectors raises performance on each of the corresponding tasks at the same time.
Vector combinations derived from task analogies improve accuracy on a fourth task even when no examples from that task are used.
The same arithmetic operations apply across different model architectures and data modalities in the reported experiments.
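
The analogy case in the list above can be illustrated on toy data. This sketch assumes each task vector has been flattened into a list of floats; `analogy_vector` and `apply_vector` are hypothetical helper names, and the coefficient 0.5 is an arbitrary illustrative value rather than a recommended setting.

```python
from typing import List


def analogy_vector(tau_a: List[float], tau_b: List[float], tau_c: List[float]) -> List[float]:
    """Estimate a vector for task D from 'A is to B as C is to D': tau_c + (tau_b - tau_a)."""
    return [c + (b - a) for a, b, c in zip(tau_a, tau_b, tau_c)]


def apply_vector(theta_pre: List[float], tau: List[float], coeff: float = 1.0) -> List[float]:
    """Pre-trained weights plus coeff times the task vector."""
    return [p + coeff * t for p, t in zip(theta_pre, tau)]


# Toy flattened weights; no data from task D is used anywhere.
theta_pre = [0.0, 0.0, 0.0]
tau_a = [1.0, 0.0, 0.0]
tau_b = [1.0, 1.0, 0.0]
tau_c = [0.0, 0.0, 1.0]

tau_d_estimate = analogy_vector(tau_a, tau_b, tau_c)
theta_d_estimate = apply_vector(theta_pre, tau_d_estimate, coeff=0.5)
```

The shared component of A and B cancels, so the estimate keeps only what distinguishes B from A and adds it to C.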

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

A library of pre-computed task vectors could allow quick assembly of custom models by selecting and combining desired directions.
Negating vectors linked to biased or undesired behaviors offers a route to debiasing without new labeled data.
The method suggests that task adaptations may remain modular enough to support sequential additions or removals of capabilities.

Load-bearing premise

Directions in weight space for different tasks add together with little destructive interference.
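
One rough way to probe this premise is to check how closely aligned two task vectors are. The sketch below uses cosine similarity on flattened vectors as a proxy, on the assumption that near-orthogonal vectors are less likely to interfere when added. That link is a heuristic, not a guarantee, and the helper name is hypothetical.

```python
import math
from typing import List


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two flattened task vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors: tau_a and tau_b touch disjoint coordinates, tau_c overlaps with both.
tau_a = [3.0, 0.0, 0.0]
tau_b = [0.0, 4.0, 0.0]
tau_c = [3.0, 4.0, 0.0]

sim_ab = cosine_similarity(tau_a, tau_b)  # orthogonal directions
sim_ac = cosine_similarity(tau_a, tau_c)  # partially overlapping directions
```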

What would settle it

Run the analogy experiment on a held-out task and observe that the combined vector yields no accuracy gain over the plain pre-trained model.

Original abstract

Changing how pre-trained models behave -- e.g., improving their performance on a downstream task or mitigating biases learned during pre-training -- is a common practice when developing machine learning systems. In this work, we propose a new paradigm for steering the behavior of neural networks, centered around task vectors. A task vector specifies a direction in the weight space of a pre-trained model, such that movement in that direction improves performance on the task. We build task vectors by subtracting the weights of a pre-trained model from the weights of the same model after fine-tuning on a task. We show that these task vectors can be modified and combined together through arithmetic operations such as negation and addition, and the behavior of the resulting model is steered accordingly. Negating a task vector decreases performance on the target task, with little change in model behavior on control tasks. Moreover, adding task vectors together can improve performance on multiple tasks at once. Finally, when tasks are linked by an analogy relationship of the form "A is to B as C is to D", combining task vectors from three of the tasks can improve performance on the fourth, even when no data from the fourth task is used for training. Overall, our experiments with several models, modalities and tasks show that task arithmetic is a simple, efficient and effective way of editing models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Task vectors from fine-tuning deltas can be added, negated, and combined for multi-task edits and some analogy transfers, with clean empirical support but post-hoc scaling and no bounds on when additivity breaks.

The main thing to know is that subtracting the pre-trained model's weights from a fine-tuned model's weights produces a task vector you can add or negate to steer behavior. Adding vectors improves multiple tasks at once, and in analogy setups like A is to B as C is to D, combining three vectors lifts performance on the fourth task with no data from it. This is shown across several models, modalities, and tasks.

Referee Report

2 major / 2 minor

Summary. The paper introduces task vectors, defined as the difference between the weights of a model fine-tuned on a task and the weights of the corresponding pre-trained model. It demonstrates that these vectors support arithmetic operations such as negation (which decreases performance on the target task) and addition (which can improve performance on multiple tasks simultaneously). For tasks related by an analogy of the form 'A is to B as C is to D', the paper shows that combining task vectors from three tasks can improve performance on the fourth task without using any data from it. Experiments are reported across multiple models, modalities, and tasks.

Significance. If the empirical results hold, the work provides a simple, efficient method for editing pre-trained models without full retraining or access to task data in some cases. The multi-task addition and analogy-based editing results are particularly notable, as they suggest a form of weight-space compositionality that could reduce the need for task-specific fine-tuning. The experiments across models and modalities lend concrete support to the central claims.

major comments (2)

[Experiments] Experiments section: the scaling coefficients used for vector addition are selected post-hoc for each reported result; this choice directly affects the magnitude of the claimed gains and should be accompanied by a sensitivity analysis or default selection rule to avoid the appearance of tuning to the test set.
[Analogy experiments] Analogy experiments (the fourth-task improvement results): error bars or multiple random seeds are not reported for all gains; without them it is difficult to assess whether the observed improvements on the held-out task are statistically reliable or could be explained by variance in the base fine-tuning runs.

minor comments (2)

[§3] Notation for task vectors should be introduced once with a clear equation (e.g., τ = θ_fine − θ_pre) and then used consistently; occasional redefinition in later sections reduces readability.
[Figures] Several figures would benefit from explicit annotation of the scaling coefficient value used in each plotted curve.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment of our work and the constructive comments. We address each major comment below.

Referee: [Experiments] Experiments section: the scaling coefficients used for vector addition are selected post-hoc for each reported result; this choice directly affects the magnitude of the claimed gains and should be accompanied by a sensitivity analysis or default selection rule to avoid the appearance of tuning to the test set.

Authors: We agree that the scaling coefficients warrant additional justification. In the manuscript, coefficients were chosen based on validation performance for each combination, following common practice for such methods. To strengthen the presentation, we will add a sensitivity analysis in the revised version showing performance across a range of coefficients (e.g., 0.0 to 2.0) for the primary multi-task and analogy results. We will also state a default rule of using coefficient 1.0 when no validation data is available.
revision: yes

Referee: [Analogy experiments] Analogy experiments (the fourth-task improvement results): error bars or multiple random seeds are not reported for all gains; without them it is difficult to assess whether the observed improvements on the held-out task are statistically reliable or could be explained by variance in the base fine-tuning runs.

Authors: We acknowledge the value of reporting variability for assessing reliability. The original experiments used single runs primarily due to computational cost. In the revision, we will rerun the analogy experiments with at least three random seeds, report mean performance with standard deviation error bars, and confirm that the observed gains remain statistically reliable.
revision: yes
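
The two promised analyses, a coefficient sweep and per-seed variability, could be organized along the following lines. This is a sketch under stated assumptions: `sweep_coefficients` is a hypothetical helper, and `toy_evaluate` is a made-up scoring function standing in for "edit the model with this coefficient, then measure validation accuracy". No real model or reported result is involved.

```python
import statistics
from typing import Callable, Dict, Sequence


def sweep_coefficients(
    evaluate: Callable[[float, int], float],
    coefficients: Sequence[float],
    seeds: Sequence[int],
) -> Dict[float, Dict[str, float]]:
    """For each coefficient, evaluate once per seed and report mean and sample std."""
    results: Dict[float, Dict[str, float]] = {}
    for coeff in coefficients:
        scores = [evaluate(coeff, seed) for seed in seeds]
        results[coeff] = {
            "mean": statistics.mean(scores),
            "std": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        }
    return results


def toy_evaluate(coeff: float, seed: int) -> float:
    """Hypothetical score: peaks at coeff = 1.0, plus a small seed-dependent offset."""
    return 0.9 - 0.25 * (coeff - 1.0) ** 2 + 0.01 * (seed - 1)


coefficients = [0.0, 0.5, 1.0, 1.5, 2.0]  # the 0.0 to 2.0 range named in the response
seeds = [0, 1, 2]                          # "at least three random seeds"
table = sweep_coefficients(toy_evaluate, coefficients, seeds)

# Pick the coefficient with the best mean score; in practice this choice
# should be made on validation data, never on the test set.
best = max(table, key=lambda c: table[c]["mean"])
```

Reporting the full table rather than only `best` is what makes the sensitivity visible to a reader.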

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

The paper defines task vectors explicitly as the difference between fine-tuned and pre-trained weights, then demonstrates their arithmetic properties (negation, addition, and analogy-based combinations) through direct empirical evaluation on held-out test sets across multiple models and tasks. No equations or claims reduce a 'prediction' to a fitted parameter by construction, and the central results (including analogy editing without fourth-task data) are measured independently rather than derived tautologically from the inputs. There are no load-bearing self-citations, uniqueness theorems, or ansatzes that collapse the argument to prior author work. The work is self-contained as an experimental paradigm with falsifiable measurements.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the empirical observation that weight-space differences behave as approximately linear task directions. No free parameters are explicitly fitted beyond scaling coefficients chosen per experiment. No new entities are postulated beyond the task vector construct itself.

free parameters (1)

scaling coefficient for vector addition
Chosen per task combination to maximize performance; not derived from first principles.

axioms (1)

domain assumption: Task directions in parameter space are sufficiently linear and additive for the tested models and tasks.
Invoked throughout the experimental sections when combining vectors.

invented entities (1)

task vector (no independent evidence)
purpose: Direction in weight space that encodes a task's effect on the model.
Defined as the difference between fine-tuned and pre-trained weights; no independent evidence outside the empirical results.

Forward citations

Cited by 34 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Defenses at Odds: Measuring and Explaining Defense Conflicts in Large Language Models
cs.CR 2026-05 conditional novelty 8.0

Sequential LLM defense deployment leads to risk exacerbation in 38.9% of cases due to anti-aligned updates in shared critical layers, addressed by conflict-guided layer freezing.
Crafting Reversible SFT Behaviors in Large Language Models
cs.LG 2026-05 unverdicted novelty 8.0

LCDD creates sparse carriers for SFT behaviors that SFT-Eraser can reverse, with ablations showing the sparse structure enables causal control.
Discovering Physical Directions in Weight Space: Composing Neural PDE Experts
cs.LG 2026-05 unverdicted novelty 7.0

Fine-tuning neural PDE operators to regime endpoints reveals a physical direction in weight space that CCM uses to compose accurate merged models for new or extrapolated regimes from metadata or short prefixes.
Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling
cs.LG 2026-05 unverdicted novelty 7.0

DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.
Good Agentic Friends Do Not Just Give Verbal Advice: They Can Update Your Weights
cs.CL 2026-05 unverdicted novelty 7.0

TFlow enables multi-agent LLMs to collaborate via transient low-rank LoRA perturbations derived from sender activations, yielding up to 8.5 accuracy gains and 83% token reduction versus text-based baselines on Qwen3-4...
CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models
cs.CV 2026-05 unverdicted novelty 7.0

Capability vectors extracted from parameter differences between standard and auxiliary-finetuned VLA models can be merged into pretrained weights to match auxiliary-training performance while reducing computational ov...
Erase Persona, Forget Lore: Benchmarking Multimodal Copyright Unlearning in Large Vision Language Models
cs.CV 2026-05 unverdicted novelty 7.0

CoVUBench is the first benchmark framework for evaluating multimodal copyright unlearning in LVLMs via synthetic data, systematic variations, and a dual protocol for forgetting efficacy and utility preservation.
Generalizing the Geometry of Model Merging Through Frechet Averages
cs.LG 2026-04 unverdicted novelty 7.0

Model merging is reframed as Fréchet averaging on manifolds whose geometry respects architectural symmetries, generalizing Fisher merging and enabling better LoRA merges.
Generalizing the Geometry of Model Merging Through Frechet Averages
cs.LG 2026-04 unverdicted novelty 7.0

Model merging is generalized as Fréchet averaging on symmetry-invariant manifolds, containing Fisher merging as a special case and offering a new approach for LoRA adapters.
Atomic-Probe Governance for Skill Updates in Compositional Robot Policies
cs.RO 2026-04 unverdicted novelty 7.0

A cross-version swap protocol reveals dominant skills that swing composition success by up to 50 percentage points, and an atomic probe with selective revalidation governs updates at lower cost than always re-testing ...
Differentially Private Model Merging
cs.LG 2026-04 unverdicted novelty 7.0

Post-processing via random selection or linear combination generates differentially private models for arbitrary privacy parameters from pre-trained models on the same dataset.
Exploring Language-Agnosticity in Function Vectors: A Case Study in Machine Translation
cs.CL 2026-04 unverdicted novelty 7.0

Translation function vectors extracted from English to one target language improve correct token ranking for translations to multiple other unseen target languages in decoder-only multilingual LLMs.
One Model to Translate Them All? A Journey to Mount Doom for Multilingual Model Merging
cs.CL 2026-04 unverdicted novelty 7.0

Merging fine-tuned models for multilingual translation fails because fine-tuning redistributes language-specific neurons rather than sharpening them, increasing representational divergence in output-generating layers.
Internalized Reasoning for Long-Context Visual Document Understanding
cs.CV 2026-03 unverdicted novelty 7.0

A synthetic pipeline creates and internalizes reasoning traces in VLMs for long-context visual document understanding, with a 32B model surpassing a 235B model on MMLongBenchDoc and showing 12.4x fewer output tokens.
Refusal in Language Models Is Mediated by a Single Direction
cs.LG 2024-06 accept novelty 7.0

Refusal in language models is mediated by a single direction in residual stream activations that can be erased to disable safety or added to elicit refusal.
Scalable Token-Level Hallucination Detection in Large Language Models
cs.CL 2026-05 unverdicted novelty 6.0

TokenHD uses a scalable data synthesis engine and importance-weighted training to create token-level hallucination detectors that work on free-form text and scale from 0.6B to 8B parameters, outperforming larger reaso...
Experience Sharing in Mutual Reinforcement Learning for Heterogeneous Language Models
cs.LG 2026-05 unverdicted novelty 6.0

Mutual Reinforcement Learning allows heterogeneous LLMs to exchange experience through mechanisms like Peer Rollout Pooling, Cross-Policy GRPO Advantage Sharing, and Success-Gated Transfer, with outcome-level sharing ...
UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
cs.CV 2026-05 unverdicted novelty 6.0

UniVidX unifies diverse video generation tasks into one conditional diffusion model using stochastic condition masking, decoupled gated LoRAs, and cross-modal self-attention.
Atomic-Probe Governance for Skill Updates in Compositional Robot Policies
cs.RO 2026-04 unverdicted novelty 6.0

Empirical study on robosuite tasks reveals a dominant-skill effect in compositions and shows that an atomic probe approximates full revalidation for skill updates at much lower cost.
Separable Expert Architecture: Toward Privacy-Preserving LLM Personalization via Composable Adapters and Deletable User Proxies
cs.AI 2026-04 unverdicted novelty 6.0

A separable expert architecture uses base models, LoRA adapters, and deletable per-user proxies to enable privacy-preserving personalization and deterministic unlearning in LLMs.
AlignCultura: Towards Culturally Aligned Large Language Models?
cs.CL 2026-04 unverdicted novelty 6.0

Align-Cultura introduces the CULTURAX dataset and shows that culturally fine-tuned LLMs improve joint HHH scores by 4-6%, cut cultural failures by 18%, and gain 10-12% efficiency with minimal leakage.
Train Separately, Merge Together: Modular Post-Training with Mixture-of-Experts
cs.LG 2026-04 unverdicted novelty 6.0

BAR trains independent domain experts via separate mid-training, SFT, and RL pipelines then composes them with a MoE router to match monolithic retraining performance at lower cost and without catastrophic forgetting.
PivotMerge: Bridging Heterogeneous Multimodal Pre-training via Post-Alignment Model Merging
cs.CV 2026-04 unverdicted novelty 6.0

PivotMerge merges heterogeneous multimodal pre-trained models via shared-space decomposition to filter conflicts and layer-wise weights based on alignment contributions, outperforming baselines on multimodal benchmarks.
Weight Patching: Toward Source-Level Mechanistic Localization in LLMs
cs.AI 2026-04 unverdicted novelty 6.0

Weight Patching localizes capabilities to specific parameter modules in LLMs by replacing weights from a behavior-specialized model into a base model and validating recovery via a vector-anchor interface, revealing a ...
WIN-U: Woodbury-Informed Newton-Unlearning as a retain-free Machine Unlearning Framework
cs.LG 2026-04 unverdicted novelty 6.0

WIN-U delivers a retain-free unlearning update that approximates the gold-standard retrained model via a Woodbury-informed Newton step using only forget-set curvature information.
The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment
cs.LG 2026-04 unverdicted novelty 6.0

The Master Key Hypothesis states that capabilities are low-dimensional directions transferable across models through linear subspace alignment, with UNLOCK demonstrating gains such as 12.1% accuracy improvement on MAT...
Analytic Drift Resister for Non-Exemplar Continual Graph Learning
cs.LG 2026-04 unverdicted novelty 6.0

ADR achieves theoretically zero-forgetting class-incremental graph learning by combining backpropagation adaptation with ridge-regression-based layer-wise merging of GNN linear transformations.
GeoStack: A Framework for Quasi-Abelian Knowledge Composition in VLMs
cs.CV 2026-05 unverdicted novelty 5.0

GeoStack composes multiple domain experts into VLMs with preserved base knowledge and O(1) inference time via geometric stacking and a weight-folding property.
UNSEEN: A Cross-Stack LLM Unlearning Defense against AR-LLM Social Engineering Attacks
cs.CR 2026-04 unverdicted novelty 5.0

UNSEEN combines AR access control, LLM unlearning to suppress profiles, and agent guardrails to defend against AR-LLM social engineering attacks, tested in a 60-person user study with 360 conversations.
HiP-LoRA: Budgeted Spectral Plasticity for Robust Low-Rank Adaptation
cs.LG 2026-04 unverdicted novelty 5.0

HiP-LoRA decomposes LoRA updates into principal and residual spectral channels with a singular-value-weighted stability budget to reduce forgetting and interference during foundation model adaptation.
MAny: Merge Anything for Multimodal Continual Instruction Tuning
cs.LG 2026-04 unverdicted novelty 5.0

MAny addresses dual-forgetting in multimodal continual instruction tuning via CPM and LPM merging strategies, delivering up to 8.57% accuracy gains on UCIT benchmarks without additional training.
FREE-Switch: Frequency-based Dynamic LoRA Switch for Style Transfer
cs.CV 2026-04 unverdicted novelty 5.0

FREE-Switch dynamically switches LoRA adapters using frequency importance per diffusion step and adds semantic alignment to reduce content drift when merging specialized image generators.
SHIFT: Steering Hidden Intermediates in Flow Transformers
cs.CV 2026-04 unverdicted novelty 5.0

SHIFT learns and applies steering vectors to selected layers and timesteps in DiT models to suppress concepts, shift styles, or bias objects while keeping image quality and prompt adherence intact.
MOMO: Mars Orbital Model Foundation Model for Mars Orbital Applications
cs.CV 2026-04 unverdicted novelty 5.0

MOMO merges sensor-specific models from three Mars orbital instruments at matched validation loss stages to form a foundation model that outperforms ImageNet, Earth observation, sensor-specific, and supervised baselin...

Reference graph

Works this paper leans on

114 extracted references · 114 canonical work pages · cited by 32 Pith papers · 11 internal anchors

[1]

Task2vec: Task embedding for meta-learning

Alessandro Achille, Michael Lam, Rahul Tewari, Avinash Ravichandran, Subhransu Maji, Charless C Fowlkes, Stefano Soatto, and Pietro Perona. Task2vec: Task embedding for meta-learning. In International Conference on Computer Vision (ICCV), 2019. https://arxiv.org/abs/1902.03545

work page arXiv 2019
[2]

Git re-basin: Merging models modulo permutation symmetries

Samuel K Ainsworth, Jonathan Hayase, and Siddhartha Srinivasa. Git re-basin: Merging models modulo permutation symmetries, 2022. https://arxiv.org/abs/2209.04836

work page arXiv 2022
[3]

Flamingo: a Visual Language Model for Few-Shot Learning

Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, et al. Flamingo: a visual language model for few-shot learning, 2022. https://arxiv.org/abs/2204.14198

work page internal anchor Pith review Pith/arXiv arXiv 2022
[4]

A General Language Assistant as a Laboratory for Alignment

Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, Tom Henighan, Andy Jones, Nicholas Joseph, Ben Mann, Nova DasSarma, et al. A general language assistant as a laboratory for alignment, 2021. https://arxiv.org/abs/2112.00861

work page internal anchor Pith review Pith/arXiv arXiv 2021
[5]

The second pascal recognising textual entailment challenge

Roy Bar-Haim, Ido Dagan, Bill Dolan, Lisa Ferro, Danilo Giampiccolo, Bernardo Magnini, and Idan Szpektor. The second pascal recognising textual entailment challenge. In II PASCAL challenge, 2006

work page 2006
[6]

The fifth pascal recognizing textual entailment challenge

Luisa Bentivogli, Peter Clark, Ido Dagan, and Danilo Giampiccolo. The fifth pascal recognizing textual entailment challenge. In TAC, 2009. https://cris.fbk.eu/handle/11582/5351

work page 2009
[7]

Loss surface simplexes for mode connecting volumes and fast ensembling

Gregory Benton, Wesley Maddox, Sanae Lotfi, and Andrew Gordon Wilson. Loss surface simplexes for mode connecting volumes and fast ensembling. In International Conference on Machine Learning (ICML), 2021. https://arxiv.org/abs/2102.13042

work page arXiv 2021
[8]

Nuanced metrics for measuring unintended bias with real data for text classification

Daniel Borkan, Lucas Dixon, Jeffrey Sorensen, Nithum Thain, and Lucy Vasserman. Nuanced metrics for measuring unintended bias with real data for text classification. In Companion Proceedings of the 2019 World Wide Web Conference, 2019. https://arxiv.org/abs/1903.04561

work page arXiv 2019
[9]

Language Models are Few-Shot Learners

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott G...

work page internal anchor Pith review Pith/arXiv arXiv 2020
[10]

Remote sensing image scene classification: Benchmark and state of the art

Gong Cheng, Junwei Han, and Xiaoqiang Lu. Remote sensing image scene classification: Benchmark and state of the art. Proceedings of the Institute of Electrical and Electronics Engineers (IEEE), 2017. https://ieeexplore.ieee.org/abstract/document/7891544

work page 2017
[11]

Fusing finetuned models for better pretraining

Leshem Choshen, Elad Venezian, Noam Slonim, and Yoav Katz. Fusing finetuned models for better pretraining, 2022. https://arxiv.org/abs/2204.03044.

Published as a conference paper at ICLR 2023

work page arXiv 2022
[12]

Describing textures in the wild

Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Sammy Mohamed, and Andrea Vedaldi. Describing textures in the wild. In Conference on Computer Vision and Pattern Recognition (CVPR), 2014. https://openaccess.thecvf.com/content_cvpr_2014/html/Cimpoi_Describing_Textures_in_2014_CVPR_paper.html

work page 2014
[13]

A deep neural network’s loss surface contains every low-dimensional pattern

Wojciech Marian Czarnecki, Simon Osindero, Razvan Pascanu, and Max Jaderberg. A deep neural network’s loss surface contains every low-dimensional pattern, 2019. https://arxiv.org/abs/1912.07559

work page arXiv 2019
[14]

The PASCAL recognising textual entailment challenge

Ido Dagan, Oren Glickman, and Bernardo Magnini. The pascal recognising textual entailment challenge. In Machine Learning Challenges Workshop, 2005. https://link.springer.com/chapter/10.1007/11736790_9

work page doi:10.1007/11736790_9 2005
[15]

Editing Factual Knowledge in Language Models

Nicola De Cao, Wilker Aziz, and Ivan Titov. Editing factual knowledge in language models. In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021. https://arxiv.org/abs/2104.08164

work page arXiv 2021
[16]

Imagenet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In Conference on Computer Vision and Pattern Recognition (CVPR), 2009. https://ieeexplore.ieee.org/abstract/document/5206848

work page 2009
[17]

BERT: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2019. https://aclanthology.org/N19-1423

work page 2019
[18]

Fine-tuning pretrained language models: Weight initializations, data orders, and early stopping

Jesse Dodge, Gabriel Ilharco, Roy Schwartz, Ali Farhadi, Hannaneh Hajishirzi, and Noah Smith. Fine-tuning pretrained language models: Weight initializations, data orders, and early stopping, 2020. https://arxiv.org/abs/2002.06305/

work page arXiv 2020
[19]

Automatically constructing a corpus of sentential paraphrases

William B. Dolan and Chris Brockett. Automatically constructing a corpus of sentential paraphrases. In International Workshop on Paraphrasing, 2005. https://aclanthology.org/I05-5002

work page 2005
[20]

Cold fusion: Collaborative descent for distributed multitask finetuning

Shachar Don-Yehiya, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, and Leshem Choshen. Cold fusion: Collaborative descent for distributed multitask finetuning, 2022. https://arxiv.org/abs/2212.01378

work page arXiv 2022
[21]

Essentially no barriers in neural network energy landscape

Felix Draxler, Kambis Veschgini, Manfred Salmhofer, and Fred Hamprecht. Essentially no barriers in neural network energy landscape. In International Conference on Machine Learning (ICML), 2018. https://arxiv.org/abs/1803.00885

work page arXiv 2018
[22]

How do humans sketch objects?

Mathias Eitz, James Hays, and Marc Alexa. How do humans sketch objects? ACM Transactions on graphics (TOG), 2012. https://dl.acm.org/doi/10.1145/2185520.2185540

work page doi:10.1145/2185520 2012
[23]

The role of permutation invariance in linear mode connectivity of neural networks

Rahim Entezari, Hanie Sedghi, Olga Saukh, and Behnam Neyshabur. The role of permutation invariance in linear mode connectivity of neural networks. In International Conference on Learning Representations (ICLR), 2022. https://arxiv.org/abs/2110.06296

work page arXiv 2022
[24]

Multi-news: a large-scale multi-document summarization dataset and abstractive hierarchical model

Alexander R. Fabbri, Irene Li, Tianwei She, Suyi Li, and Dragomir R. Radev. Multi-news: a large-scale multi-document summarization dataset and abstractive hierarchical model, 2019. https://arxiv.org/abs/1906.01749

work page arXiv 2019
[25]

Deep ensembles: A loss landscape perspective

Stanislav Fort, Huiyi Hu, and Balaji Lakshminarayanan. Deep ensembles: A loss landscape perspective, 2019. https://arxiv.org/abs/1912.02757

work page arXiv 2019
[26]

Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the neural tangent kernel

Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M Roy, and Surya Ganguli. Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the neural tangent kernel. In Advances in Neural Information Processing Systems (NeurIPS), 2020. https://arxiv.org/abs/2010.15110. 11...

work page 2020
[27]

Linear mode connectivity and the lottery ticket hypothesis

Jonathan Frankle, Gintare Karolina Dziugaite, Daniel Roy, and Michael Carbin. Linear mode connectivity and the lottery ticket hypothesis. In International Conference on Machine Learning (ICML), 2020. https://proceedings.mlr.press/v119/frankle20a.html

work page 2020
[28]

Loss surfaces, mode connectivity, and fast ensembling of dnns

Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry Vetrov, and Andrew Gordon Wilson. Loss surfaces, mode connectivity, and fast ensembling of dnns. In Advances in Neural Information Processing Systems (NeurIPS), 2018. https://arxiv.org/abs/1802.10026

work page 2018
[29]

Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, and Noah A. Smith. RealToxicityPrompts: Evaluating neural toxic degeneration in language models. In Findings of the Association for Computational Linguistics: EMNLP 2020, 2020. https://aclanthology.org/2020.findings-emnlp.301

work page 2020
[30]

Lm-debugger: An interactive tool for inspection and intervention in transformer-based language models, 2022

Mor Geva, Avi Caciularu, Guy Dar, Paul Roit, Shoval Sadde, Micah Shlain, Bar Tamir, and Yoav Goldberg. Lm-debugger: An interactive tool for inspection and intervention in transformer-based language models, 2022. https://arxiv.org/abs/2204.12130

work page arXiv 2022
[31]

The third pascal recognizing textual entailment challenge

Danilo Giampiccolo, Bernardo Magnini, Ido Dagan, and Bill Dolan. The third pascal recognizing textual entailment challenge. In ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, 2007. https://aclanthology.org/W07-1401/

work page 2007
[32]

Improving alignment of dialogue agents via targeted human judgements, 2022

Amelia Glaese, Nat McAleese, Maja Trebacz, John Aslanides, Vlad Firoiu, Timo Ewalds, Maribeth Rauh, Laura Weidinger, Martin Chadwick, Phoebe Thacker, Lucy Campbell-Gillingham, Jonathan Uesato, Po-Sen Huang, Ramona Comanescu, Fan Yang, Abigail See, Sumanth Dathathri, Rory Greig, Charlie Chen, Doug Fritz, Jaume Sanchez Elias, Richard Green, Sona Mokra, Ni...

work page 2022
[33]

Model patching: Closing the subgroup performance gap with data augmentation, 2020

Karan Goel, Albert Gu, Yixuan Li, and Christopher Ré. Model patching: Closing the subgroup performance gap with data augmentation, 2020. https://arxiv.org/abs/2008.06775

work page 2020
[34]

Eternal sunshine of the spotless net: Selective forgetting in deep networks

Aditya Golatkar, Alessandro Achille, and Stefano Soatto. Eternal sunshine of the spotless net: Selective forgetting in deep networks. In Conference on Computer Vision and Pattern Recognition (CVPR), 2020. https://arxiv.org/abs/1911.04933

work page arXiv 2020
[35]

Detoxify, 2020

Laura Hanu and Unitary team. Detoxify, 2020. https://github.com/unitaryai/detoxify

work page 2020
[36]

EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification

Patrick Helber, Benjamin Bischke, Andreas Dengel, and Damian Borth. EuroSAT: A novel dataset and deep learning benchmark for land use and land cover classification. Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2019. https://arxiv.org/abs/1709.00029

work page Pith review arXiv 2019
[37]

Natural adversarial examples

Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, and Dawn Song. Natural adversarial examples. In Conference on Computer Vision and Pattern Recognition (CVPR), 2021

work page 2021
[38]

Training Compute-Optimal Large Language Models

Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. Training compute-optimal large language models, 2022. https://arxiv.org/abs/2203.15556

work page internal anchor Pith review Pith/arXiv arXiv 2022
[39]

Patching open-vocabulary models by interpolating weights

Gabriel Ilharco, Mitchell Wortsman, Samir Yitzhak Gadre, Shuran Song, Hannaneh Hajishirzi, Simon Kornblith, Ali Farhadi, and Ludwig Schmidt. Patching open-vocabulary models by interpolating weights. In Advances in Neural Information Processing Systems (NeurIPS), 2022. https://arxiv.org/abs/2208.05592

Published as a conference paper at ICLR 2023

work page arXiv 2022
[40]

Averaging weights leads to wider optima and better generalization

Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, and Andrew Gordon Wilson. Averaging weights leads to wider optima and better generalization. In Conference on Uncertainty in Artificial Intelligence (UAI), 2018. https://arxiv.org/abs/1803.05407

work page 2018
[41]

Neural tangent kernel: Convergence and generalization in neural networks

Arthur Jacot, Franck Gabriel, and Clément Hongler. Neural tangent kernel: Convergence and generalization in neural networks. In Advances in Neural Information Processing Systems (NeurIPS), 2018. https://arxiv.org/abs/1806.07572

work page arXiv 2018
[42]

Scaling up visual and vision-language representation learning with noisy text supervision

Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V Le, Yunhsuan Sung, Zhen Li, and Tom Duerig. Scaling up visual and vision-language representation learning with noisy text supervision. In International Conference on Machine Learning (ICML), 2021. https://arxiv.org/abs/2102.05918

work page arXiv 2021
[43]

Linear connectivity reveals generalization strategies, 2022

Jeevesh Juneja, Rachit Bansal, Kyunghyun Cho, João Sedoc, and Naomi Saphra. Linear connectivity reveals generalization strategies, 2022. https://arxiv.org/abs/2205.12411

work page 2022
[44]

In conversation with artificial intelligence: aligning language models with human values, 2022

Atoosa Kasirzadeh and Iason Gabriel. In conversation with artificial intelligence: aligning language models with human values, 2022. https://arxiv.org/abs/2209.00731

work page arXiv 2022
[45]

UNIFIEDQA: Crossing format boundaries with a single QA system

Daniel Khashabi, Sewon Min, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter Clark, and Hannaneh Hajishirzi. UNIFIEDQA: Crossing format boundaries with a single QA system. In Findings of the Association for Computational Linguistics (EMNLP), 2020. https://aclanthology.org/2020.findings-emnlp.171

work page 2020
[46]

Qasc: A dataset for question answering via sentence composition, 2020

Tushar Khot, Peter Clark, Michal Guerquin, Peter Jansen, and Ashish Sabharwal. Qasc: A dataset for question answering via sentence composition, 2020. https://arxiv.org/abs/1910.11473v2

work page arXiv 2020
[47]

3d object representations for fine-grained categorization

Jonathan Krause, Michael Stark, Jia Deng, and Li Fei-Fei. 3d object representations for fine-grained categorization. In International Conference on Computer Vision Workshops (ICCVW), 2013.

work page
[48]

https://www.cv-foundation.org/openaccess/content_iccv_workshops_2013/W19/html/Krause_3D_Object_Representations_2013_ICCV_paper.html

work page
[49]

Explaining landscape connectivity of low-cost solutions for multilayer nets

Rohith Kuditipudi, Xiang Wang, Holden Lee, Yi Zhang, Zhiyuan Li, Wei Hu, Rong Ge, and Sanjeev Arora. Explaining landscape connectivity of low-cost solutions for multilayer nets. Advances in Neural Information Processing Systems (NeurIPS), 2019. https://arxiv.org/abs/1906.06247

work page arXiv 2019
[50]

RACE: Large-scale ReAding comprehension dataset from examinations

Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, and Eduard Hovy. RACE: Large-scale ReAding comprehension dataset from examinations. In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017. https://aclanthology.org/D17-1082

work page 2017
[51]

Transforming task representations to perform novel tasks

Andrew K Lampinen and James L McClelland. Transforming task representations to perform novel tasks. Proceedings of the National Academy of Sciences, 2020

work page 2020
[52]

The mnist database of handwritten digits, 1998

Yann LeCun. The mnist database of handwritten digits, 1998. http://yann.lecun.com/exdb/mnist/

work page 1998
[53]

The power of scale for parameter-efficient prompt tuning

Brian Lester, Rami Al-Rfou, and Noah Constant. The power of scale for parameter-efficient prompt tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 3045–3059, Online and Punta Cana, Dominican Republic, November 2021.

work page 2021
[54]

doi: 10.18653/v1/2021.emnlp-main.243

Association for Computational Linguistics. doi: 10.18653/v1/2021.emnlp-main.243. URL https://aclanthology.org/2021.emnlp-main.243

work page doi:10.18653/v1/2021.emnlp-main.243 2021
[55]

Datasets: A community library for natural language processing

Quentin Lhoest, Albert Villanova del Moral, Yacine Jernite, Abhishek Thakur, Patrick von Platen, Suraj Patil, Julien Chaumond, Mariama Drame, Julien Plu, Lewis Tunstall, Joe Davison, Mario Šaško, Gunjan Chhablani, Bhavitvya Malik, Simon Brandeis, Teven Le Scao, Victor Sanh, Canwen Xu, Nicolas Patry, Angelina McMillan-Major, Philipp Schmid, Sylvain Gugge...

work page doi:10.18653/v1/2021.emnlp-demo.21 2023
[56]

Visualizing the loss landscape of neural nets

Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, and Tom Goldstein. Visualizing the loss landscape of neural nets. Advances in Neural Information Processing Systems (NeurIPS),

work page
[57]

https://arxiv.org/abs/1712.09913

work page Pith review arXiv
[58]

Branch-train-merge: Embarrassingly parallel training of expert language models, 2022

Margaret Li, Suchin Gururangan, Tim Dettmers, Mike Lewis, Tim Althoff, Noah A Smith, and Luke Zettlemoyer. Branch-train-merge: Embarrassingly parallel training of expert language models, 2022. https://arxiv.org/abs/2208.03306

work page arXiv 2022
[59]

CommonGen: A constrained text generation challenge for generative commonsense reasoning

Bill Yuchen Lin, Wangchunshu Zhou, Ming Shen, Pei Zhou, Chandra Bhagavatula, Yejin Choi, and Xiang Ren. CommonGen: A constrained text generation challenge for generative commonsense reasoning. In Findings of the Association for Computational Linguistics: EMNLP, 2020.

work page
[60]

https://www.aclweb.org/anthology/2020.findings-emnlp.165

work page 2020
[61]

DExperts: Decoding-time controlled text generation with experts and anti-experts

Alisa Liu, Maarten Sap, Ximing Lu, Swabha Swayamdipta, Chandra Bhagavatula, Noah A. Smith, and Yejin Choi. DExperts: Decoding-time controlled text generation with experts and anti-experts. In Annual Meeting of the Association for Computational Linguistics (ACL), 2021. https://aclanthology.org/2021.acl-long.522

work page 2021
[62]

Decoupled weight decay regularization

Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations (ICLR), 2019. https://openreview.net/forum?id=Bkg6RiCqY7

work page 2019
[63]

Quark: Controllable text generation with reinforced unlearning

Ximing Lu, Sean Welleck, Liwei Jiang, Jack Hessel, Lianhui Qin, Peter West, Prithviraj Ammanabrolu, and Yejin Choi. Quark: Controllable text generation with reinforced unlearning,

work page
[64]

https://arxiv.org/abs/2205.13636

work page arXiv
[65]

Mechanistic mode connectivity, 2022

Ekdeep Singh Lubana, Eric J Bigelow, Robert P Dick, David Krueger, and Hidenori Tanaka. Mechanistic mode connectivity, 2022. https://arxiv.org/abs/2211.08422

work page arXiv 2022
[66]

Analyzing monotonic linear interpolation in neural network loss landscapes, 2021

James Lucas, Juhan Bae, Michael R Zhang, Stanislav Fort, Richard Zemel, and Roger Grosse. Analyzing monotonic linear interpolation in neural network loss landscapes, 2021. https://arxiv.org/abs/2104.11044

work page arXiv 2021
[67]

Learning word vectors for sentiment analysis

Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In Annual Meeting of the Association for Computational Linguistics (ACL), 2011. http://www.aclweb.org/anthology/P11-1015

work page 2011
[68]

Merging models with fisher-weighted averaging

Michael Matena and Colin Raffel. Merging models with fisher-weighted averaging. In Advances in Neural Information Processing Systems (NeurIPS), 2021. https://arxiv.org/abs/2111.09832

work page arXiv 2021
[69]

Comparison of the predicted and observed secondary structure of t4 phage lysozyme

Brian W Matthews. Comparison of the predicted and observed secondary structure of t4 phage lysozyme. Biochimica et Biophysica Acta (BBA)-Protein Structure, 1975. https://www.sciencedirect.com/science/article/abs/pii/0005279575901099

work page arXiv 1975
[70]

Hidden factors and hidden topics: understanding rating dimensions with review text

Julian McAuley and Jure Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. In ACM Conference on Recommender Systems, 2013. https://dl.acm.org/doi/10.1145/2507157.2507163

work page doi:10.1145/2507157.2507163 2013
[71]

Pointer Sentinel Mixture Models

Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. Pointer sentinel mixture models, 2016. https://arxiv.org/abs/1609.07843

work page internal anchor Pith review Pith/arXiv arXiv 2016
[72]

MetaICL: Learning to learn in context

Sewon Min, Mike Lewis, Luke Zettlemoyer, and Hannaneh Hajishirzi. MetaICL: Learning to learn in context. In Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2022. https://aclanthology.org/2022.naacl-main.201

work page 2022
[73]

Cross-task generalization via natural language crowdsourcing instructions

Swaroop Mishra, Daniel Khashabi, Chitta Baral, and Hannaneh Hajishirzi. Cross-task generalization via natural language crowdsourcing instructions. In Annual Meeting of the Association for Computational Linguistics (ACL), 2022. https://aclanthology.org/2022.acl-long.244

work page 2022
[74]

Fast model editing at scale

Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, and Christopher D Manning. Fast model editing at scale. In International Conference on Learning Representations (ICLR), 2021. https://arxiv.org/abs/2110.11309

work page arXiv 2021
[75]

Memory-based model editing at scale

Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D Manning, and Chelsea Finn. Memory-based model editing at scale. In International Conference on Machine Learning,

work page
[76]

https://arxiv.org/abs/2206.06520

work page arXiv
[77]

Fixing model bugs with natural language patches

Shikhar Murty, Christopher D Manning, Scott Lundberg, and Marco Tulio Ribeiro. Fixing model bugs with natural language patches. In ACL Workshop on Learning with Natural Language Supervision, 2022. https://openreview.net/forum?id=blJrg3WvvDV

work page 2022
[78]

Reading digits in natural images with unsupervised feature learning

Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. In Advances in Neural Information Processing Systems (NeurIPS) Workshops, 2011. https://storage.googleapis.com/pub-tools-public-publication-data/pdf/37648.pdf

work page 2011
[79]

What is being transferred in transfer learning?

Behnam Neyshabur, Hanie Sedghi, and Chiyuan Zhang. What is being transferred in transfer learning? In Advances in Neural Information Processing Systems (NeurIPS), 2020. https://arxiv.org/abs/2008.11687

work page arXiv 2020
[80]

Training language models to follow instructions with human feedback, 2022

Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback, 2022. https://arxiv.org/abs/2203.02155

work page 2022
