pith. machine review for the scientific record.

arxiv: 2212.04089 · v3 · submitted 2022-12-08 · cs.LG · cs.CL · cs.CV

Recognition: 1 theorem link

Editing Models with Task Arithmetic

Pith reviewed 2026-05-13 08:05 UTC · model grok-4.3

classification cs.LG · cs.CL · cs.CV
keywords task vectors · model editing · task arithmetic · weight space · pre-trained models · fine-tuning · analogy tasks · model steering

The pith

Task vectors steer pre-trained models by adding, subtracting, and combining directions in weight space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces task vectors as the difference between a model's fine-tuned weights and its original pre-trained weights. These vectors act as directional adjustments that support arithmetic: negation lowers performance on one task, addition raises performance across several tasks at once, and combinations based on task analogies improve a fourth task even when no examples from that task are available during training. The approach is tested across models, modalities, and tasks, showing that simple vector operations can edit model behavior without full retraining.

Core claim

A task vector is obtained by subtracting the weights of a pre-trained model from the weights of the same model after fine-tuning on a given task. Arithmetic on these vectors steers behavior: negation reduces accuracy on the target task, addition improves accuracy on multiple tasks simultaneously, and when tasks satisfy an analogy of the form A is to B as C is to D, the combination of three vectors raises performance on the fourth task without any training data from it.
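
The operations in this claim can be written compactly as follows. The symbols are shorthand chosen for this summary (in particular the scaling coefficient $\lambda$ and the subscripts), and the analogy form is the natural reading of "A is to B as C is to D" rather than a formula quoted from the paper:

```latex
\begin{align*}
\tau_t &= \theta_{\mathrm{fine}}^{t} - \theta_{\mathrm{pre}}
  && \text{(task vector for task } t\text{)} \\
\theta_{\mathrm{new}} &= \theta_{\mathrm{pre}} + \lambda\,\tau_{\mathrm{new}}
  && \text{(apply an edited vector with scaling } \lambda\text{)} \\
\tau_{\mathrm{new}} &= -\tau_t
  && \text{(negation)} \\
\tau_{\mathrm{new}} &= \textstyle\sum_i \tau_i
  && \text{(addition)} \\
\tau_{\mathrm{new}} &= \tau_C + (\tau_B - \tau_A)
  && \text{(analogy, } A : B :: C : D\text{)}
\end{align*}
```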

What carries the argument

Task vectors, defined as the weight difference between a fine-tuned model and its pre-trained base, which function as linear directions in parameter space that combine via addition and negation to alter task performance.

If this is right

  • Negating a task vector lowers performance on its associated task while leaving performance on unrelated tasks largely unchanged.
  • Adding several task vectors raises performance on each of the corresponding tasks at the same time.
  • Vector combinations derived from task analogies improve accuracy on a fourth task even when no examples from that task are used.
  • The same arithmetic operations apply across different model architectures and data modalities in the reported experiments.
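
The consequences listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: weights are modelled as a plain dictionary of float lists (a hypothetical stand-in for a real model state dict), the helper names are invented for this example, and the analogy form `tau_C + (tau_B - tau_A)` is an assumption about how the three vectors are combined.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def task_vector(pretrained: Weights, finetuned: Weights) -> Weights:
    """Fine-tuned weights minus pre-trained weights, parameter by parameter."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def negate(tau: Weights) -> Weights:
    """Flip the sign of every entry (used to reduce performance on a task)."""
    return {name: [-v for v in values] for name, values in tau.items()}


def add(*taus: Weights) -> Weights:
    """Sum several task vectors entry by entry."""
    return {
        name: [sum(entries) for entries in zip(*(tau[name] for tau in taus))]
        for name in taus[0]
    }


def analogy(tau_a: Weights, tau_b: Weights, tau_c: Weights) -> Weights:
    """For 'A is to B as C is to D': tau_C + (tau_B - tau_A), an assumed form."""
    return add(tau_c, tau_b, negate(tau_a))


def apply(pretrained: Weights, tau: Weights, coef: float = 1.0) -> Weights:
    """Move the pre-trained weights along tau, scaled by coef."""
    return {
        name: [p + coef * t for p, t in zip(pretrained[name], tau[name])]
        for name in pretrained
    }


# Toy weights chosen to be exact in floating point; not real model parameters.
pre = {"w": [1.0, 2.0]}
tau_a = task_vector(pre, {"w": [1.5, 2.0]})
tau_b = task_vector(pre, {"w": [1.0, 3.0]})
tau_c = task_vector(pre, {"w": [0.0, 2.0]})

forget_a = apply(pre, negate(tau_a))                     # edit by negation
multi_task = apply(pre, add(tau_a, tau_b))               # edit by addition
fourth_task = apply(pre, analogy(tau_a, tau_b, tau_c))   # edit by analogy
```

Whether the edited weights actually behave as described depends on the model and tasks; the sketch only shows that each edit is a cheap element-wise operation on stored weights.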

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • A library of pre-computed task vectors could allow quick assembly of custom models by selecting and combining desired directions.
  • Negating vectors linked to biased or undesired behaviors offers a route to debiasing without new labeled data.
  • The method suggests that task adaptations may remain modular enough to support sequential additions or removals of capabilities.

Load-bearing premise

Directions in weight space for different tasks add together with little destructive interference.
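
One rough way to probe this premise is to measure the cosine similarity between pairs of task vectors: values near zero suggest the directions overlap little, while strongly negative values suggest they pull against each other. The sketch below is an illustration added here, under the assumption that task vectors are stored as dictionaries of float lists; the helper names are hypothetical, and a low cosine similarity does not by itself guarantee that adding the vectors preserves accuracy.

```python
import math
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def flatten(tau: Weights) -> List[float]:
    """Concatenate all parameters in a fixed (sorted-by-name) order."""
    return [value for name in sorted(tau) for value in tau[name]]


def cosine_similarity(tau_a: Weights, tau_b: Weights) -> float:
    """Cosine of the angle between two task vectors viewed as flat vectors."""
    a, b = flatten(tau_a), flatten(tau_b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors, not taken from any real model.
tau_x = {"w": [1.0, 0.0]}
tau_y = {"w": [0.0, 2.0]}    # orthogonal to tau_x
tau_z = {"w": [-3.0, 0.0]}   # points directly against tau_x

sim_xy = cosine_similarity(tau_x, tau_y)
sim_xz = cosine_similarity(tau_x, tau_z)
```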

What would settle it

Run the analogy experiment on a held-out task and observe that the combined vector yields no accuracy gain over the plain pre-trained model.

Original abstract

Changing how pre-trained models behave (e.g., improving their performance on a downstream task or mitigating biases learned during pre-training) is a common practice when developing machine learning systems. In this work, we propose a new paradigm for steering the behavior of neural networks, centered around task vectors. A task vector specifies a direction in the weight space of a pre-trained model, such that movement in that direction improves performance on the task. We build task vectors by subtracting the weights of a pre-trained model from the weights of the same model after fine-tuning on a task. We show that these task vectors can be modified and combined together through arithmetic operations such as negation and addition, and the behavior of the resulting model is steered accordingly. Negating a task vector decreases performance on the target task, with little change in model behavior on control tasks. Moreover, adding task vectors together can improve performance on multiple tasks at once. Finally, when tasks are linked by an analogy relationship of the form "A is to B as C is to D", combining task vectors from three of the tasks can improve performance on the fourth, even when no data from the fourth task is used for training. Overall, our experiments with several models, modalities and tasks show that task arithmetic is a simple, efficient and effective way of editing models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces task vectors, defined as the difference between the weights of a model fine-tuned on a task and the weights of the corresponding pre-trained model. It demonstrates that these vectors support arithmetic operations such as negation (which decreases performance on the target task) and addition (which can improve performance on multiple tasks simultaneously). For tasks related by an analogy of the form 'A is to B as C is to D', the paper shows that combining task vectors from three tasks can improve performance on the fourth task without using any data from it. Experiments are reported across multiple models, modalities, and tasks.

Significance. If the empirical results hold, the work provides a simple, efficient method for editing pre-trained models without full retraining or access to task data in some cases. The multi-task addition and analogy-based editing results are particularly notable, as they suggest a form of weight-space compositionality that could reduce the need for task-specific fine-tuning. The experiments across models and modalities lend concrete support to the central claims.

major comments (2)
  1. [Experiments] Experiments section: the scaling coefficients used for vector addition are selected post-hoc for each reported result; this choice directly affects the magnitude of the claimed gains and should be accompanied by a sensitivity analysis or default selection rule to avoid the appearance of tuning to the test set.
  2. [Analogy experiments] Analogy experiments (the fourth-task improvement results): error bars or multiple random seeds are not reported for all gains; without them it is difficult to assess whether the observed improvements on the held-out task are statistically reliable or could be explained by variance in the base fine-tuning runs.
minor comments (2)
  1. [§3] Notation for task vectors should be introduced once with a clear equation (e.g., τ = θ_fine − θ_pre) and then used consistently; occasional redefinition in later sections reduces readability.
  2. [Figures] Several figures would benefit from explicit annotation of the scaling coefficient value used in each plotted curve.
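
The sensitivity analysis requested in major comment 1 could take roughly the following shape. This is a sketch under stated assumptions: `toy_validation_accuracy` is a made-up stand-in for evaluating an edited model on held-out validation data, and the coefficient grid is an illustrative choice, not one taken from the paper.

```python
from typing import Callable, List, Tuple


def sweep_coefficient(
    evaluate: Callable[[float], float],
    coefficients: List[float],
) -> Tuple[float, List[Tuple[float, float]]]:
    """Evaluate every coefficient; return the best one and the full curve."""
    curve = [(coef, evaluate(coef)) for coef in coefficients]
    best_coef, _ = max(curve, key=lambda pair: pair[1])
    return best_coef, curve


def toy_validation_accuracy(coef: float) -> float:
    """Hypothetical validation curve peaking at coef = 0.75; not real data."""
    return 0.9 - (coef - 0.75) ** 2


# Illustrative grid from 0.0 to 2.0 in steps of 0.25.
grid = [i * 0.25 for i in range(9)]
best, curve = sweep_coefficient(toy_validation_accuracy, grid)
```

Reporting the whole `curve`, not only `best`, is what lets a reader judge how much the headline gain depends on the chosen coefficient. Selecting on validation data rather than test data keeps the reported number honest.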

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment of our work and the constructive comments. We address each major comment below.

point-by-point responses
  1. Referee: [Experiments] Experiments section: the scaling coefficients used for vector addition are selected post-hoc for each reported result; this choice directly affects the magnitude of the claimed gains and should be accompanied by a sensitivity analysis or default selection rule to avoid the appearance of tuning to the test set.

    Authors: We agree that the scaling coefficients warrant additional justification. In the manuscript, coefficients were chosen based on validation performance for each combination, following common practice for such methods. To strengthen the presentation, we will add a sensitivity analysis in the revised version showing performance across a range of coefficients (e.g., 0.0 to 2.0) for the primary multi-task and analogy results. We will also state a default rule of using coefficient 1.0 when no validation data is available. revision: yes

  2. Referee: [Analogy experiments] Analogy experiments (the fourth-task improvement results): error bars or multiple random seeds are not reported for all gains; without them it is difficult to assess whether the observed improvements on the held-out task are statistically reliable or could be explained by variance in the base fine-tuning runs.

    Authors: We acknowledge the value of reporting variability for assessing reliability. The original experiments used single runs primarily due to computational cost. In the revision, we will rerun the analogy experiments with at least three random seeds, report mean performance with standard deviation error bars, and confirm that the observed gains remain statistically reliable. revision: yes
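
The reporting promised in the second response amounts to a mean and standard deviation over seeds. A minimal sketch using Python's standard `statistics` module follows; the per-seed accuracies are invented placeholders, not results from the paper.

```python
import statistics
from typing import List, Tuple


def summarize_runs(accuracies: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation over per-seed accuracies."""
    if len(accuracies) < 2:
        raise ValueError("need at least two seeds to estimate a standard deviation")
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Hypothetical accuracies on the held-out fourth task, one value per seed.
pretrained_runs = [0.50, 0.52, 0.51]
edited_runs = [0.60, 0.58, 0.62]

pre_mean, pre_std = summarize_runs(pretrained_runs)
edit_mean, edit_std = summarize_runs(edited_runs)
gain = edit_mean - pre_mean
```

Comparing `gain` against the two standard deviations is only an informal check. With three seeds, a formal significance test would have little power, so the spread is best reported alongside the individual runs.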

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper defines task vectors explicitly as the difference between fine-tuned and pre-trained weights, then demonstrates their arithmetic properties (negation, addition, and analogy-based combinations) through direct empirical evaluation on held-out test sets across multiple models and tasks. No equations or claims reduce a 'prediction' to a fitted parameter by construction, and the central results (including analogy editing without fourth-task data) are measured independently rather than derived tautologically from the inputs. There are no load-bearing self-citations, uniqueness theorems, or ansatzes that collapse the argument to prior author work. The work is self-contained as an experimental paradigm with falsifiable measurements.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the empirical observation that weight-space differences behave as approximately linear task directions. No free parameters are explicitly fitted beyond scaling coefficients chosen per experiment. No new entities are postulated beyond the task vector construct itself.

free parameters (1)
  • scaling coefficient for vector addition
    Chosen per task combination to maximize performance; not derived from first principles.
axioms (1)
  • domain assumption: Task directions in parameter space are sufficiently linear and additive for the tested models and tasks.
    Invoked throughout the experimental sections when combining vectors.
invented entities (1)
  • task vector (no independent evidence)
    purpose: Direction in weight space that encodes a task's effect on the model.
    Defined as the difference between fine-tuned and pre-trained weights; no independent evidence outside the empirical results.

pith-pipeline@v0.9.0 · 5557 in / 1350 out tokens · 23796 ms · 2026-05-13T08:05:00.608895+00:00 · methodology

Forward citations

Cited by 34 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Defenses at Odds: Measuring and Explaining Defense Conflicts in Large Language Models

    cs.CR 2026-05 conditional novelty 8.0

    Sequential LLM defense deployment leads to risk exacerbation in 38.9% of cases due to anti-aligned updates in shared critical layers, addressed by conflict-guided layer freezing.

  2. Crafting Reversible SFT Behaviors in Large Language Models

    cs.LG 2026-05 unverdicted novelty 8.0

    LCDD creates sparse carriers for SFT behaviors that SFT-Eraser can reverse, with ablations showing the sparse structure enables causal control.

  3. Discovering Physical Directions in Weight Space: Composing Neural PDE Experts

    cs.LG 2026-05 unverdicted novelty 7.0

    Fine-tuning neural PDE operators to regime endpoints reveals a physical direction in weight space that CCM uses to compose accurate merged models for new or extrapolated regimes from metadata or short prefixes.

  4. Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling

    cs.LG 2026-05 unverdicted novelty 7.0

    DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.

  5. Good Agentic Friends Do Not Just Give Verbal Advice: They Can Update Your Weights

    cs.CL 2026-05 unverdicted novelty 7.0

    TFlow enables multi-agent LLMs to collaborate via transient low-rank LoRA perturbations derived from sender activations, yielding up to 8.5 accuracy gains and 83% token reduction versus text-based baselines on Qwen3-4...

  6. CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models

    cs.CV 2026-05 unverdicted novelty 7.0

    Capability vectors extracted from parameter differences between standard and auxiliary-finetuned VLA models can be merged into pretrained weights to match auxiliary-training performance while reducing computational ov...

  7. Erase Persona, Forget Lore: Benchmarking Multimodal Copyright Unlearning in Large Vision Language Models

    cs.CV 2026-05 unverdicted novelty 7.0

    CoVUBench is the first benchmark framework for evaluating multimodal copyright unlearning in LVLMs via synthetic data, systematic variations, and a dual protocol for forgetting efficacy and utility preservation.

  8. Generalizing the Geometry of Model Merging Through Frechet Averages

    cs.LG 2026-04 unverdicted novelty 7.0

    Model merging is reframed as Fréchet averaging on manifolds whose geometry respects architectural symmetries, generalizing Fisher merging and enabling better LoRA merges.

  9. Generalizing the Geometry of Model Merging Through Frechet Averages

    cs.LG 2026-04 unverdicted novelty 7.0

    Model merging is generalized as Fréchet averaging on symmetry-invariant manifolds, containing Fisher merging as a special case and offering a new approach for LoRA adapters.

  10. Atomic-Probe Governance for Skill Updates in Compositional Robot Policies

    cs.RO 2026-04 unverdicted novelty 7.0

    A cross-version swap protocol reveals dominant skills that swing composition success by up to 50 percentage points, and an atomic probe with selective revalidation governs updates at lower cost than always re-testing ...

  11. Differentially Private Model Merging

    cs.LG 2026-04 unverdicted novelty 7.0

    Post-processing via random selection or linear combination generates differentially private models for arbitrary privacy parameters from pre-trained models on the same dataset.

  12. Exploring Language-Agnosticity in Function Vectors: A Case Study in Machine Translation

    cs.CL 2026-04 unverdicted novelty 7.0

    Translation function vectors extracted from English to one target language improve correct token ranking for translations to multiple other unseen target languages in decoder-only multilingual LLMs.

  13. One Model to Translate Them All? A Journey to Mount Doom for Multilingual Model Merging

    cs.CL 2026-04 unverdicted novelty 7.0

    Merging fine-tuned models for multilingual translation fails because fine-tuning redistributes language-specific neurons rather than sharpening them, increasing representational divergence in output-generating layers.

  14. Internalized Reasoning for Long-Context Visual Document Understanding

    cs.CV 2026-03 unverdicted novelty 7.0

    A synthetic pipeline creates and internalizes reasoning traces in VLMs for long-context visual document understanding, with a 32B model surpassing a 235B model on MMLongBenchDoc and showing 12.4x fewer output tokens.

  15. Refusal in Language Models Is Mediated by a Single Direction

    cs.LG 2024-06 accept novelty 7.0

    Refusal in language models is mediated by a single direction in residual stream activations that can be erased to disable safety or added to elicit refusal.

  16. Scalable Token-Level Hallucination Detection in Large Language Models

    cs.CL 2026-05 unverdicted novelty 6.0

    TokenHD uses a scalable data synthesis engine and importance-weighted training to create token-level hallucination detectors that work on free-form text and scale from 0.6B to 8B parameters, outperforming larger reaso...

  17. Experience Sharing in Mutual Reinforcement Learning for Heterogeneous Language Models

    cs.LG 2026-05 unverdicted novelty 6.0

    Mutual Reinforcement Learning allows heterogeneous LLMs to exchange experience through mechanisms like Peer Rollout Pooling, Cross-Policy GRPO Advantage Sharing, and Success-Gated Transfer, with outcome-level sharing ...

  18. UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors

    cs.CV 2026-05 unverdicted novelty 6.0

    UniVidX unifies diverse video generation tasks into one conditional diffusion model using stochastic condition masking, decoupled gated LoRAs, and cross-modal self-attention.

  19. Atomic-Probe Governance for Skill Updates in Compositional Robot Policies

    cs.RO 2026-04 unverdicted novelty 6.0

    Empirical study on robosuite tasks reveals a dominant-skill effect in compositions and shows that an atomic probe approximates full revalidation for skill updates at much lower cost.

  20. Separable Expert Architecture: Toward Privacy-Preserving LLM Personalization via Composable Adapters and Deletable User Proxies

    cs.AI 2026-04 unverdicted novelty 6.0

    A separable expert architecture uses base models, LoRA adapters, and deletable per-user proxies to enable privacy-preserving personalization and deterministic unlearning in LLMs.

  21. AlignCultura: Towards Culturally Aligned Large Language Models?

    cs.CL 2026-04 unverdicted novelty 6.0

    Align-Cultura introduces the CULTURAX dataset and shows that culturally fine-tuned LLMs improve joint HHH scores by 4-6%, cut cultural failures by 18%, and gain 10-12% efficiency with minimal leakage.

  22. Train Separately, Merge Together: Modular Post-Training with Mixture-of-Experts

    cs.LG 2026-04 unverdicted novelty 6.0

    BAR trains independent domain experts via separate mid-training, SFT, and RL pipelines then composes them with a MoE router to match monolithic retraining performance at lower cost and without catastrophic forgetting.

  23. PivotMerge: Bridging Heterogeneous Multimodal Pre-training via Post-Alignment Model Merging

    cs.CV 2026-04 unverdicted novelty 6.0

    PivotMerge merges heterogeneous multimodal pre-trained models via shared-space decomposition to filter conflicts and layer-wise weights based on alignment contributions, outperforming baselines on multimodal benchmarks.

  24. Weight Patching: Toward Source-Level Mechanistic Localization in LLMs

    cs.AI 2026-04 unverdicted novelty 6.0

    Weight Patching localizes capabilities to specific parameter modules in LLMs by replacing weights from a behavior-specialized model into a base model and validating recovery via a vector-anchor interface, revealing a ...

  25. WIN-U: Woodbury-Informed Newton-Unlearning as a retain-free Machine Unlearning Framework

    cs.LG 2026-04 unverdicted novelty 6.0

    WIN-U delivers a retain-free unlearning update that approximates the gold-standard retrained model via a Woodbury-informed Newton step using only forget-set curvature information.

  26. The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment

    cs.LG 2026-04 unverdicted novelty 6.0

    The Master Key Hypothesis states that capabilities are low-dimensional directions transferable across models through linear subspace alignment, with UNLOCK demonstrating gains such as 12.1% accuracy improvement on MAT...

  27. Analytic Drift Resister for Non-Exemplar Continual Graph Learning

    cs.LG 2026-04 unverdicted novelty 6.0

    ADR achieves theoretically zero-forgetting class-incremental graph learning by combining backpropagation adaptation with ridge-regression-based layer-wise merging of GNN linear transformations.

  28. GeoStack: A Framework for Quasi-Abelian Knowledge Composition in VLMs

    cs.CV 2026-05 unverdicted novelty 5.0

    GeoStack composes multiple domain experts into VLMs with preserved base knowledge and O(1) inference time via geometric stacking and a weight-folding property.

  29. UNSEEN: A Cross-Stack LLM Unlearning Defense against AR-LLM Social Engineering Attacks

    cs.CR 2026-04 unverdicted novelty 5.0

    UNSEEN combines AR access control, LLM unlearning to suppress profiles, and agent guardrails to defend against AR-LLM social engineering attacks, tested in a 60-person user study with 360 conversations.

  30. HiP-LoRA: Budgeted Spectral Plasticity for Robust Low-Rank Adaptation

    cs.LG 2026-04 unverdicted novelty 5.0

    HiP-LoRA decomposes LoRA updates into principal and residual spectral channels with a singular-value-weighted stability budget to reduce forgetting and interference during foundation model adaptation.

  31. MAny: Merge Anything for Multimodal Continual Instruction Tuning

    cs.LG 2026-04 unverdicted novelty 5.0

    MAny addresses dual-forgetting in multimodal continual instruction tuning via CPM and LPM merging strategies, delivering up to 8.57% accuracy gains on UCIT benchmarks without additional training.

  32. FREE-Switch: Frequency-based Dynamic LoRA Switch for Style Transfer

    cs.CV 2026-04 unverdicted novelty 5.0

    FREE-Switch dynamically switches LoRA adapters using frequency importance per diffusion step and adds semantic alignment to reduce content drift when merging specialized image generators.

  33. SHIFT: Steering Hidden Intermediates in Flow Transformers

    cs.CV 2026-04 unverdicted novelty 5.0

    SHIFT learns and applies steering vectors to selected layers and timesteps in DiT models to suppress concepts, shift styles, or bias objects while keeping image quality and prompt adherence intact.

  34. MOMO: Mars Orbital Model Foundation Model for Mars Orbital Applications

    cs.CV 2026-04 unverdicted novelty 5.0

    MOMO merges sensor-specific models from three Mars orbital instruments at matched validation loss stages to form a foundation model that outperforms ImageNet, Earth observation, sensor-specific, and supervised baselin...

Reference graph

Works this paper leans on

114 extracted references · 114 canonical work pages · cited by 32 Pith papers · 11 internal anchors

  1. [1]

    Task2vec: Task embedding for meta-learning

    Alessandro Achille, Michael Lam, Rahul Tewari, Avinash Ravichandran, Subhransu Maji, Charless C Fowlkes, Stefano Soatto, and Pietro Perona. Task2vec: Task embedding for meta-learning. In International Conference on Computer Vision (ICCV), 2019. https://arxiv.org/abs/1902.03545

  2. [2]

    Git re-basin: Merging models modulo permutation symmetries

    Samuel K Ainsworth, Jonathan Hayase, and Siddhartha Srinivasa. Git re-basin: Merging models modulo permutation symmetries, 2022. https://arxiv.org/abs/2209.04836

  3. [3]

    Flamingo: a Visual Language Model for Few-Shot Learning

    Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, et al. Flamingo: a visual language model for few-shot learning, 2022. https://arxiv.org/abs/2204.14198

  4. [4]

    A General Language Assistant as a Laboratory for Alignment

    Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, Tom Henighan, Andy Jones, Nicholas Joseph, Ben Mann, Nova DasSarma, et al. A general language assistant as a laboratory for alignment, 2021. https://arxiv.org/abs/2112.00861

  5. [5]

    The second pascal recognising textual entailment challenge

    Roy Bar-Haim, Ido Dagan, Bill Dolan, Lisa Ferro, Danilo Giampiccolo, Bernardo Magnini, and Idan Szpektor. The second pascal recognising textual entailment challenge. In II PASCAL challenge, 2006

  6. [6]

    The fifth pascal recognizing textual entailment challenge

    Luisa Bentivogli, Peter Clark, Ido Dagan, and Danilo Giampiccolo. The fifth pascal recognizing textual entailment challenge. In TAC, 2009. https://cris.fbk.eu/handle/11582/5351

  7. [7]

    Loss surface simplexes for mode connecting volumes and fast ensembling

    Gregory Benton, Wesley Maddox, Sanae Lotfi, and Andrew Gordon Wilson. Loss surface simplexes for mode connecting volumes and fast ensembling. In International Conference on Machine Learning (ICML), 2021. https://arxiv.org/abs/2102.13042

  8. [8]

    Nuanced metrics for measuring unintended bias with real data for text classification

    Daniel Borkan, Lucas Dixon, Jeffrey Sorensen, Nithum Thain, and Lucy Vasserman. Nuanced metrics for measuring unintended bias with real data for text classification. In Companion Proceedings of the 2019 World Wide Web Conference, 2019. https://arxiv.org/abs/1903.04561

  9. [9]

    Language Models are Few-Shot Learners

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott G...

  10. [10]

    Remote sensing image scene classification: Benchmark and state of the art

    Gong Cheng, Junwei Han, and Xiaoqiang Lu. Remote sensing image scene classification: Benchmark and state of the art. Proceedings of the Institute of Electrical and Electronics Engineers (IEEE), 2017. https://ieeexplore.ieee.org/abstract/document/7891544

  11. [11]

    Fusing finetuned models for better pretraining

    Leshem Choshen, Elad Venezian, Noam Slonim, and Yoav Katz. Fusing finetuned models for better pretraining, 2022. https://arxiv.org/abs/2204.03044. [Page header from the source paper: Published as a conference paper at ICLR 2023]

  12. [12]

    Describing textures in the wild

    Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Sammy Mohamed, and Andrea Vedaldi. Describing textures in the wild. In Conference on Computer Vision and Pattern Recognition (CVPR), 2014. https://openaccess.thecvf.com/content_cvpr_2014/html/Cimpoi_Describing_Textures_in_2014_CVPR_paper.html

  13. [13]

    A deep neural network’s loss surface contains every low-dimensional pattern

    Wojciech Marian Czarnecki, Simon Osindero, Razvan Pascanu, and Max Jaderberg. A deep neural network’s loss surface contains every low-dimensional pattern, 2019. https://arxiv.org/abs/1912.07559

  14. [14]

    The PASCAL recognising textual entailment challenge

    Ido Dagan, Oren Glickman, and Bernardo Magnini. The pascal recognising textual entailment challenge. In Machine Learning Challenges Workshop, 2005. https://link.springer.com/chapter/10.1007/11736790_9

  15. [15]

    Editing Factual Knowledge in Language Models

    Nicola De Cao, Wilker Aziz, and Ivan Titov. Editing factual knowledge in language models. In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021. https://arxiv.org/abs/2104.08164

  16. [16]

    Imagenet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In Conference on Computer Vision and Pattern Recognition (CVPR), 2009. https://ieeexplore.ieee.org/abstract/document/5206848

  17. [17]

    BERT: Pre-training of deep bidirectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2019. https://aclanthology.org/N19-1423

  18. [18]

    Fine-tuning pretrained language models: Weight initializations, data orders, and early stopping

    Jesse Dodge, Gabriel Ilharco, Roy Schwartz, Ali Farhadi, Hannaneh Hajishirzi, and Noah Smith. Fine-tuning pretrained language models: Weight initializations, data orders, and early stopping, 2020. https://arxiv.org/abs/2002.06305/

  19. [19]

    Automatically constructing a corpus of sentential paraphrases

    William B. Dolan and Chris Brockett. Automatically constructing a corpus of sentential paraphrases. In International Workshop on Paraphrasing, 2005. https://aclanthology.org/I05-5002

  20. [20]

    Cold fusion: Collaborative descent for distributed multitask finetuning

    Shachar Don-Yehiya, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, and Leshem Choshen. Cold fusion: Collaborative descent for distributed multitask finetuning, 2022. https://arxiv.org/abs/2212.01378

  21. [21]

    Essentially no barriers in neural network energy landscape

    Felix Draxler, Kambis Veschgini, Manfred Salmhofer, and Fred Hamprecht. Essentially no barriers in neural network energy landscape. In International Conference on Machine Learning (ICML), 2018. https://arxiv.org/abs/1803.00885

  22. [22]

    How do humans sketch objects?

    Mathias Eitz, James Hays, and Marc Alexa. How do humans sketch objects? ACM Transactions on graphics (TOG), 2012. https://dl.acm.org/doi/10.1145/2185520.2185540

  23. [23]

    The role of permutation invariance in linear mode connectivity of neural networks

    Rahim Entezari, Hanie Sedghi, Olga Saukh, and Behnam Neyshabur. The role of permutation invariance in linear mode connectivity of neural networks. In International Conference on Learning Representations (ICLR), 2022. https://arxiv.org/abs/2110.06296

  24. [24]

    Multi-news: a large-scale multi-document summarization dataset and abstractive hierarchical model

    Alexander R. Fabbri, Irene Li, Tianwei She, Suyi Li, and Dragomir R. Radev. Multi-news: a large-scale multi-document summarization dataset and abstractive hierarchical model, 2019. https://arxiv.org/abs/1906.01749

  25. [25]

    Deep ensembles: A loss landscape perspective

    Stanislav Fort, Huiyi Hu, and Balaji Lakshminarayanan. Deep ensembles: A loss landscape perspective, 2019. https://arxiv.org/abs/1912.02757

  26. [26]

    Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the neural tangent kernel

    Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M Roy, and Surya Ganguli. Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the neural tangent kernel. In Advances in Neural Information Processing Systems (NeurIPS), 2020. https://arxiv.org/abs/2010.15110

  27. [27]

    Linear mode connectivity and the lottery ticket hypothesis

    Jonathan Frankle, Gintare Karolina Dziugaite, Daniel Roy, and Michael Carbin. Linear mode connectivity and the lottery ticket hypothesis. In International Conference on Machine Learning (ICML), 2020. https://proceedings.mlr.press/v119/frankle20a.html

  28. [28]

    Loss surfaces, mode connectivity, and fast ensembling of dnns

    Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry Vetrov, and Andrew Gordon Wilson. Loss surfaces, mode connectivity, and fast ensembling of dnns. In Advances in Neural Information Processing Systems (NeurIPS), 2018. https://arxiv.org/abs/1802.10026

  29. [29]

    RealToxicityPrompts: Evaluating neural toxic degeneration in language models

    Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, and Noah A. Smith. RealToxicityPrompts: Evaluating neural toxic degeneration in language models. In Findings of the Association for Computational Linguistics: EMNLP 2020, 2020. https://aclanthology.org/2020.findings-emnlp.301

  30. [30]

    Lm-debugger: An interactive tool for inspection and intervention in transformer-based language models

    Mor Geva, Avi Caciularu, Guy Dar, Paul Roit, Shoval Sadde, Micah Shlain, Bar Tamir, and Yoav Goldberg. Lm-debugger: An interactive tool for inspection and intervention in transformer-based language models, 2022. https://arxiv.org/abs/2204.12130

  31. [31]

    The third pascal recognizing textual entailment challenge

    Danilo Giampiccolo, Bernardo Magnini, Ido Dagan, and Bill Dolan. The third pascal recognizing textual entailment challenge. In ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, 2007. https://aclanthology.org/W07-1401/

  32. [32]

    Improving alignment of dialogue agents via targeted human judgements, 2022

    Amelia Glaese, Nat McAleese, Maja Trebacz, John Aslanides, Vlad Firoiu, Timo Ewalds, Maribeth Rauh, Laura Weidinger, Martin Chadwick, Phoebe Thacker, Lucy Campbell-Gillingham, Jonathan Uesato, Po-Sen Huang, Ramona Comanescu, Fan Yang, Abigail See, Sumanth Dathathri, Rory Greig, Charlie Chen, Doug Fritz, Jaume Sanchez Elias, Richard Green, Sona Mokra, Ni...

  33. [33]

    Model patching: Closing the subgroup performance gap with data augmentation, 2020

    Karan Goel, Albert Gu, Yixuan Li, and Christopher Ré. Model patching: Closing the subgroup performance gap with data augmentation, 2020. https://arxiv.org/abs/2008.06775

  34. [34]

    Eternal sunshine of the spotless net: Selective forgetting in deep networks

    Aditya Golatkar, Alessandro Achille, and Stefano Soatto. Eternal sunshine of the spotless net: Selective forgetting in deep networks. In Conference on Computer Vision and Pattern Recognition (CVPR), 2020. https://arxiv.org/abs/1911.04933

  35. [35]

    Detoxify, 2020

    Laura Hanu and Unitary team. Detoxify, 2020. https://github.com/unitaryai/detoxify

  36. [36]

    EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification

    Patrick Helber, Benjamin Bischke, Andreas Dengel, and Damian Borth. Eurosat: A novel dataset and deep learning benchmark for land use and land cover classification. Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2019. https://arxiv.org/abs/1709.00029

  37. [37]

    Natural adversarial examples

    Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, and Dawn Song. Natural adversarial examples. In Conference on Computer Vision and Pattern Recognition (CVPR), 2021

  38. [38]

    Training Compute-Optimal Large Language Models

    Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. Training compute-optimal large language models, 2022. https://arxiv.org/abs/2203.15556

  39. [39]

    Patching open-vocabulary models by interpolating weights

    Gabriel Ilharco, Mitchell Wortsman, Samir Yitzhak Gadre, Shuran Song, Hannaneh Hajishirzi, Simon Kornblith, Ali Farhadi, and Ludwig Schmidt. Patching open-vocabulary models by interpolating weights. In Advances in Neural Information Processing Systems (NeurIPS), 2022. https://arxiv.org/abs/2208.05592. Published as a conference paper at ICLR 2023

  40. [40]

    Averaging weights leads to wider optima and better generalization

    Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, and Andrew Gordon Wilson. Averaging weights leads to wider optima and better generalization. In Conference on Uncertainty in Artificial Intelligence (UAI), 2018. https://arxiv.org/abs/1803.05407

  41. [41]

    Neural tangent kernel: Convergence and generalization in neural networks

    Arthur Jacot, Franck Gabriel, and Clément Hongler. Neural tangent kernel: Convergence and generalization in neural networks. In Advances in Neural Information Processing Systems (NeurIPS), 2018. https://arxiv.org/abs/1806.07572

  42. [42]

    Scaling up visual and vision-language representation learning with noisy text supervision

    Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V Le, Yunhsuan Sung, Zhen Li, and Tom Duerig. Scaling up visual and vision-language representation learning with noisy text supervision. In International Conference on Machine Learning (ICML), 2021. https://arxiv.org/abs/2102.05918

  43. [43]

    Linear connectivity reveals generalization strategies, 2022

    Jeevesh Juneja, Rachit Bansal, Kyunghyun Cho, João Sedoc, and Naomi Saphra. Linear connectivity reveals generalization strategies, 2022. https://arxiv.org/abs/2205.12411

  44. [44]

    In conversation with artificial intelligence: aligning language models with human values, 2022

    Atoosa Kasirzadeh and Iason Gabriel. In conversation with artificial intelligence: aligning language models with human values, 2022. https://arxiv.org/abs/2209.00731

  45. [45]

    UNIFIEDQA: Crossing format boundaries with a single QA system

    Daniel Khashabi, Sewon Min, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter Clark, and Hannaneh Hajishirzi. UNIFIEDQA: Crossing format boundaries with a single QA system. In Findings of the Association for Computational Linguistics (EMNLP), 2020. https://aclanthology.org/2020.findings-emnlp.171

  46. [46]

    Qasc: A dataset for question answering via sentence composition, 2020

    Tushar Khot, Peter Clark, Michal Guerquin, Peter Jansen, and Ashish Sabharwal. Qasc: A dataset for question answering via sentence composition, 2020. https://arxiv.org/abs/1910.11473v2

  47. [47]

    3d object representations for fine-grained categorization

    Jonathan Krause, Michael Stark, Jia Deng, and Li Fei-Fei. 3d object representations for fine-grained categorization. In International Conference on Computer Vision Workshops (ICCVW), 2013. https://www.cv-foundation.org/openaccess/content_iccv_workshops_2013/W19/html/Krause_3D_Object_Representations_2013_ICCV_paper.html

  49. [49]

    Explaining landscape connectivity of low-cost solutions for multilayer nets

    Rohith Kuditipudi, Xiang Wang, Holden Lee, Yi Zhang, Zhiyuan Li, Wei Hu, Rong Ge, and Sanjeev Arora. Explaining landscape connectivity of low-cost solutions for multilayer nets. Advances in Neural Information Processing Systems (NeurIPS), 2019. https://arxiv.org/abs/1906.06247

  50. [50]

    RACE: Large-scale ReAding comprehension dataset from examinations

    Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, and Eduard Hovy. RACE: Large-scale ReAding comprehension dataset from examinations. In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017. https://aclanthology.org/D17-1082

  51. [51]

    Transforming task representations to perform novel tasks

    Andrew K Lampinen and James L McClelland. Transforming task representations to perform novel tasks. Proceedings of the National Academy of Sciences, 2020

  52. [52]

    The mnist database of handwritten digits, 1998

    Yann LeCun. The mnist database of handwritten digits, 1998. http://yann.lecun.com/exdb/mnist/

  53. [53]

    The power of scale for parameter-efficient prompt tuning

    Brian Lester, Rami Al-Rfou, and Noah Constant. The power of scale for parameter-efficient prompt tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 3045–3059, Online and Punta Cana, Dominican Republic, November 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.emnlp-main.243. URL https://aclanthology.org/2021.emnlp-main.243

  55. [55]

    Datasets: A community library for natural language processing

    Quentin Lhoest, Albert Villanova del Moral, Yacine Jernite, Abhishek Thakur, Patrick von Platen, Suraj Patil, Julien Chaumond, Mariama Drame, Julien Plu, Lewis Tunstall, Joe Davison, Mario Šaško, Gunjan Chhablani, Bhavitvya Malik, Simon Brandeis, Teven Le Scao, Victor Sanh, Canwen Xu, Nicolas Patry, Angelina McMillan-Major, Philipp Schmid, Sylvain Gugge...

  56. [56]

    Visualizing the loss landscape of neural nets

    Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, and Tom Goldstein. Visualizing the loss landscape of neural nets. Advances in Neural Information Processing Systems (NeurIPS), 2018. https://arxiv.org/abs/1712.09913

  58. [58]

    Branch-train-merge: Embarrassingly parallel training of expert language models, 2022

    Margaret Li, Suchin Gururangan, Tim Dettmers, Mike Lewis, Tim Althoff, Noah A Smith, and Luke Zettlemoyer. Branch-train-merge: Embarrassingly parallel training of expert language models, 2022. https://arxiv.org/abs/2208.03306

  59. [59]

    CommonGen: A constrained text generation challenge for generative commonsense reasoning

    Bill Yuchen Lin, Wangchunshu Zhou, Ming Shen, Pei Zhou, Chandra Bhagavatula, Yejin Choi, and Xiang Ren. CommonGen: A constrained text generation challenge for generative commonsense reasoning. In Findings of the Association for Computational Linguistics: EMNLP, 2020. https://www.aclweb.org/anthology/2020.findings-emnlp.165

  61. [61]

    DExperts: Decoding-time controlled text generation with experts and anti-experts

    Alisa Liu, Maarten Sap, Ximing Lu, Swabha Swayamdipta, Chandra Bhagavatula, Noah A. Smith, and Yejin Choi. DExperts: Decoding-time controlled text generation with experts and anti-experts. In Annual Meeting of the Association for Computational Linguistics (ACL), 2021. https://aclanthology.org/2021.acl-long.522

  62. [62]

    Decoupled weight decay regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations (ICLR), 2019. URL https://openreview.net/forum?id=Bkg6RiCqY7

  63. [63]

    Quark: Controllable text generation with reinforced unlearning, 2022

    Ximing Lu, Sean Welleck, Liwei Jiang, Jack Hessel, Lianhui Qin, Peter West, Prithviraj Ammanabrolu, and Yejin Choi. Quark: Controllable text generation with reinforced unlearning, 2022. https://arxiv.org/abs/2205.13636

  65. [65]

    Mechanistic mode connectivity, 2022

    Ekdeep Singh Lubana, Eric J Bigelow, Robert P Dick, David Krueger, and Hidenori Tanaka. Mechanistic mode connectivity, 2022. https://arxiv.org/abs/2211.08422

  66. [66]

    Analyzing monotonic linear interpolation in neural network loss landscapes, 2021

    James Lucas, Juhan Bae, Michael R Zhang, Stanislav Fort, Richard Zemel, and Roger Grosse. Analyzing monotonic linear interpolation in neural network loss landscapes, 2021. https://arxiv.org/abs/2104.11044

  67. [67]

    Learning word vectors for sentiment analysis

    Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In Annual Meeting of the Association for Computational Linguistics (ACL), 2011. http://www.aclweb.org/anthology/P11-1015

  68. [68]

    Merging models with fisher-weighted averaging

    Michael Matena and Colin Raffel. Merging models with fisher-weighted averaging. In Advances in Neural Information Processing Systems (NeurIPS), 2021. https://arxiv.org/abs/2111.09832

  69. [69]

    Comparison of the predicted and observed secondary structure of t4 phage lysozyme

    Brian W Matthews. Comparison of the predicted and observed secondary structure of t4 phage lysozyme. Biochimica et Biophysica Acta (BBA)-Protein Structure, 1975. https://www.sciencedirect.com/science/article/abs/pii/0005279575901099

  70. [70]

    Hidden factors and hidden topics: understanding rating dimensions with review text

    Julian McAuley and Jure Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. In ACM Conference on Recommender Systems, 2013. https://dl.acm.org/doi/10.1145/2507157.2507163

  71. [71]

    Pointer Sentinel Mixture Models

    Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. Pointer sentinel mixture models, 2016. https://arxiv.org/abs/1609.07843

  72. [72]

    MetaICL: Learning to learn in context

    Sewon Min, Mike Lewis, Luke Zettlemoyer, and Hannaneh Hajishirzi. MetaICL: Learning to learn in context. In Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2022. https://aclanthology.org/2022.naacl-main.201

  73. [73]

    Cross-task generalization via natural language crowdsourcing instructions

    Swaroop Mishra, Daniel Khashabi, Chitta Baral, and Hannaneh Hajishirzi. Cross-task generalization via natural language crowdsourcing instructions. In Annual Meeting of the Association for Computational Linguistics (ACL), 2022. https://aclanthology.org/2022.acl-long.244

  74. [74]

    Fast model editing at scale

    Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, and Christopher D Manning. Fast model editing at scale. In International Conference on Learning Representations (ICLR), 2021. https://arxiv.org/abs/2110.11309

  75. [75]

    Memory-based model editing at scale

    Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D Manning, and Chelsea Finn. Memory-based model editing at scale. In International Conference on Machine Learning, 2022. https://arxiv.org/abs/2206.06520

  77. [77]

    Fixing model bugs with natural language patches

    Shikhar Murty, Christopher D Manning, Scott Lundberg, and Marco Tulio Ribeiro. Fixing model bugs with natural language patches. In ACL Workshop on Learning with Natural Language Supervision, 2022. https://openreview.net/forum?id=blJrg3WvvDV

  78. [78]

    Reading digits in natural images with unsupervised feature learning

    Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. In Advances in Neural Information Processing Systems (NeurIPS) Workshops, 2011. https://storage.googleapis.com/pub-tools-public-publication-data/pdf/37648.pdf

  79. [79]

    What is being transferred in transfer learning?

    Behnam Neyshabur, Hanie Sedghi, and Chiyuan Zhang. What is being transferred in transfer learning? In Advances in Neural Information Processing Systems (NeurIPS), 2020. https://arxiv.org/abs/2008.11687

  80. [80]

    Training language models to follow instructions with human feedback, 2022

    Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback, 2022. https://arxiv.org/abs/2203.02155

Showing first 80 references.