pith. sign in

arxiv: 2408.01119 · v4 · pith:PEYCWU4Unew · submitted 2024-08-02 · 💻 cs.CL

Task Prompt Vectors: Effective Initialization through Multi-Task Soft-Prompt Transfer

Pith reviewed 2026-05-23 22:10 UTC · model grok-4.3

classification 💻 cs.CL
keywords task prompt vectorssoft prompt tuningprompt arithmeticmulti-task learninglow-resource adaptationnatural language understandingparameter-efficient fine-tuning
0
0 comments X

The pith

Task prompt vectors formed by subtracting random initialization from tuned soft prompts enable effective low-resource initialization and arithmetic combination across tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the element-wise difference between a tuned soft prompt and its random starting weights isolates task-specific information. This vector transfers across different random seeds and two language model architectures, allowing it to initialize prompt tuning on related tasks with limited data. It also supports simple addition of vectors from multiple tasks to achieve multi-task results without full retraining. A sympathetic reader would care because current soft-prompt methods lose modularity when adding tasks, while full-model task vectors already show arithmetic benefits; this approach aims to bring similar reuse to the more efficient prompt-tuning setting.

Core claim

Task prompt vectors are created by the element-wise difference between the weights of tuned soft-prompts and their random initialization. These vectors isolate transferable task-specific information that can initialize prompt tuning effectively on similar tasks in low-resource settings. The vectors prove independent of the particular random initialization and work across two different language model architectures, which permits arithmetic operations such as addition of vectors from multiple tasks to deliver competitive multi-task performance.

What carries the argument

Task prompt vectors, defined as the element-wise difference between tuned soft-prompt weights and their random initialization, isolate transferable task information for initialization and arithmetic.

If this is right

  • Task prompt vectors can initialize prompt tuning effectively on similar tasks using limited data.
  • Task prompt vectors remain independent of the random initialization seed used during original tuning.
  • Arithmetic addition of task prompt vectors from multiple tasks yields competitive multi-task performance.
  • The vectors transfer across at least two different language model architectures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Precomputed task prompt vectors could be stored and reused to lower the cost of adapting models to new task combinations.
  • The subtraction technique might extend to other parameter-efficient methods such as adapters or LoRA modules.
  • Independence from architecture opens the possibility of transferring task vectors between models of different sizes or families.

Load-bearing premise

The element-wise difference between a tuned soft prompt and its random initialization must capture only transferable task-specific information that remains useful when added to a new random prompt on a related task.

What would settle it

An experiment showing that task prompt vectors from similar tasks produce no improvement over random initialization when used to start prompt tuning on a new related task, or that their addition yields worse results than training a single combined prompt.

Figures

Figures reproduced from arXiv: 2408.01119 by Ivan Srba, Maria Bielikova, Robert Belanec, Simon Ostermann.

Figure 1
Figure 1. Figure 1: An illustration of task prompt vector creation and the combination via addition [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Comparison of average cosine similarities between task prompts and task prompt [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Comparison of relative exact match performance of combinations of task prompt [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Test results of training T5-base model with random, single, and multi-task soft [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗
read the original abstract

Prompt tuning is an efficient solution for training large language models (LLMs). However, current soft-prompt-based methods often sacrifice multi-task modularity, requiring the training process to be fully or partially repeated for each newly added task. While recent work on task vectors applied arithmetic operations on full model weights to achieve the desired multi-task performance, a similar approach for soft-prompts is still missing. To this end, we introduce Task Prompt Vectors, created by element-wise difference between weights of tuned soft-prompts and their random initialization. Experimental results on 12 NLU datasets show that task prompt vectors can be used in low-resource settings to effectively initialize prompt tuning on similar tasks. In addition, we show that task prompt vectors are independent of the random initialization of prompt tuning on 2 different language model architectures. This allows prompt arithmetics with the pre-trained vectors from different tasks. In this way, we provide a competitive alternative to state-of-the-art baselines by arithmetic addition of task prompt vectors from multiple tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces Task Prompt Vectors (TPVs), defined as the element-wise difference between a tuned soft prompt and its random initialization. It reports experimental results on 12 NLU datasets showing that TPVs enable effective initialization of prompt tuning for similar tasks in low-resource settings, that TPVs are independent of the random seed and of the underlying LM architecture (tested on two models), and that arithmetic combinations of TPVs from multiple tasks yield multi-task performance competitive with state-of-the-art baselines.

Significance. If the empirical claims hold under rigorous controls, the work supplies a modular mechanism for transferring and composing task-specific knowledge directly in the soft-prompt space. This extends the task-vector paradigm from full-model weights to the more parameter-efficient prompt-tuning regime and could reduce the need to retrain prompts from scratch when adding related tasks or combining multiple tasks.

major comments (2)
  1. [Abstract / Experimental results] The abstract and high-level claims assert positive results on 12 NLU datasets and independence from initialization/architecture, yet no information is supplied on the choice of baselines, data splits, number of random seeds, or any statistical significance tests. Because these details are required to evaluate whether the reported gains are reliable and not confounded, the support for the central empirical claims cannot be assessed from the provided description.
  2. [§3 (method) and §4 (experiments)] The weakest assumption—that the element-wise difference isolates transferable task information independent of seed and architecture—is load-bearing for both the initialization and arithmetic claims. Without quantitative evidence (e.g., cosine similarity or downstream performance variance across multiple seeds and the two architectures), it remains unclear whether the observed benefits are robust or specific to the particular experimental conditions.
minor comments (1)
  1. [Tables/Figures] Ensure that all tables and figures reporting performance numbers include error bars or standard deviations across runs.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on clarifying experimental details and strengthening evidence for independence claims. We address each point below and will revise the manuscript to improve transparency and add requested analyses.

read point-by-point responses
  1. Referee: [Abstract / Experimental results] The abstract and high-level claims assert positive results on 12 NLU datasets and independence from initialization/architecture, yet no information is supplied on the choice of baselines, data splits, number of random seeds, or any statistical significance tests. Because these details are required to evaluate whether the reported gains are reliable and not confounded, the support for the central empirical claims cannot be assessed from the provided description.

    Authors: We agree the abstract lacks these specifics. The full manuscript (Section 4) details the baselines (standard single-task prompt tuning and multi-task methods), data splits (official splits for the 12 NLU datasets), use of multiple random seeds with averaged results, and includes performance metrics. To address the concern directly, we will revise the abstract to note that results are averaged over 3 seeds with standard deviations and that statistical tests (paired t-tests) confirm significance of gains where reported. This makes the protocol explicit at the high level without altering the claims. revision: yes

  2. Referee: [§3 (method) and §4 (experiments)] The weakest assumption—that the element-wise difference isolates transferable task information independent of seed and architecture—is load-bearing for both the initialization and arithmetic claims. Without quantitative evidence (e.g., cosine similarity or downstream performance variance across multiple seeds and the two architectures), it remains unclear whether the observed benefits are robust or specific to the particular experimental conditions.

    Authors: The manuscript demonstrates independence via consistent transfer performance across seeds and the two architectures (T5 and RoBERTa). However, we acknowledge that direct quantitative measures such as cosine similarity between TPVs from different seeds or explicit variance tables would provide stronger support. In revision we will add these analyses: cosine similarities of TPVs across seeds for the same task, and performance variance tables across seeds and architectures, to confirm the element-wise difference isolates task information robustly. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper introduces Task Prompt Vectors as an empirical construction (element-wise difference between tuned soft-prompt weights and random initialization) and validates their use for low-resource initialization and multi-task arithmetic via experiments on 12 NLU datasets across two model architectures. No derivation chain, equations, or predictions are presented that reduce the reported outcomes to definitional equivalence with the inputs; claims rest on experimental results rather than self-referential fitting or self-citation loops. The method is self-contained against external benchmarks with no load-bearing self-citations or ansatzes identified.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

The central claim rests on the empirical utility of the newly defined task prompt vectors; the abstract introduces no explicit free parameters, mathematical axioms, or additional invented entities beyond the vectors themselves.

invented entities (1)
  • Task Prompt Vectors no independent evidence
    purpose: Capture task-specific information transferable across similar tasks via initialization and arithmetic
    Defined directly as element-wise difference between tuned and random-initialized soft prompts; no independent evidence outside the reported experiments is mentioned.

pith-pipeline@v0.9.0 · 5709 in / 1254 out tokens · 34126 ms · 2026-05-23T22:10:10.262749+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities

    cs.LG 2024-08 accept novelty 4.0

    The paper introduces a new taxonomy for model merging methods and reviews their applications in LLMs, MLLMs, continual learning, multi-task learning, and other subfields while outlining open challenges.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · cited by 1 Pith paper · 5 internal anchors

  1. [1]

    In: Goldberg, Y ., Kozareva, Z., Zhang, Y

    Asai, A., Salehi, M., Peters, M., Hajishirzi, H.: ATTEMPT: Parameter-efficient multi-task tuning via attentional mixtures of soft prompts. In: Goldberg, Y ., Kozareva, Z., Zhang, Y . (eds.) Proceedings of the 2022 Conference on EMNLP. pp. 6655–6672. ACL, Abu Dhabi, United Arab Emirates (Dec 2022). https://doi.org/10.18653/v1/2022.emnlp-main. 446

  2. [2]

    In: Aberer, K., Choi, K.S., Noy, N., Allemang, D., Lee, K.I., Nixon, L., Golbeck, J., Mika, P., Maynard, D., Mizoguchi, R., Schreiber, G., Cudré-Mauroux, P

    Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: Dbpedia: A nucleus for a web of open data. In: Aberer, K., Choi, K.S., Noy, N., Allemang, D., Lee, K.I., Nixon, L., Golbeck, J., Mika, P., Maynard, D., Mizoguchi, R., Schreiber, G., Cudré-Mauroux, P. (eds.) The Semantic Web. pp. 722–735. Springer Berlin Heidelberg, Berlin, Heidelberg (2007)

  3. [3]

    DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

    Bi, X., Chen, D., Chen, G., Chen, S., Dai, D., Deng, C., Ding, H., Dong, K., Du, Q., Fu, Z., et al.: Deepseek llm: Scaling open-source language models with longtermism. arXiv preprint arXiv:2401.02954 (2024)

  4. [4]

    In: Proceedings of the 2015 Conference on EMNLP (EMNLP)

    Bowman, S.R., Angeli, G., Potts, C., Manning, C.D.: A large annotated corpus for learning natural language inference. In: Proceedings of the 2015 Conference on EMNLP (EMNLP). ACL (2015)

  5. [5]

    Advances in neural information processing systems 33, 1877–1901 (2020)

    Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al.: Language models are few-shot learners. Advances in neural information processing systems 33, 1877–1901 (2020)

  6. [6]

    S em E val-2017 task 1: Semantic textual similarity multilingual and crosslingual focused evaluation

    Cer, D., Diab, M., Agirre, E., Lopez-Gazpio, I., Specia, L.: SemEval-2017 task 1: Semantic textual similarity multilingual and crosslingual focused evaluation. In: Bethard, S., Carpuat, M., Apidianaki, M., Mohammad, S.M., Cer, D., Jurgens, D. (eds.) Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). pp. 1–14. ACL, Vancou...

  7. [7]

    arXiv preprint arXiv:2311.09344 (2023)

    Chronopoulou, A., Pfeiffer, J., Maynez, J., Wang, X., Ruder, S., Agrawal, P.: Language and task arithmetic with parameter-efficient layers for zero-shot summarization. arXiv preprint arXiv:2311.09344 (2023)

  8. [8]

    No Language Left Behind: Scaling Human-Centered Machine Translation

    Costa-Jussà, M.R., Cross, J., Çelebi, O., Elbayad, M., Heafield, K., Heffernan, K., Kalbassi, E., Lam, J., Licht, D., Maillard, J., et al.: No language left behind: Scaling human-centered machine translation. arXiv preprint arXiv:2207.04672 (2022) 16 Belanec et al

  9. [9]

    Davari, M., Belilovsky, E.: Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks (Dec 2023), http://arxiv.org/abs/2312.06795, arXiv:2312.06795 [cs]

  10. [10]

    In: Proceedings of the International Workshop on Paraphrasing (2005)

    Dolan, W.B., Brockett, C.: Automatically constructing a corpus of sentential paraphrases. In: Proceedings of the International Workshop on Paraphrasing (2005)

  11. [11]

    The Llama 3 Herd of Models

    Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Yang, A., Fan, A., et al.: The llama 3 herd of models. arXiv preprint arXiv:2407.21783 (2024)

  12. [12]

    Journal of the American Statistical Association 54(287), 613–621 (1959)

    Dunn, O.J.: Confidence intervals for the means of dependent, normally distributed variables. Journal of the American Statistical Association 54(287), 613–621 (1959)

  13. [13]

    Fourrier, C., Habib, N., Wolf, T., Tunstall, L.: Lighteval: A lightweight framework for llm evaluation (2023), https://github.com/huggingface/lighteval

  14. [14]

    In: ICML

    Frankle, J., Dziugaite, G.K., Roy, D., Carbin, M.: Linear mode connectivity and the lottery ticket hypothesis. In: ICML. pp. 3259–3269. PMLR (2020)

  15. [15]

    PPT : Pre-trained prompt tuning for few-shot learning

    Gu, Y ., Han, X., Liu, Z., Huang, M.: PPT: Pre-trained prompt tuning for few-shot learning. In: Muresan, S., Nakov, P., Villavicencio, A. (eds.) Proceedings of the 60th Annual Meeting of the ACL (V olume 1: Long Papers). pp. 8410–8423. ACL, Dublin, Ireland (May 2022). https://doi.org/10.18653/v1/2022.acl-long.576

  16. [16]

    arXiv preprint arXiv:2502.10140 (2025)

    Gurgurov, D., Vykopal, I., van Genabith, J., Ostermann, S.: Small models, big impact: Efficient corpus and graph-based adaptation of small multilingual language models for low-resource languages. arXiv preprint arXiv:2502.10140 (2025)

  17. [17]

    In: ICML

    Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q., Gesmundo, A., Attariyan, M., Gelly, S.: Parameter-efficient transfer learning for nlp. In: ICML. pp. 2790–

  18. [18]

    In: Proceedings of the First International Conference on Human Language Technology Research (2001)

    Hovy, E., Gerber, L., Hermjakob, U., Lin, C.Y ., Ravichandran, D.: Toward semantics-based answer pinpointing. In: Proceedings of the First International Conference on Human Language Technology Research (2001)

  19. [19]

    In: ICLR (2022), https://openreview

    Hu, E.J., yelong shen, Wallis, P., Allen-Zhu, Z., Li, Y ., Wang, S., Wang, L., Chen, W.: LoRA: Low-rank adaptation of large language models. In: ICLR (2022), https://openreview. net/forum?id=nZeVKeeFYf9

  20. [20]

    In: The Eleventh ICLR (2022)

    Ilharco, G., Ribeiro, M.T., Wortsman, M., Schmidt, L., Hajishirzi, H., Farhadi, A.: Editing models with task arithmetic. In: The Eleventh ICLR (2022)

  21. [21]

    In: AAAI Conference on Artificial Intelligence (2018), https://api

    Khot, T., Sabharwal, A., Clark, P.: Scitail: A textual entailment dataset from science ques- tion answering. In: AAAI Conference on Artificial Intelligence (2018), https://api. semanticscholar.org/CorpusID:24462950

  22. [22]

    arXiv preprint arXiv:2404.15737 (2024)

    Klimaszewski, M., Andruszkiewicz, P., Birch, A.: No train but gain: Language arithmetic for training-free language adapters enhancement. arXiv preprint arXiv:2404.15737 (2024)

  23. [23]

    Yifan Li, Yifan Du, Kun Zhou, Jinpeng Wang, Xin Zhao, and Ji-Rong Wen

    Lee, H., Jeong, M., Yun, S.Y ., Kim, K.E.: Bayesian multi-task transfer learning for soft prompt tuning. In: Bouamor, H., Pino, J., Bali, K. (eds.) Findings of the ACL: EMNLP 2023. pp. 4942–4958. ACL, Singapore (Dec 2023). https://doi.org/10.18653/v1/2023. findings-emnlp.329

  24. [24]

    URL https://aclanthology.org/2021

    Lester, B., Al-Rfou, R., Constant, N.: The power of scale for parameter-efficient prompt tuning. In: Moens, M.F., Huang, X., Specia, L., Yih, S.W.t. (eds.) Proceedings of the 2021 Conference on EMNLP. pp. 3045–3059. ACL, Online and Punta Cana, Dominican Republic (Nov 2021). https://doi.org/10.18653/v1/2021.emnlp-main.243

  25. [25]

    Li, M., Gururangan, S., Dettmers, T., Lewis, M., Althoff, T., Smith, N.A., Zettlemoyer, L.: Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models (Aug 2022), http://arxiv.org/abs/2208.03306, arXiv:2208.03306 [cs]

  26. [26]

    In: Zong, C., Xia, F., Li, W., Navigli, R

    Li, X.L., Liang, P.: Prefix-tuning: Optimizing continuous prompts for generation. In: Zong, C., Xia, F., Li, W., Navigli, R. (eds.) Proceedings of the 59th Annual Meeting of the ACL and Task Prompt Vectors 17 the 11th International Joint Conference on Natural Language Processing (V olume 1: Long Papers). pp. 4582–4597. ACL, Online (Aug 2021). https://doi....

  27. [27]

    In: COLING 2002: The 19th International Conference on Computational Linguistics (2002)

    Li, X., Roth, D.: Learning question classifiers. In: COLING 2002: The 19th International Conference on Computational Linguistics (2002)

  28. [28]

    AI Open (2023)

    Liu, X., Zheng, Y ., Du, Z., Ding, M., Qian, Y ., Yang, Z., Tang, J.: Gpt understands, too. AI Open (2023)

  29. [29]

    RoBERTa: A Robustly Optimized BERT Pretraining Approach

    Liu, Y ., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V .: Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019)

  30. [30]

    In: Lin, D., Matsumoto, Y ., Mihalcea, R

    Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y ., Potts, C.: Learning word vectors for sentiment analysis. In: Lin, D., Matsumoto, Y ., Mihalcea, R. (eds.) Proceedings of the 49th Annual Meeting of the ACL: Human Language Technologies. pp. 142–150. ACL, Portland, Oregon, USA (Jun 2011)

  31. [31]

    Matena, M., Raffel, C.: Merging Models with Fisher-Weighted Averaging (Aug 2022),http: //arxiv.org/abs/2111.09832, arXiv:2111.09832 [cs]

  32. [32]

    Advances in Neural Information Processing Systems 36 (2024)

    Ortiz-Jimenez, G., Favero, A., Frossard, P.: Task arithmetic in the tangent space: Improved editing of pre-trained models. Advances in Neural Information Processing Systems 36 (2024)

  33. [33]

    arXiv preprint arXiv:2402.12819 (2024)

    Pecher, B., Srba, I., Bielikova, M.: Comparing specialised small and general large language models on text classification: 100 labelled samples to achieve break-even performance. arXiv preprint arXiv:2402.12819 (2024)

  34. [34]

    In: Merlo, P., Tiedemann, J., Tsarfaty, R

    Pfeiffer, J., Kamath, A., Rücklé, A., Cho, K., Gurevych, I.: AdapterFusion: Non-destructive task composition for transfer learning. In: Merlo, P., Tiedemann, J., Tsarfaty, R. (eds.) Proceed- ings of the 16th Conference of the European Chapter of the ACL: Main V olume. pp. 487–503. ACL, Online (Apr 2021). https://doi.org/10.18653/v1/2021.eacl-main. 39

  35. [35]

    IEEE/ACM Trans

    Qin, Y ., Wang, X., Su, Y ., Lin, Y ., Ding, N., Yi, J., Chen, W., Liu, Z., Li, J., Hou, L., Li, P., Sun, M., Zhou, J.: Exploring universal intrinsic task subspace for few-shot learning via prompt tuning. IEEE/ACM Trans. Audio, Speech and Lang. Proc. 32, 3631–3643 (Jul 2024). https://doi.org/10.1109/TASLP.2024.3430545, https://doi.org/10. 1109/TASLP.2024.3430545

  36. [36]

    OpenAI blog 1(8), 9 (2019)

    Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al.: Language models are unsupervised multitask learners. OpenAI blog 1(8), 9 (2019)

  37. [37]

    Journal of machine learning research 21(140), 1–67 (2020)

    Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y ., Li, W., Liu, P.J.: Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of machine learning research 21(140), 1–67 (2020)

  38. [38]

    In: Gurevych, I., Miyao, Y

    Rajpurkar, P., Jia, R., Liang, P.: Know what you don’t know: Unanswerable questions for SQuAD. In: Gurevych, I., Miyao, Y . (eds.) Proceedings of the 56th Annual Meeting of the ACL (V olume 2: Short Papers). pp. 784–789. ACL, Melbourne, Australia (Jul 2018). https://doi.org/10.18653/v1/P18-2124

  39. [39]

    In: ICML

    Ramé, A., Ahuja, K., Zhang, J., Cord, M., Bottou, L., Lopez-Paz, D.: Model ratatouille: Recycling diverse models for out-of-distribution generalization. In: ICML. pp. 28656–28679. PMLR (2023)

  40. [40]

    In: The Twelfth ICLR (2024), https://openreview.net/forum?id=KjegfPGRde

    Shi, Z., Lipani, A.: DePT: Decomposed prompt tuning for parameter-efficient fine-tuning. In: The Twelfth ICLR (2024), https://openreview.net/forum?id=KjegfPGRde

  41. [41]

    In: Proceedings of the 2013 conference on EMNLP

    Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C.D., Ng, A.Y ., Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 conference on EMNLP. pp. 1631–1642 (2013)

  42. [42]

    03053, arXiv:2305.03053 [cs] 18 Belanec et al

    Stoica, G., Bolya, D., Bjorner, J., Ramesh, P., Hearn, T., Hoffman, J.: ZipIt! Merging Models from Different Tasks without Training (Mar 2024), http://arxiv.org/abs/2305. 03053, arXiv:2305.03053 [cs] 18 Belanec et al

  43. [43]

    Biometrika pp

    Student: The probable error of a mean. Biometrika pp. 1–25 (1908)

  44. [44]

    In: Carpuat, M., de Marneffe, M.C., Meza Ruiz, I.V

    Su, Y ., Wang, X., Qin, Y ., Chan, C.M., Lin, Y ., Wang, H., Wen, K., Liu, Z., Li, P., Li, J., Hou, L., Sun, M., Zhou, J.: On transferability of prompt tuning for natural language processing. In: Carpuat, M., de Marneffe, M.C., Meza Ruiz, I.V . (eds.) Proceedings of the 2022 Conference of the North American Chapter of the ACL: Human Language Technologies....

  45. [45]

    In: Muresan, S., Nakov, P., Villavicencio, A

    Vu, T., Lester, B., Constant, N., Al-Rfou’, R., Cer, D.: SPoT: Better frozen model adaptation through soft prompt transfer. In: Muresan, S., Nakov, P., Villavicencio, A. (eds.) Proceedings of the 60th Annual Meeting of the ACL (V olume 1: Long Papers). pp. 5039–5059. ACL, Dublin, Ireland (May 2022). https://doi.org/10.18653/v1/2022.acl-long.346

  46. [46]

    Proceedings of the 2018

    Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.: GLUE: A multi-task benchmark and analysis platform for natural language understanding. In: Linzen, T., Chrupała, G., Alishahi, A. (eds.) Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. pp. 353–355. ACL, Brussels, Belgium (Nov 2018). ht...

  47. [47]

    In: The Eleventh ICLR (2023), https:// openreview.net/forum?id=Nk2pDtuhTq

    Wang, Z., Panda, R., Karlinsky, L., Feris, R., Sun, H., Kim, Y .: Multitask prompt tuning enables parameter-efficient transfer learning. In: The Eleventh ICLR (2023), https:// openreview.net/forum?id=Nk2pDtuhTq

  48. [48]

    Transactions of the ACL 7, 625–641 (2019)

    Warstadt, A., Singh, A., Bowman, S.R.: Neural network acceptability judgments. Transactions of the ACL 7, 625–641 (2019). https://doi.org/10.1162/tacl_a_00290

  49. [49]

    A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

    Williams, A., Nangia, N., Bowman, S.: A broad-coverage challenge corpus for sentence understanding through inference. In: Walker, M., Ji, H., Stent, A. (eds.) Proceedings of the 2018 Conference of the North American Chapter of the ACL: Human Language Technologies, V olume 1 (Long Papers). pp. 1112–1122. ACL, New Orleans, Louisiana (Jun 2018).https: //doi....

  50. [50]

    In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

    Wortsman, M., Ilharco, G., Kim, J.W., Li, M., Kornblith, S., Roelofs, R., Lopes, R.G., Hajishirzi, H., Farhadi, A., Namkoong, H., et al.: Robust fine-tuning of zero-shot models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 7959–7971 (2022)

  51. [51]

    arXiv preprint arXiv:2312.12148 (2023)

    Xu, L., Xie, H., Qin, S.Z.J., Tao, X., Wang, F.L.: Parameter-efficient fine-tuning meth- ods for pretrained language models: A critical review and assessment. arXiv preprint arXiv:2312.12148 (2023)

  52. [52]

    Advances in Neural Information Processing Systems 36, 12589–12610 (2023)

    Zhang, J., Liu, J., He, J., et al.: Composing parameter-efficient modules with arithmetic operation. Advances in Neural Information Processing Systems 36, 12589–12610 (2023)

  53. [53]

    Advances in neural information processing systems 28 (2015)

    Zhang, X., Zhao, J., LeCun, Y .: Character-level convolutional networks for text classification. Advances in neural information processing systems 28 (2015)