Task Prompt Vectors: Effective Initialization through Multi-Task Soft-Prompt Transfer

Ivan Srba; Maria Bielikova; Robert Belanec; Simon Ostermann

arxiv: 2408.01119 · v4 · pith:PEYCWU4Unew · submitted 2024-08-02 · 💻 cs.CL

Task Prompt Vectors: Effective Initialization through Multi-Task Soft-Prompt Transfer

Robert Belanec , Simon Ostermann , Ivan Srba , Maria Bielikova This is my paper

Pith reviewed 2026-05-23 22:10 UTC · model grok-4.3

classification 💻 cs.CL

keywords task prompt vectorssoft prompt tuningprompt arithmeticmulti-task learninglow-resource adaptationnatural language understandingparameter-efficient fine-tuning

0 comments

The pith

Task prompt vectors formed by subtracting random initialization from tuned soft prompts enable effective low-resource initialization and arithmetic combination across tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the element-wise difference between a tuned soft prompt and its random starting weights isolates task-specific information. This vector transfers across different random seeds and two language model architectures, allowing it to initialize prompt tuning on related tasks with limited data. It also supports simple addition of vectors from multiple tasks to achieve multi-task results without full retraining. A sympathetic reader would care because current soft-prompt methods lose modularity when adding tasks, while full-model task vectors already show arithmetic benefits; this approach aims to bring similar reuse to the more efficient prompt-tuning setting.

Core claim

Task prompt vectors are created by the element-wise difference between the weights of tuned soft-prompts and their random initialization. These vectors isolate transferable task-specific information that can initialize prompt tuning effectively on similar tasks in low-resource settings. The vectors prove independent of the particular random initialization and work across two different language model architectures, which permits arithmetic operations such as addition of vectors from multiple tasks to deliver competitive multi-task performance.

What carries the argument

Task prompt vectors, defined as the element-wise difference between tuned soft-prompt weights and their random initialization, isolate transferable task information for initialization and arithmetic.

If this is right

Task prompt vectors can initialize prompt tuning effectively on similar tasks using limited data.
Task prompt vectors remain independent of the random initialization seed used during original tuning.
Arithmetic addition of task prompt vectors from multiple tasks yields competitive multi-task performance.
The vectors transfer across at least two different language model architectures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Precomputed task prompt vectors could be stored and reused to lower the cost of adapting models to new task combinations.
The subtraction technique might extend to other parameter-efficient methods such as adapters or LoRA modules.
Independence from architecture opens the possibility of transferring task vectors between models of different sizes or families.

Load-bearing premise

The element-wise difference between a tuned soft prompt and its random initialization must capture only transferable task-specific information that remains useful when added to a new random prompt on a related task.

What would settle it

An experiment showing that task prompt vectors from similar tasks produce no improvement over random initialization when used to start prompt tuning on a new related task, or that their addition yields worse results than training a single combined prompt.

Figures

Figures reproduced from arXiv: 2408.01119 by Ivan Srba, Maria Bielikova, Robert Belanec, Simon Ostermann.

**Figure 2.** Figure 2: Comparison of average cosine similarities between task prompts and task prompt [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

**Figure 3.** Figure 3: Comparison of relative exact match performance of combinations of task prompt [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗

**Figure 4.** Figure 4: Test results of training T5-base model with random, single, and multi-task soft [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗

read the original abstract

Prompt tuning is an efficient solution for training large language models (LLMs). However, current soft-prompt-based methods often sacrifice multi-task modularity, requiring the training process to be fully or partially repeated for each newly added task. While recent work on task vectors applied arithmetic operations on full model weights to achieve the desired multi-task performance, a similar approach for soft-prompts is still missing. To this end, we introduce Task Prompt Vectors, created by element-wise difference between weights of tuned soft-prompts and their random initialization. Experimental results on 12 NLU datasets show that task prompt vectors can be used in low-resource settings to effectively initialize prompt tuning on similar tasks. In addition, we show that task prompt vectors are independent of the random initialization of prompt tuning on 2 different language model architectures. This allows prompt arithmetics with the pre-trained vectors from different tasks. In this way, we provide a competitive alternative to state-of-the-art baselines by arithmetic addition of task prompt vectors from multiple tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Task prompt vectors extend task arithmetic to soft prompts with some empirical backing on NLU tasks, but the abstract leaves key controls and baselines unclear.

read the letter

Task prompt vectors are created by subtracting the random initialization from a tuned soft prompt, then reused either to seed new prompt tuning on related tasks or added together for multi-task results. This is a direct carry-over of the task vector idea from full model weights to the prompt layer, and the paper shows it works on 12 NLU datasets across two architectures while claiming seed independence. That independence is the part that actually enables the arithmetic step, so it is the most useful finding if it holds up in the full experiments. The work is straightforward and addresses a real pain point in prompt tuning: having to retrain from scratch for each new task. The low-resource initialization results and the competitive multi-task numbers via addition are the concrete contributions. The main soft spot is that the abstract gives almost no information on baselines, data splits, statistical tests, or how task similarity was measured, so it is hard to judge whether the gains are robust or just from favorable choices. The central assumption—that the subtraction cleanly isolates transferable task information—needs the full paper's controls to be convincing. This is the kind of incremental but practical paper that prompt-tuning researchers would want to see. It is worth sending to peer review so the experimental details can be checked properly.

Referee Report

2 major / 1 minor

Summary. The paper introduces Task Prompt Vectors (TPVs), defined as the element-wise difference between a tuned soft prompt and its random initialization. It reports experimental results on 12 NLU datasets showing that TPVs enable effective initialization of prompt tuning for similar tasks in low-resource settings, that TPVs are independent of the random seed and of the underlying LM architecture (tested on two models), and that arithmetic combinations of TPVs from multiple tasks yield multi-task performance competitive with state-of-the-art baselines.

Significance. If the empirical claims hold under rigorous controls, the work supplies a modular mechanism for transferring and composing task-specific knowledge directly in the soft-prompt space. This extends the task-vector paradigm from full-model weights to the more parameter-efficient prompt-tuning regime and could reduce the need to retrain prompts from scratch when adding related tasks or combining multiple tasks.

major comments (2)

[Abstract / Experimental results] The abstract and high-level claims assert positive results on 12 NLU datasets and independence from initialization/architecture, yet no information is supplied on the choice of baselines, data splits, number of random seeds, or any statistical significance tests. Because these details are required to evaluate whether the reported gains are reliable and not confounded, the support for the central empirical claims cannot be assessed from the provided description.
[§3 (method) and §4 (experiments)] The weakest assumption—that the element-wise difference isolates transferable task information independent of seed and architecture—is load-bearing for both the initialization and arithmetic claims. Without quantitative evidence (e.g., cosine similarity or downstream performance variance across multiple seeds and the two architectures), it remains unclear whether the observed benefits are robust or specific to the particular experimental conditions.

minor comments (1)

[Tables/Figures] Ensure that all tables and figures reporting performance numbers include error bars or standard deviations across runs.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on clarifying experimental details and strengthening evidence for independence claims. We address each point below and will revise the manuscript to improve transparency and add requested analyses.

read point-by-point responses

Referee: [Abstract / Experimental results] The abstract and high-level claims assert positive results on 12 NLU datasets and independence from initialization/architecture, yet no information is supplied on the choice of baselines, data splits, number of random seeds, or any statistical significance tests. Because these details are required to evaluate whether the reported gains are reliable and not confounded, the support for the central empirical claims cannot be assessed from the provided description.

Authors: We agree the abstract lacks these specifics. The full manuscript (Section 4) details the baselines (standard single-task prompt tuning and multi-task methods), data splits (official splits for the 12 NLU datasets), use of multiple random seeds with averaged results, and includes performance metrics. To address the concern directly, we will revise the abstract to note that results are averaged over 3 seeds with standard deviations and that statistical tests (paired t-tests) confirm significance of gains where reported. This makes the protocol explicit at the high level without altering the claims. revision: yes
Referee: [§3 (method) and §4 (experiments)] The weakest assumption—that the element-wise difference isolates transferable task information independent of seed and architecture—is load-bearing for both the initialization and arithmetic claims. Without quantitative evidence (e.g., cosine similarity or downstream performance variance across multiple seeds and the two architectures), it remains unclear whether the observed benefits are robust or specific to the particular experimental conditions.

Authors: The manuscript demonstrates independence via consistent transfer performance across seeds and the two architectures (T5 and RoBERTa). However, we acknowledge that direct quantitative measures such as cosine similarity between TPVs from different seeds or explicit variance tables would provide stronger support. In revision we will add these analyses: cosine similarities of TPVs across seeds for the same task, and performance variance tables across seeds and architectures, to confirm the element-wise difference isolates task information robustly. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper introduces Task Prompt Vectors as an empirical construction (element-wise difference between tuned soft-prompt weights and random initialization) and validates their use for low-resource initialization and multi-task arithmetic via experiments on 12 NLU datasets across two model architectures. No derivation chain, equations, or predictions are presented that reduce the reported outcomes to definitional equivalence with the inputs; claims rest on experimental results rather than self-referential fitting or self-citation loops. The method is self-contained against external benchmarks with no load-bearing self-citations or ansatzes identified.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

The central claim rests on the empirical utility of the newly defined task prompt vectors; the abstract introduces no explicit free parameters, mathematical axioms, or additional invented entities beyond the vectors themselves.

invented entities (1)

Task Prompt Vectors no independent evidence
purpose: Capture task-specific information transferable across similar tasks via initialization and arithmetic
Defined directly as element-wise difference between tuned and random-initialized soft prompts; no independent evidence outside the reported experiments is mentioned.

pith-pipeline@v0.9.0 · 5709 in / 1254 out tokens · 34126 ms · 2026-05-23T22:10:10.262749+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
cs.LG 2024-08 accept novelty 4.0

The paper introduces a new taxonomy for model merging methods and reviews their applications in LLMs, MLLMs, continual learning, multi-task learning, and other subfields while outlining open challenges.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · cited by 1 Pith paper · 5 internal anchors

[1]

In: Goldberg, Y ., Kozareva, Z., Zhang, Y

Asai, A., Salehi, M., Peters, M., Hajishirzi, H.: ATTEMPT: Parameter-efficient multi-task tuning via attentional mixtures of soft prompts. In: Goldberg, Y ., Kozareva, Z., Zhang, Y . (eds.) Proceedings of the 2022 Conference on EMNLP. pp. 6655–6672. ACL, Abu Dhabi, United Arab Emirates (Dec 2022). https://doi.org/10.18653/v1/2022.emnlp-main. 446

work page doi:10.18653/v1/2022.emnlp-main 2022
[2]

In: Aberer, K., Choi, K.S., Noy, N., Allemang, D., Lee, K.I., Nixon, L., Golbeck, J., Mika, P., Maynard, D., Mizoguchi, R., Schreiber, G., Cudré-Mauroux, P

Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: Dbpedia: A nucleus for a web of open data. In: Aberer, K., Choi, K.S., Noy, N., Allemang, D., Lee, K.I., Nixon, L., Golbeck, J., Mika, P., Maynard, D., Mizoguchi, R., Schreiber, G., Cudré-Mauroux, P. (eds.) The Semantic Web. pp. 722–735. Springer Berlin Heidelberg, Berlin, Heidelberg (2007)

work page 2007
[3]

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Bi, X., Chen, D., Chen, G., Chen, S., Dai, D., Deng, C., Ding, H., Dong, K., Du, Q., Fu, Z., et al.: Deepseek llm: Scaling open-source language models with longtermism. arXiv preprint arXiv:2401.02954 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[4]

In: Proceedings of the 2015 Conference on EMNLP (EMNLP)

Bowman, S.R., Angeli, G., Potts, C., Manning, C.D.: A large annotated corpus for learning natural language inference. In: Proceedings of the 2015 Conference on EMNLP (EMNLP). ACL (2015)

work page 2015
[5]

Advances in neural information processing systems 33, 1877–1901 (2020)

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al.: Language models are few-shot learners. Advances in neural information processing systems 33, 1877–1901 (2020)

work page 1901
[6]

S em E val-2017 task 1: Semantic textual similarity multilingual and crosslingual focused evaluation

Cer, D., Diab, M., Agirre, E., Lopez-Gazpio, I., Specia, L.: SemEval-2017 task 1: Semantic textual similarity multilingual and crosslingual focused evaluation. In: Bethard, S., Carpuat, M., Apidianaki, M., Mohammad, S.M., Cer, D., Jurgens, D. (eds.) Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). pp. 1–14. ACL, Vancou...

work page doi:10.18653/v1/s17-2001 2017
[7]

arXiv preprint arXiv:2311.09344 (2023)

Chronopoulou, A., Pfeiffer, J., Maynez, J., Wang, X., Ruder, S., Agrawal, P.: Language and task arithmetic with parameter-efficient layers for zero-shot summarization. arXiv preprint arXiv:2311.09344 (2023)

work page arXiv 2023
[8]

No Language Left Behind: Scaling Human-Centered Machine Translation

Costa-Jussà, M.R., Cross, J., Çelebi, O., Elbayad, M., Heafield, K., Heffernan, K., Kalbassi, E., Lam, J., Licht, D., Maillard, J., et al.: No language left behind: Scaling human-centered machine translation. arXiv preprint arXiv:2207.04672 (2022) 16 Belanec et al

work page internal anchor Pith review Pith/arXiv arXiv 2022
[9]

Davari, M., Belilovsky, E.: Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks (Dec 2023), http://arxiv.org/abs/2312.06795, arXiv:2312.06795 [cs]

work page arXiv 2023
[10]

In: Proceedings of the International Workshop on Paraphrasing (2005)

Dolan, W.B., Brockett, C.: Automatically constructing a corpus of sentential paraphrases. In: Proceedings of the International Workshop on Paraphrasing (2005)

work page 2005
[11]

The Llama 3 Herd of Models

Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Yang, A., Fan, A., et al.: The llama 3 herd of models. arXiv preprint arXiv:2407.21783 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[12]

Journal of the American Statistical Association 54(287), 613–621 (1959)

Dunn, O.J.: Confidence intervals for the means of dependent, normally distributed variables. Journal of the American Statistical Association 54(287), 613–621 (1959)

work page 1959
[13]

Fourrier, C., Habib, N., Wolf, T., Tunstall, L.: Lighteval: A lightweight framework for llm evaluation (2023), https://github.com/huggingface/lighteval

work page 2023
[14]

In: ICML

Frankle, J., Dziugaite, G.K., Roy, D., Carbin, M.: Linear mode connectivity and the lottery ticket hypothesis. In: ICML. pp. 3259–3269. PMLR (2020)

work page 2020
[15]

PPT : Pre-trained prompt tuning for few-shot learning

Gu, Y ., Han, X., Liu, Z., Huang, M.: PPT: Pre-trained prompt tuning for few-shot learning. In: Muresan, S., Nakov, P., Villavicencio, A. (eds.) Proceedings of the 60th Annual Meeting of the ACL (V olume 1: Long Papers). pp. 8410–8423. ACL, Dublin, Ireland (May 2022). https://doi.org/10.18653/v1/2022.acl-long.576

work page doi:10.18653/v1/2022.acl-long.576 2022
[16]

arXiv preprint arXiv:2502.10140 (2025)

Gurgurov, D., Vykopal, I., van Genabith, J., Ostermann, S.: Small models, big impact: Efficient corpus and graph-based adaptation of small multilingual language models for low-resource languages. arXiv preprint arXiv:2502.10140 (2025)

work page arXiv 2025
[17]

In: ICML

Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q., Gesmundo, A., Attariyan, M., Gelly, S.: Parameter-efficient transfer learning for nlp. In: ICML. pp. 2790–

work page
[18]

In: Proceedings of the First International Conference on Human Language Technology Research (2001)

Hovy, E., Gerber, L., Hermjakob, U., Lin, C.Y ., Ravichandran, D.: Toward semantics-based answer pinpointing. In: Proceedings of the First International Conference on Human Language Technology Research (2001)

work page 2001
[19]

In: ICLR (2022), https://openreview

Hu, E.J., yelong shen, Wallis, P., Allen-Zhu, Z., Li, Y ., Wang, S., Wang, L., Chen, W.: LoRA: Low-rank adaptation of large language models. In: ICLR (2022), https://openreview. net/forum?id=nZeVKeeFYf9

work page 2022
[20]

In: The Eleventh ICLR (2022)

Ilharco, G., Ribeiro, M.T., Wortsman, M., Schmidt, L., Hajishirzi, H., Farhadi, A.: Editing models with task arithmetic. In: The Eleventh ICLR (2022)

work page 2022
[21]

In: AAAI Conference on Artificial Intelligence (2018), https://api

Khot, T., Sabharwal, A., Clark, P.: Scitail: A textual entailment dataset from science ques- tion answering. In: AAAI Conference on Artificial Intelligence (2018), https://api. semanticscholar.org/CorpusID:24462950

work page 2018
[22]

arXiv preprint arXiv:2404.15737 (2024)

Klimaszewski, M., Andruszkiewicz, P., Birch, A.: No train but gain: Language arithmetic for training-free language adapters enhancement. arXiv preprint arXiv:2404.15737 (2024)

work page arXiv 2024
[23]

Yifan Li, Yifan Du, Kun Zhou, Jinpeng Wang, Xin Zhao, and Ji-Rong Wen

Lee, H., Jeong, M., Yun, S.Y ., Kim, K.E.: Bayesian multi-task transfer learning for soft prompt tuning. In: Bouamor, H., Pino, J., Bali, K. (eds.) Findings of the ACL: EMNLP 2023. pp. 4942–4958. ACL, Singapore (Dec 2023). https://doi.org/10.18653/v1/2023. findings-emnlp.329

work page doi:10.18653/v1/2023 2023
[24]

URL https://aclanthology.org/2021

Lester, B., Al-Rfou, R., Constant, N.: The power of scale for parameter-efficient prompt tuning. In: Moens, M.F., Huang, X., Specia, L., Yih, S.W.t. (eds.) Proceedings of the 2021 Conference on EMNLP. pp. 3045–3059. ACL, Online and Punta Cana, Dominican Republic (Nov 2021). https://doi.org/10.18653/v1/2021.emnlp-main.243

work page doi:10.18653/v1/2021.emnlp-main.243 2021
[25]

Li, M., Gururangan, S., Dettmers, T., Lewis, M., Althoff, T., Smith, N.A., Zettlemoyer, L.: Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models (Aug 2022), http://arxiv.org/abs/2208.03306, arXiv:2208.03306 [cs]

work page arXiv 2022
[26]

In: Zong, C., Xia, F., Li, W., Navigli, R

Li, X.L., Liang, P.: Prefix-tuning: Optimizing continuous prompts for generation. In: Zong, C., Xia, F., Li, W., Navigli, R. (eds.) Proceedings of the 59th Annual Meeting of the ACL and Task Prompt Vectors 17 the 11th International Joint Conference on Natural Language Processing (V olume 1: Long Papers). pp. 4582–4597. ACL, Online (Aug 2021). https://doi....

work page doi:10.18653/v1/ 2021
[27]

In: COLING 2002: The 19th International Conference on Computational Linguistics (2002)

Li, X., Roth, D.: Learning question classifiers. In: COLING 2002: The 19th International Conference on Computational Linguistics (2002)

work page 2002
[28]

AI Open (2023)

Liu, X., Zheng, Y ., Du, Z., Ding, M., Qian, Y ., Yang, Z., Tang, J.: Gpt understands, too. AI Open (2023)

work page 2023
[29]

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Liu, Y ., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V .: Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019)

work page internal anchor Pith review Pith/arXiv arXiv 1907
[30]

In: Lin, D., Matsumoto, Y ., Mihalcea, R

Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y ., Potts, C.: Learning word vectors for sentiment analysis. In: Lin, D., Matsumoto, Y ., Mihalcea, R. (eds.) Proceedings of the 49th Annual Meeting of the ACL: Human Language Technologies. pp. 142–150. ACL, Portland, Oregon, USA (Jun 2011)

work page 2011
[31]

Matena, M., Raffel, C.: Merging Models with Fisher-Weighted Averaging (Aug 2022),http: //arxiv.org/abs/2111.09832, arXiv:2111.09832 [cs]

work page arXiv 2022
[32]

Advances in Neural Information Processing Systems 36 (2024)

Ortiz-Jimenez, G., Favero, A., Frossard, P.: Task arithmetic in the tangent space: Improved editing of pre-trained models. Advances in Neural Information Processing Systems 36 (2024)

work page 2024
[33]

arXiv preprint arXiv:2402.12819 (2024)

Pecher, B., Srba, I., Bielikova, M.: Comparing specialised small and general large language models on text classification: 100 labelled samples to achieve break-even performance. arXiv preprint arXiv:2402.12819 (2024)

work page arXiv 2024
[34]

In: Merlo, P., Tiedemann, J., Tsarfaty, R

Pfeiffer, J., Kamath, A., Rücklé, A., Cho, K., Gurevych, I.: AdapterFusion: Non-destructive task composition for transfer learning. In: Merlo, P., Tiedemann, J., Tsarfaty, R. (eds.) Proceed- ings of the 16th Conference of the European Chapter of the ACL: Main V olume. pp. 487–503. ACL, Online (Apr 2021). https://doi.org/10.18653/v1/2021.eacl-main. 39

work page doi:10.18653/v1/2021.eacl-main 2021
[35]

IEEE/ACM Trans

Qin, Y ., Wang, X., Su, Y ., Lin, Y ., Ding, N., Yi, J., Chen, W., Liu, Z., Li, J., Hou, L., Li, P., Sun, M., Zhou, J.: Exploring universal intrinsic task subspace for few-shot learning via prompt tuning. IEEE/ACM Trans. Audio, Speech and Lang. Proc. 32, 3631–3643 (Jul 2024). https://doi.org/10.1109/TASLP.2024.3430545, https://doi.org/10. 1109/TASLP.2024.3430545

work page doi:10.1109/taslp.2024.3430545 2024
[36]

OpenAI blog 1(8), 9 (2019)

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al.: Language models are unsupervised multitask learners. OpenAI blog 1(8), 9 (2019)

work page 2019
[37]

Journal of machine learning research 21(140), 1–67 (2020)

Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y ., Li, W., Liu, P.J.: Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of machine learning research 21(140), 1–67 (2020)

work page 2020
[38]

In: Gurevych, I., Miyao, Y

Rajpurkar, P., Jia, R., Liang, P.: Know what you don’t know: Unanswerable questions for SQuAD. In: Gurevych, I., Miyao, Y . (eds.) Proceedings of the 56th Annual Meeting of the ACL (V olume 2: Short Papers). pp. 784–789. ACL, Melbourne, Australia (Jul 2018). https://doi.org/10.18653/v1/P18-2124

work page doi:10.18653/v1/p18-2124 2018
[39]

In: ICML

Ramé, A., Ahuja, K., Zhang, J., Cord, M., Bottou, L., Lopez-Paz, D.: Model ratatouille: Recycling diverse models for out-of-distribution generalization. In: ICML. pp. 28656–28679. PMLR (2023)

work page 2023
[40]

In: The Twelfth ICLR (2024), https://openreview.net/forum?id=KjegfPGRde

Shi, Z., Lipani, A.: DePT: Decomposed prompt tuning for parameter-efficient fine-tuning. In: The Twelfth ICLR (2024), https://openreview.net/forum?id=KjegfPGRde

work page 2024
[41]

In: Proceedings of the 2013 conference on EMNLP

Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C.D., Ng, A.Y ., Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 conference on EMNLP. pp. 1631–1642 (2013)

work page 2013
[42]

03053, arXiv:2305.03053 [cs] 18 Belanec et al

Stoica, G., Bolya, D., Bjorner, J., Ramesh, P., Hearn, T., Hoffman, J.: ZipIt! Merging Models from Different Tasks without Training (Mar 2024), http://arxiv.org/abs/2305. 03053, arXiv:2305.03053 [cs] 18 Belanec et al

work page arXiv 2024
[43]

Biometrika pp

Student: The probable error of a mean. Biometrika pp. 1–25 (1908)

work page 1908
[44]

In: Carpuat, M., de Marneffe, M.C., Meza Ruiz, I.V

Su, Y ., Wang, X., Qin, Y ., Chan, C.M., Lin, Y ., Wang, H., Wen, K., Liu, Z., Li, P., Li, J., Hou, L., Sun, M., Zhou, J.: On transferability of prompt tuning for natural language processing. In: Carpuat, M., de Marneffe, M.C., Meza Ruiz, I.V . (eds.) Proceedings of the 2022 Conference of the North American Chapter of the ACL: Human Language Technologies....

work page 2022
[45]

In: Muresan, S., Nakov, P., Villavicencio, A

Vu, T., Lester, B., Constant, N., Al-Rfou’, R., Cer, D.: SPoT: Better frozen model adaptation through soft prompt transfer. In: Muresan, S., Nakov, P., Villavicencio, A. (eds.) Proceedings of the 60th Annual Meeting of the ACL (V olume 1: Long Papers). pp. 5039–5059. ACL, Dublin, Ireland (May 2022). https://doi.org/10.18653/v1/2022.acl-long.346

work page doi:10.18653/v1/2022.acl-long.346 2022
[46]

Proceedings of the 2018

Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.: GLUE: A multi-task benchmark and analysis platform for natural language understanding. In: Linzen, T., Chrupała, G., Alishahi, A. (eds.) Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. pp. 353–355. ACL, Brussels, Belgium (Nov 2018). ht...

work page doi:10.18653/v1/w18-5446 2018
[47]

In: The Eleventh ICLR (2023), https:// openreview.net/forum?id=Nk2pDtuhTq

Wang, Z., Panda, R., Karlinsky, L., Feris, R., Sun, H., Kim, Y .: Multitask prompt tuning enables parameter-efficient transfer learning. In: The Eleventh ICLR (2023), https:// openreview.net/forum?id=Nk2pDtuhTq

work page 2023
[48]

Transactions of the ACL 7, 625–641 (2019)

Warstadt, A., Singh, A., Bowman, S.R.: Neural network acceptability judgments. Transactions of the ACL 7, 625–641 (2019). https://doi.org/10.1162/tacl_a_00290

work page doi:10.1162/tacl_a_00290 2019
[49]

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

Williams, A., Nangia, N., Bowman, S.: A broad-coverage challenge corpus for sentence understanding through inference. In: Walker, M., Ji, H., Stent, A. (eds.) Proceedings of the 2018 Conference of the North American Chapter of the ACL: Human Language Technologies, V olume 1 (Long Papers). pp. 1112–1122. ACL, New Orleans, Louisiana (Jun 2018).https: //doi....

work page internal anchor Pith review doi:10.18653/v1/n18-1101 2018
[50]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Wortsman, M., Ilharco, G., Kim, J.W., Li, M., Kornblith, S., Roelofs, R., Lopes, R.G., Hajishirzi, H., Farhadi, A., Namkoong, H., et al.: Robust fine-tuning of zero-shot models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 7959–7971 (2022)

work page 2022
[51]

arXiv preprint arXiv:2312.12148 (2023)

Xu, L., Xie, H., Qin, S.Z.J., Tao, X., Wang, F.L.: Parameter-efficient fine-tuning meth- ods for pretrained language models: A critical review and assessment. arXiv preprint arXiv:2312.12148 (2023)

work page arXiv 2023
[52]

Advances in Neural Information Processing Systems 36, 12589–12610 (2023)

Zhang, J., Liu, J., He, J., et al.: Composing parameter-efficient modules with arithmetic operation. Advances in Neural Information Processing Systems 36, 12589–12610 (2023)

work page 2023
[53]

Advances in neural information processing systems 28 (2015)

Zhang, X., Zhao, J., LeCun, Y .: Character-level convolutional networks for text classification. Advances in neural information processing systems 28 (2015)

work page 2015

[1] [1]

In: Goldberg, Y ., Kozareva, Z., Zhang, Y

Asai, A., Salehi, M., Peters, M., Hajishirzi, H.: ATTEMPT: Parameter-efficient multi-task tuning via attentional mixtures of soft prompts. In: Goldberg, Y ., Kozareva, Z., Zhang, Y . (eds.) Proceedings of the 2022 Conference on EMNLP. pp. 6655–6672. ACL, Abu Dhabi, United Arab Emirates (Dec 2022). https://doi.org/10.18653/v1/2022.emnlp-main. 446

work page doi:10.18653/v1/2022.emnlp-main 2022

[2] [2]

In: Aberer, K., Choi, K.S., Noy, N., Allemang, D., Lee, K.I., Nixon, L., Golbeck, J., Mika, P., Maynard, D., Mizoguchi, R., Schreiber, G., Cudré-Mauroux, P

Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: Dbpedia: A nucleus for a web of open data. In: Aberer, K., Choi, K.S., Noy, N., Allemang, D., Lee, K.I., Nixon, L., Golbeck, J., Mika, P., Maynard, D., Mizoguchi, R., Schreiber, G., Cudré-Mauroux, P. (eds.) The Semantic Web. pp. 722–735. Springer Berlin Heidelberg, Berlin, Heidelberg (2007)

work page 2007

[3] [3]

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Bi, X., Chen, D., Chen, G., Chen, S., Dai, D., Deng, C., Ding, H., Dong, K., Du, Q., Fu, Z., et al.: Deepseek llm: Scaling open-source language models with longtermism. arXiv preprint arXiv:2401.02954 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024

[4] [4]

In: Proceedings of the 2015 Conference on EMNLP (EMNLP)

Bowman, S.R., Angeli, G., Potts, C., Manning, C.D.: A large annotated corpus for learning natural language inference. In: Proceedings of the 2015 Conference on EMNLP (EMNLP). ACL (2015)

work page 2015

[5] [5]

Advances in neural information processing systems 33, 1877–1901 (2020)

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al.: Language models are few-shot learners. Advances in neural information processing systems 33, 1877–1901 (2020)

work page 1901

[6] [6]

S em E val-2017 task 1: Semantic textual similarity multilingual and crosslingual focused evaluation

Cer, D., Diab, M., Agirre, E., Lopez-Gazpio, I., Specia, L.: SemEval-2017 task 1: Semantic textual similarity multilingual and crosslingual focused evaluation. In: Bethard, S., Carpuat, M., Apidianaki, M., Mohammad, S.M., Cer, D., Jurgens, D. (eds.) Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). pp. 1–14. ACL, Vancou...

work page doi:10.18653/v1/s17-2001 2017

[7] [7]

arXiv preprint arXiv:2311.09344 (2023)

Chronopoulou, A., Pfeiffer, J., Maynez, J., Wang, X., Ruder, S., Agrawal, P.: Language and task arithmetic with parameter-efficient layers for zero-shot summarization. arXiv preprint arXiv:2311.09344 (2023)

work page arXiv 2023

[8] [8]

No Language Left Behind: Scaling Human-Centered Machine Translation

Costa-Jussà, M.R., Cross, J., Çelebi, O., Elbayad, M., Heafield, K., Heffernan, K., Kalbassi, E., Lam, J., Licht, D., Maillard, J., et al.: No language left behind: Scaling human-centered machine translation. arXiv preprint arXiv:2207.04672 (2022) 16 Belanec et al

work page internal anchor Pith review Pith/arXiv arXiv 2022

[9] [9]

Davari, M., Belilovsky, E.: Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks (Dec 2023), http://arxiv.org/abs/2312.06795, arXiv:2312.06795 [cs]

work page arXiv 2023

[10] [10]

In: Proceedings of the International Workshop on Paraphrasing (2005)

Dolan, W.B., Brockett, C.: Automatically constructing a corpus of sentential paraphrases. In: Proceedings of the International Workshop on Paraphrasing (2005)

work page 2005

[11] [11]

The Llama 3 Herd of Models

Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Yang, A., Fan, A., et al.: The llama 3 herd of models. arXiv preprint arXiv:2407.21783 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024

[12] [12]

Journal of the American Statistical Association 54(287), 613–621 (1959)

Dunn, O.J.: Confidence intervals for the means of dependent, normally distributed variables. Journal of the American Statistical Association 54(287), 613–621 (1959)

work page 1959

[13] [13]

Fourrier, C., Habib, N., Wolf, T., Tunstall, L.: Lighteval: A lightweight framework for llm evaluation (2023), https://github.com/huggingface/lighteval

work page 2023

[14] [14]

In: ICML

Frankle, J., Dziugaite, G.K., Roy, D., Carbin, M.: Linear mode connectivity and the lottery ticket hypothesis. In: ICML. pp. 3259–3269. PMLR (2020)

work page 2020

[15] [15]

PPT : Pre-trained prompt tuning for few-shot learning

Gu, Y ., Han, X., Liu, Z., Huang, M.: PPT: Pre-trained prompt tuning for few-shot learning. In: Muresan, S., Nakov, P., Villavicencio, A. (eds.) Proceedings of the 60th Annual Meeting of the ACL (V olume 1: Long Papers). pp. 8410–8423. ACL, Dublin, Ireland (May 2022). https://doi.org/10.18653/v1/2022.acl-long.576

work page doi:10.18653/v1/2022.acl-long.576 2022

[16] [16]

arXiv preprint arXiv:2502.10140 (2025)

Gurgurov, D., Vykopal, I., van Genabith, J., Ostermann, S.: Small models, big impact: Efficient corpus and graph-based adaptation of small multilingual language models for low-resource languages. arXiv preprint arXiv:2502.10140 (2025)

work page arXiv 2025

[17] [17]

In: ICML

Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q., Gesmundo, A., Attariyan, M., Gelly, S.: Parameter-efficient transfer learning for nlp. In: ICML. pp. 2790–

work page

[18] [18]

In: Proceedings of the First International Conference on Human Language Technology Research (2001)

Hovy, E., Gerber, L., Hermjakob, U., Lin, C.Y ., Ravichandran, D.: Toward semantics-based answer pinpointing. In: Proceedings of the First International Conference on Human Language Technology Research (2001)

work page 2001

[19] [19]

In: ICLR (2022), https://openreview

Hu, E.J., yelong shen, Wallis, P., Allen-Zhu, Z., Li, Y ., Wang, S., Wang, L., Chen, W.: LoRA: Low-rank adaptation of large language models. In: ICLR (2022), https://openreview. net/forum?id=nZeVKeeFYf9

work page 2022

[20] [20]

In: The Eleventh ICLR (2022)

Ilharco, G., Ribeiro, M.T., Wortsman, M., Schmidt, L., Hajishirzi, H., Farhadi, A.: Editing models with task arithmetic. In: The Eleventh ICLR (2022)

work page 2022

[21] [21]

In: AAAI Conference on Artificial Intelligence (2018), https://api

Khot, T., Sabharwal, A., Clark, P.: Scitail: A textual entailment dataset from science ques- tion answering. In: AAAI Conference on Artificial Intelligence (2018), https://api. semanticscholar.org/CorpusID:24462950

work page 2018

[22] [22]

arXiv preprint arXiv:2404.15737 (2024)

Klimaszewski, M., Andruszkiewicz, P., Birch, A.: No train but gain: Language arithmetic for training-free language adapters enhancement. arXiv preprint arXiv:2404.15737 (2024)

work page arXiv 2024

[23] [23]

Yifan Li, Yifan Du, Kun Zhou, Jinpeng Wang, Xin Zhao, and Ji-Rong Wen

Lee, H., Jeong, M., Yun, S.Y ., Kim, K.E.: Bayesian multi-task transfer learning for soft prompt tuning. In: Bouamor, H., Pino, J., Bali, K. (eds.) Findings of the ACL: EMNLP 2023. pp. 4942–4958. ACL, Singapore (Dec 2023). https://doi.org/10.18653/v1/2023. findings-emnlp.329

work page doi:10.18653/v1/2023 2023

[24] [24]

URL https://aclanthology.org/2021

Lester, B., Al-Rfou, R., Constant, N.: The power of scale for parameter-efficient prompt tuning. In: Moens, M.F., Huang, X., Specia, L., Yih, S.W.t. (eds.) Proceedings of the 2021 Conference on EMNLP. pp. 3045–3059. ACL, Online and Punta Cana, Dominican Republic (Nov 2021). https://doi.org/10.18653/v1/2021.emnlp-main.243

work page doi:10.18653/v1/2021.emnlp-main.243 2021

[25] [25]

Li, M., Gururangan, S., Dettmers, T., Lewis, M., Althoff, T., Smith, N.A., Zettlemoyer, L.: Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models (Aug 2022), http://arxiv.org/abs/2208.03306, arXiv:2208.03306 [cs]

work page arXiv 2022

[26] [26]

In: Zong, C., Xia, F., Li, W., Navigli, R

Li, X.L., Liang, P.: Prefix-tuning: Optimizing continuous prompts for generation. In: Zong, C., Xia, F., Li, W., Navigli, R. (eds.) Proceedings of the 59th Annual Meeting of the ACL and Task Prompt Vectors 17 the 11th International Joint Conference on Natural Language Processing (V olume 1: Long Papers). pp. 4582–4597. ACL, Online (Aug 2021). https://doi....

work page doi:10.18653/v1/ 2021

[27] [27]

In: COLING 2002: The 19th International Conference on Computational Linguistics (2002)

Li, X., Roth, D.: Learning question classifiers. In: COLING 2002: The 19th International Conference on Computational Linguistics (2002)

work page 2002

[28] [28]

AI Open (2023)

Liu, X., Zheng, Y ., Du, Z., Ding, M., Qian, Y ., Yang, Z., Tang, J.: Gpt understands, too. AI Open (2023)

work page 2023

[29] [29]

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Liu, Y ., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V .: Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019)

work page internal anchor Pith review Pith/arXiv arXiv 1907

[30] [30]

In: Lin, D., Matsumoto, Y ., Mihalcea, R

Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y ., Potts, C.: Learning word vectors for sentiment analysis. In: Lin, D., Matsumoto, Y ., Mihalcea, R. (eds.) Proceedings of the 49th Annual Meeting of the ACL: Human Language Technologies. pp. 142–150. ACL, Portland, Oregon, USA (Jun 2011)

work page 2011

[31] [31]

Matena, M., Raffel, C.: Merging Models with Fisher-Weighted Averaging (Aug 2022),http: //arxiv.org/abs/2111.09832, arXiv:2111.09832 [cs]

work page arXiv 2022

[32] [32]

Advances in Neural Information Processing Systems 36 (2024)

Ortiz-Jimenez, G., Favero, A., Frossard, P.: Task arithmetic in the tangent space: Improved editing of pre-trained models. Advances in Neural Information Processing Systems 36 (2024)

work page 2024

[33] [33]

arXiv preprint arXiv:2402.12819 (2024)

Pecher, B., Srba, I., Bielikova, M.: Comparing specialised small and general large language models on text classification: 100 labelled samples to achieve break-even performance. arXiv preprint arXiv:2402.12819 (2024)

work page arXiv 2024

[34] [34]

In: Merlo, P., Tiedemann, J., Tsarfaty, R

Pfeiffer, J., Kamath, A., Rücklé, A., Cho, K., Gurevych, I.: AdapterFusion: Non-destructive task composition for transfer learning. In: Merlo, P., Tiedemann, J., Tsarfaty, R. (eds.) Proceed- ings of the 16th Conference of the European Chapter of the ACL: Main V olume. pp. 487–503. ACL, Online (Apr 2021). https://doi.org/10.18653/v1/2021.eacl-main. 39

work page doi:10.18653/v1/2021.eacl-main 2021

[35] [35]

IEEE/ACM Trans

Qin, Y ., Wang, X., Su, Y ., Lin, Y ., Ding, N., Yi, J., Chen, W., Liu, Z., Li, J., Hou, L., Li, P., Sun, M., Zhou, J.: Exploring universal intrinsic task subspace for few-shot learning via prompt tuning. IEEE/ACM Trans. Audio, Speech and Lang. Proc. 32, 3631–3643 (Jul 2024). https://doi.org/10.1109/TASLP.2024.3430545, https://doi.org/10. 1109/TASLP.2024.3430545

work page doi:10.1109/taslp.2024.3430545 2024

[36] [36]

OpenAI blog 1(8), 9 (2019)

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al.: Language models are unsupervised multitask learners. OpenAI blog 1(8), 9 (2019)

work page 2019

[37] [37]

Journal of machine learning research 21(140), 1–67 (2020)

Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y ., Li, W., Liu, P.J.: Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of machine learning research 21(140), 1–67 (2020)

work page 2020

[38] [38]

In: Gurevych, I., Miyao, Y

Rajpurkar, P., Jia, R., Liang, P.: Know what you don’t know: Unanswerable questions for SQuAD. In: Gurevych, I., Miyao, Y . (eds.) Proceedings of the 56th Annual Meeting of the ACL (V olume 2: Short Papers). pp. 784–789. ACL, Melbourne, Australia (Jul 2018). https://doi.org/10.18653/v1/P18-2124

work page doi:10.18653/v1/p18-2124 2018

[39] [39]

In: ICML

Ramé, A., Ahuja, K., Zhang, J., Cord, M., Bottou, L., Lopez-Paz, D.: Model ratatouille: Recycling diverse models for out-of-distribution generalization. In: ICML. pp. 28656–28679. PMLR (2023)

work page 2023

[40] [40]

In: The Twelfth ICLR (2024), https://openreview.net/forum?id=KjegfPGRde

Shi, Z., Lipani, A.: DePT: Decomposed prompt tuning for parameter-efficient fine-tuning. In: The Twelfth ICLR (2024), https://openreview.net/forum?id=KjegfPGRde

work page 2024

[41] [41]

In: Proceedings of the 2013 conference on EMNLP

Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C.D., Ng, A.Y ., Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 conference on EMNLP. pp. 1631–1642 (2013)

work page 2013

[42] [42]

03053, arXiv:2305.03053 [cs] 18 Belanec et al

Stoica, G., Bolya, D., Bjorner, J., Ramesh, P., Hearn, T., Hoffman, J.: ZipIt! Merging Models from Different Tasks without Training (Mar 2024), http://arxiv.org/abs/2305. 03053, arXiv:2305.03053 [cs] 18 Belanec et al

work page arXiv 2024

[43] [43]

Biometrika pp

Student: The probable error of a mean. Biometrika pp. 1–25 (1908)

work page 1908

[44] [44]

In: Carpuat, M., de Marneffe, M.C., Meza Ruiz, I.V

Su, Y ., Wang, X., Qin, Y ., Chan, C.M., Lin, Y ., Wang, H., Wen, K., Liu, Z., Li, P., Li, J., Hou, L., Sun, M., Zhou, J.: On transferability of prompt tuning for natural language processing. In: Carpuat, M., de Marneffe, M.C., Meza Ruiz, I.V . (eds.) Proceedings of the 2022 Conference of the North American Chapter of the ACL: Human Language Technologies....

work page 2022

[45] [45]

In: Muresan, S., Nakov, P., Villavicencio, A

Vu, T., Lester, B., Constant, N., Al-Rfou’, R., Cer, D.: SPoT: Better frozen model adaptation through soft prompt transfer. In: Muresan, S., Nakov, P., Villavicencio, A. (eds.) Proceedings of the 60th Annual Meeting of the ACL (V olume 1: Long Papers). pp. 5039–5059. ACL, Dublin, Ireland (May 2022). https://doi.org/10.18653/v1/2022.acl-long.346

work page doi:10.18653/v1/2022.acl-long.346 2022

[46] [46]

Proceedings of the 2018

Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.: GLUE: A multi-task benchmark and analysis platform for natural language understanding. In: Linzen, T., Chrupała, G., Alishahi, A. (eds.) Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. pp. 353–355. ACL, Brussels, Belgium (Nov 2018). ht...

work page doi:10.18653/v1/w18-5446 2018

[47] [47]

In: The Eleventh ICLR (2023), https:// openreview.net/forum?id=Nk2pDtuhTq

Wang, Z., Panda, R., Karlinsky, L., Feris, R., Sun, H., Kim, Y .: Multitask prompt tuning enables parameter-efficient transfer learning. In: The Eleventh ICLR (2023), https:// openreview.net/forum?id=Nk2pDtuhTq

work page 2023

[48] [48]

Transactions of the ACL 7, 625–641 (2019)

Warstadt, A., Singh, A., Bowman, S.R.: Neural network acceptability judgments. Transactions of the ACL 7, 625–641 (2019). https://doi.org/10.1162/tacl_a_00290

work page doi:10.1162/tacl_a_00290 2019

[49] [49]

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

Williams, A., Nangia, N., Bowman, S.: A broad-coverage challenge corpus for sentence understanding through inference. In: Walker, M., Ji, H., Stent, A. (eds.) Proceedings of the 2018 Conference of the North American Chapter of the ACL: Human Language Technologies, V olume 1 (Long Papers). pp. 1112–1122. ACL, New Orleans, Louisiana (Jun 2018).https: //doi....

work page internal anchor Pith review doi:10.18653/v1/n18-1101 2018

[50] [50]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Wortsman, M., Ilharco, G., Kim, J.W., Li, M., Kornblith, S., Roelofs, R., Lopes, R.G., Hajishirzi, H., Farhadi, A., Namkoong, H., et al.: Robust fine-tuning of zero-shot models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 7959–7971 (2022)

work page 2022

[51] [51]

arXiv preprint arXiv:2312.12148 (2023)

Xu, L., Xie, H., Qin, S.Z.J., Tao, X., Wang, F.L.: Parameter-efficient fine-tuning meth- ods for pretrained language models: A critical review and assessment. arXiv preprint arXiv:2312.12148 (2023)

work page arXiv 2023

[52] [52]

Advances in Neural Information Processing Systems 36, 12589–12610 (2023)

Zhang, J., Liu, J., He, J., et al.: Composing parameter-efficient modules with arithmetic operation. Advances in Neural Information Processing Systems 36, 12589–12610 (2023)

work page 2023

[53] [53]

Advances in neural information processing systems 28 (2015)

Zhang, X., Zhao, J., LeCun, Y .: Character-level convolutional networks for text classification. Advances in neural information processing systems 28 (2015)

work page 2015