
arxiv: 2606.24841 · v1 · pith:IOAMESU7 · submitted 2026-06-23 · 💻 cs.AI · cs.CL

Matching Tasks to Objectives: Fine-Tuning and Prompt-Tuning Strategies for Encoder-Decoder Pre-trained Language Models

Pith reviewed 2026-06-25 23:04 UTC · model grok-4.3

classification 💻 cs.AI cs.CL
keywords: MTO framework, encoder-decoder models, pre-training objectives, fine-tuning, prompt-tuning, few-shot learning, commonsense knowledge, question answering

The pith

The MTO framework matches tasks to pre-training objectives and, when the match holds, reports gains of over 120% relative to conventional methods in few-shot adaptation of encoder-decoder models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines how pre-training objectives influence encoder-decoder language models on generation and question answering tasks that involve commonsense knowledge retrieval and completion. It introduces the Match Task to Objective framework, which selects a suitable objective and prepares related adaptation data through unsupervised training. Novel templates are designed to align with the chosen objective during both fine-tuning and prompt-tuning. When the objectives match the task requirements, the strategies can produce gains exceeding 120 percent over conventional approaches in few-shot conditions, and they also beat the baseline in full-data settings. The work extends the same alignment principle to soft prompt engineering, where it also improves prompt-tuning performance.

Core claim

The central claim is that the Match Task to Objective framework identifies the appropriate pre-training objective for a task, prepares task-related data through unsupervised training based on that objective, and supports novel templates aligned with that objective for fine-tuning and prompt-tuning. When this match holds, encoder-decoder models gain over 120 percent relative to conventional methods in few-shot settings, outperform related work in those regimes, and exceed the baseline even with the full dataset; extending the approach to prompt-tuning yields similar benefits.
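To read the headline number: a relative gain is (new - baseline) / baseline × 100, so "over 120 percent" means the matched setup scores more than 2.2 times the conventional one. The sketch below uses made-up scores, not figures from the paper; it also shows why a large relative gain on a low few-shot baseline can correspond to a modest absolute difference.

```python
def relative_gain(new_score: float, baseline_score: float) -> float:
    """Percentage improvement of new_score over baseline_score."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive to define a relative gain")
    return (new_score - baseline_score) / baseline_score * 100.0


# Hypothetical few-shot scores on some metric; not numbers reported in the paper.
print(f"{relative_gain(22.5, 10.0):.1f}%")  # 125.0%: 2.25x the baseline
print(f"{relative_gain(11.0, 5.0):.1f}%")   # 120.0%: yet only 6 points absolute
```

Reporting absolute scores next to the relative figure would let readers judge which of these two situations the paper is in.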

What carries the argument

The Match Task to Objective (MTO) framework, which determines the suitable pre-training objective for a task and supplies automated unsupervised methods to prepare data for adaptation.
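What "unsupervised data preparation based on the objective" could look like is sketched below, assuming the matched objective is T5-style span corruption: contiguous spans of raw text are replaced by sentinel tokens in the input, and the target lists the dropped spans after their sentinels, closed by one more sentinel. The functions `span_corrupt` and `random_spans` are hypothetical helpers, not code from the MTO repository, and they split on whitespace where T5 itself works on SentencePiece subwords.

```python
import random
from typing import List, Tuple


def sentinel(i: int) -> str:
    # T5's vocabulary names its sentinel tokens <extra_id_0>, <extra_id_1>, ...
    return f"<extra_id_{i}>"


def span_corrupt(tokens: List[str], spans: List[Tuple[int, int]]) -> Tuple[str, str]:
    """Build an (input, target) pair by masking the given (start, length) spans."""
    inputs: List[str] = []
    targets: List[str] = []
    cursor = 0
    for i, (start, length) in enumerate(sorted(spans)):
        if start < cursor:
            raise ValueError("spans must not overlap")
        inputs.extend(tokens[cursor:start])  # keep unmasked text
        inputs.append(sentinel(i))           # replace the span with a sentinel
        targets.append(sentinel(i))          # target: sentinel, then the dropped span
        targets.extend(tokens[start:start + length])
        cursor = start + length
    inputs.extend(tokens[cursor:])
    targets.append(sentinel(len(spans)))     # closing sentinel ends the target
    return " ".join(inputs), " ".join(targets)


def random_spans(n_tokens: int, n_spans: int, span_len: int, seed: int) -> List[Tuple[int, int]]:
    """Pick spans reproducibly; choosing among disjoint aligned blocks avoids overlap."""
    blocks = list(range(0, n_tokens - span_len + 1, span_len))
    rng = random.Random(seed)
    return sorted((start, span_len) for start in rng.sample(blocks, n_spans))


tokens = "PersonX goes to the store because PersonX needs milk".split()
src, tgt = span_corrupt(tokens, [(3, 2), (7, 2)])
print(src)  # PersonX goes to <extra_id_0> because PersonX <extra_id_1>
print(tgt)  # <extra_id_0> the store <extra_id_1> needs milk <extra_id_2>
```

Because the pairs are built from unlabeled text alone, a corpus related to the task can be turned into adaptation data without annotation, which is the property the framework description emphasizes.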

If this is right

  • Encoder-decoder models adapted with matched objectives outperform conventional methods by over 120 percent in few-shot regimes for commonsense tasks.
  • The same matching strategy improves results even when full training datasets are available.
  • Novel templates aligned to objectives enhance both fine-tuning and prompt-tuning performance.
  • Guidance emerges for selecting objectives and optimizing soft prompts for specific tasks.
  • The approach significantly exceeds related works in few-shot settings.
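One way to picture an objective-aligned template, offered only as an illustration: a commonsense tuple (head, relation, tail) can be verbalized as a plain relation-tagged generation example, or as a sentence whose tail is the masked span, so that the fine-tuning input resembles span-corruption pre-training. The relation names follow ATOMIC conventions; the natural-language phrases in `RELATION_PHRASES` are invented for this sketch and are not the paper's templates.

```python
from typing import Dict, Tuple

# Hypothetical verbalizations; the paper's own relation phrases may differ.
RELATION_PHRASES: Dict[str, str] = {
    "xIntent": "because PersonX wanted",
    "xNeed": "before that, PersonX needed",
}


def plain_template(head: str, relation: str, tail: str) -> Tuple[str, str]:
    """Conventional format: relation tag and head as input, bare tail as target."""
    return f"{relation}: {head}", tail


def span_template(head: str, relation: str, tail: str) -> Tuple[str, str]:
    """Span-corruption-shaped format: the tail fills the first sentinel slot."""
    phrase = RELATION_PHRASES[relation]
    return f"{head} {phrase} <extra_id_0>", f"<extra_id_0> {tail} <extra_id_1>"


example = ("PersonX goes to the store", "xIntent", "to buy milk")
print(plain_template(*example))
# ('xIntent: PersonX goes to the store', 'to buy milk')
print(span_template(*example))
# ('PersonX goes to the store because PersonX wanted <extra_id_0>', '<extra_id_0> to buy milk <extra_id_1>')
```

The second format asks the model to do what span corruption already trained it to do, fill a sentinel slot, whereas the first introduces an input convention it must learn from few examples. Whether this difference accounts for the reported gains is exactly what a template ablation at a fixed objective would test.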

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Objective alignment may address a core mismatch that limits current adaptation methods across different model families.
  • The unsupervised data preparation step could reduce reliance on labeled examples when extending to new domains.
  • Applying the framework to additional tasks such as summarization or translation would test whether the gains generalize.
  • Template design might itself be automated by reusing the objective-matching logic.

Load-bearing premise

The assumption that the MTO framework can accurately identify the appropriate pre-training objective for a given task and that the novel templates will align effectively to produce the claimed performance improvements.

What would settle it

A controlled experiment on the same generation and question answering tasks in which applying MTO-selected objectives and aligned templates produces no gain or a performance drop relative to standard fine-tuning and prompt-tuning in few-shot settings.
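A controlled comparison of this kind depends on both arms seeing exactly the same few-shot training examples. A minimal sketch of seeded sampling follows, assuming a list-like training pool; the pool, the shot count of 32, and the seeds are arbitrary placeholders, not the paper's protocol.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def few_shot_splits(pool: Sequence[T], k: int, seeds: Sequence[int]) -> List[List[T]]:
    """Draw one k-example training subset per seed; the same seed gives the same subset."""
    return [random.Random(seed).sample(list(pool), k) for seed in seeds]


pool = [f"example-{i}" for i in range(1000)]
splits = few_shot_splits(pool, k=32, seeds=[0, 1, 2, 3, 4])

# Hypothetical usage: both the MTO arm and the standard fine-tuning arm would be
# trained on splits[i] for each seed i, so score differences cannot come from
# different training examples, and per-seed scores can be compared pairwise.
```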

Original abstract

Prompt-based learning has emerged as a dominant paradigm in natural language processing. This study explores the impact of diverse pre-training objectives on the performance of encoder-decoder pre-trained language models across generation and question answering tasks, with a focus on commonsense knowledge retrieval and completion. We highlight the benefits of incorporating multiple objectives during both pre-training and fine-tuning stages. We introduce the Match Task to Objective (MTO) framework and methods for determining the appropriate objective for a given task. This framework offers automated methods to prepare task-related data for adaptation through unsupervised training, based on the identified objective. In the fine-tuning stage, we design novel templates that align with the objectives of the pre-training and adaptation stages. When aligned with task requirements, these strategies can achieve a performance gain of over 120% compared to conventional methods in few-shot settings. They significantly outperform related works in few-shot settings and exceed the baseline even in full-dataset scenarios. Furthermore, we extend this approach to include prompt-tuning methodologies, providing guidance for more effective soft prompt engineering and optimization. Our strategies significantly enhance prompt-tuning performance as well. These insights hold substantial value, precisely guiding the selection and optimization of models customized for specific tasks. Code is available at https://github.com/puraminy/MTO/
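The prompt-tuning setting the abstract refers to trains a short sequence of continuous "soft prompt" vectors that is prepended to the input embeddings while the model's own weights stay frozen. The pure-Python sketch below shows only that data flow and the resulting parameter count; the random initialization, the prompt length of 20, and the model width are illustrative assumptions, and real implementations operate on framework tensors rather than lists.

```python
import random
from typing import List

Vector = List[float]


def init_soft_prompt(n_tokens: int, d_model: int, seed: int = 0) -> List[Vector]:
    """Trainable prompt vectors; uniform random init is an arbitrary choice here."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(d_model)] for _ in range(n_tokens)]


def prepend_prompt(prompt: List[Vector], input_embeds: List[Vector]) -> List[Vector]:
    """The encoder receives [P_1..P_n, e_1..e_m]; only the P_i would get gradient updates."""
    if any(len(v) != len(prompt[0]) for v in input_embeds):
        raise ValueError("prompt and input embeddings must share d_model")
    return prompt + input_embeds


def prompt_tuning_params(n_tokens: int, d_model: int) -> int:
    """Number of trainable parameters in a soft prompt of n_tokens vectors."""
    return n_tokens * d_model


d_model = 768  # width of a T5-base-sized model
prompt = init_soft_prompt(20, d_model)
frozen_inputs = [[0.0] * d_model for _ in range(12)]  # stand-in for 12 token embeddings
encoder_input = prepend_prompt(prompt, frozen_inputs)
print(len(encoder_input))                 # 32
print(prompt_tuning_params(20, d_model))  # 15360
```

For a T5-base-sized model (roughly 220 million parameters), this means prompt-tuning updates about 15 thousand values, which is why the choice of template and objective that the soft prompt must work with matters so much in this regime.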

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces the Match Task to Objective (MTO) framework for encoder-decoder pre-trained language models. It provides methods for identifying the suitable pre-training objective for generation and QA tasks that involve commonsense knowledge, automatically prepares adaptation data through unsupervised training based on the identified objective, and designs templates aligned with that objective for fine-tuning and prompt-tuning. The central empirical claim is that these strategies yield over 120% gains over conventional methods in few-shot settings, outperform related work in few-shot regimes, and exceed the baseline even with the full dataset; the approach is also extended to soft prompt engineering.

Significance. If the MTO identification procedure and template contributions are shown to be reliable via ablations and statistical tests, the work could supply actionable guidance for objective-aware adaptation of encoder-decoder models in low-data regimes. The public code release at https://github.com/puraminy/MTO/ is a clear strength for reproducibility.

major comments (2)
  1. [Abstract] The headline claim of a >120% few-shot gain (and the outperformance statements) rests on two unverified assumptions: (1) that the MTO procedure reliably selects the correct pre-training objective for each task, and (2) that the novel templates, rather than incidental factors, drive the reported improvements. No quantitative measure of identification accuracy, no ablation isolating objective choice from random or fixed objectives, and no controlled comparison of novel versus standard templates at a fixed objective are described.
  2. [Abstract] The experimental setup, baselines, datasets, few-shot sampling protocol, and statistical significance tests are not detailed in the abstract or the methods description, making it impossible to assess whether the reported numbers support the central claims about MTO alignment.
minor comments (1)
  1. [Abstract] The abstract states that the framework 'offers automated methods to prepare task-related data' but does not specify the unsupervised criteria or any validation of those criteria.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for stronger empirical support and clearer experimental details. We address each major comment below and will revise the manuscript to incorporate the requested clarifications and analyses.

Point-by-point responses
  1. Referee: [Abstract] The headline claim of a >120% few-shot gain (and the outperformance statements) rests on two unverified assumptions: (1) that the MTO procedure reliably selects the correct pre-training objective for each task, and (2) that the novel templates, rather than incidental factors, drive the reported improvements. No quantitative measure of identification accuracy, no ablation isolating objective choice from random or fixed objectives, and no controlled comparison of novel versus standard templates at a fixed objective are described.

    Authors: We agree that the abstract's performance claims would be strengthened by explicit validation of the MTO procedure and template contributions. The current manuscript presents the MTO framework and reports gains but does not include a quantitative accuracy metric for objective identification or the requested ablations. We will add (1) an evaluation of MTO identification accuracy against held-out task-objective pairs, (2) an ablation comparing MTO-selected objectives against random and fixed-objective baselines, and (3) a controlled comparison of the novel templates versus standard templates while holding the objective fixed. These additions will be placed in a new experimental subsection and referenced from the abstract. revision: yes

  2. Referee: [Abstract] The experimental setup, baselines, datasets, few-shot sampling protocol, and statistical significance tests are not detailed in the abstract or the methods description, making it impossible to assess whether the reported numbers support the central claims about MTO alignment.

    Authors: We acknowledge that the abstract and main text do not provide sufficient detail on these elements. The full manuscript contains some description of datasets and baselines, but the few-shot sampling protocol and statistical tests are indeed underspecified. We will expand the experimental setup section to include: exact dataset citations and splits, the few-shot sampling procedure (including seed and size details), the complete list of baselines with references, and results of statistical significance tests (e.g., paired t-tests or bootstrap confidence intervals) for all reported gains. These details will also be summarized concisely in the abstract where space permits. revision: yes
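The paired tests the authors promise can be sketched with the Python standard library alone, assuming one score per seed for each arm, with seeds matched across arms. `paired_bootstrap_ci` is a percentile bootstrap over the per-seed differences and `paired_t_statistic` computes the paired t statistic; a p-value would then come from a t distribution with n - 1 degrees of freedom (for example via `scipy.stats.ttest_rel`). The scores are invented, and with only a handful of seeds the bootstrap interval is coarse.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def _paired_diffs(a: Sequence[float], b: Sequence[float]) -> List[float]:
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired observations of equal length")
    return [x - y for x, y in zip(a, b)]


def paired_bootstrap_ci(
    a: Sequence[float],
    b: Sequence[float],
    n_resamples: int = 10_000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for the mean of the paired differences a - b."""
    diffs = _paired_diffs(a, b)
    rng = random.Random(seed)
    # Resample whole pairs (here: seeds) with replacement and record each mean.
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=len(diffs))) for _ in range(n_resamples)
    )
    low = means[int(alpha / 2 * n_resamples)]
    high = means[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t = mean(d) / (sd(d) / sqrt(n)) for paired differences d = a - b."""
    diffs = _paired_diffs(a, b)
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return statistics.fmean(diffs) / (sd / math.sqrt(len(diffs)))


mto = [30.1, 28.4, 31.0, 29.5, 30.2]       # hypothetical per-seed scores
standard = [14.2, 13.1, 15.0, 13.8, 14.4]  # same seeds, conventional fine-tuning
low, high = paired_bootstrap_ci(mto, standard)
t = paired_t_statistic(mto, standard)
print(f"95% CI for the mean gain: [{low:.2f}, {high:.2f}]; paired t = {t:.1f}")
```

Pairing by seed removes the variance that comes from which few-shot examples were drawn, which is usually the dominant source of noise in this regime.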

Circularity Check

0 steps flagged

No circularity: empirical evaluation of MTO framework

Full rationale

The paper introduces the MTO framework, automated data preparation, and novel templates, then reports experimental performance gains on generation and QA tasks. All claims rest on empirical comparisons to baselines and related works rather than any derivation, equation, or fitted parameter that reduces to its own inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems, no ansatzes are smuggled, and no predictions are statistically forced from subsets of the same data. The work is self-contained against external benchmarks with code released for reproduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Based solely on the abstract; the paper introduces a new framework but details on parameters or axioms are not provided.

axioms (1)
  • domain assumption: Standard assumptions in machine learning about model generalization and task alignment
    The framework relies on the idea that pre-training objectives can be matched to downstream tasks effectively.
invented entities (1)
  • MTO framework (no independent evidence)
    purpose: To determine appropriate objectives for tasks and prepare data accordingly
    Newly proposed in this paper as a method for matching tasks to objectives.

pith-pipeline@v0.9.1-grok · 5761 in / 1278 out tokens · 32974 ms · 2026-06-25T23:04:59.216408+00:00

