Matching Tasks to Objectives: Fine-Tuning and Prompt-Tuning Strategies for Encoder-Decoder Pre-trained Language Models

Ahmad Pouramini; Hesham Faili

arxiv: 2606.24841 · v1 · pith:IOAMESU7new · submitted 2026-06-23 · 💻 cs.AI · cs.CL

Pith reviewed 2026-06-25 23:04 UTC · model grok-4.3

classification: cs.AI, cs.CL

keywords: MTO framework, encoder-decoder models, pre-training objectives, fine-tuning, prompt-tuning, few-shot learning, commonsense knowledge, question answering

The pith

The MTO framework matches tasks to pre-training objectives to achieve over 120% performance gains in few-shot encoder-decoder model adaptation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines how pre-training objectives influence encoder-decoder language models on generation and question answering tasks that involve commonsense knowledge retrieval and completion. It introduces the Match Task to Objective framework to select suitable objectives and prepare related data through unsupervised methods for adaptation. Novel templates are designed to align with the chosen objectives during both fine-tuning and prompt-tuning stages. When the objectives match the task needs, the strategies produce gains exceeding 120 percent over conventional approaches in few-shot conditions while also beating baselines in full-data settings. The work extends the same alignment principle to soft prompt engineering for further performance lifts.

Core claim

The central claim is that the Match Task to Objective framework identifies the appropriate pre-training objective for a task, prepares task-related data via unsupervised training based on that objective, and supports novel templates that align with the objectives in fine-tuning and prompt-tuning. When this matching occurs, encoder-decoder models deliver performance gains of over 120 percent compared to conventional methods in few-shot settings, outperform related works in those regimes, and exceed the baseline even with full datasets. Similar benefits are observed when the approach is extended to prompt-tuning.

What carries the argument

The Match Task to Objective (MTO) framework, which determines the suitable pre-training objective for a task and supplies automated unsupervised methods to prepare data for adaptation.
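
To make the idea of objective-aligned data concrete, the sketch below shows one way it could look. It is an illustration only: the abstract does not say which objectives MTO selects or how its templates are worded, so the sketch assumes T5-style span corruption with sentinel tokens such as `<extra_id_0>`, and the function names `span_corrupt` and `denoising_template` are hypothetical.

```python
def span_corrupt(tokens, spans):
    """Replace each span with a sentinel and return (input, target) strings.

    Assumption: T5-style span corruption. `spans` holds half-open
    (start, end) index pairs that are sorted and do not overlap.
    """
    source, target, cursor = [], [], 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        source.extend(tokens[cursor:start])  # keep text before the span
        source.append(sentinel)              # hide the span in the input
        target.append(sentinel)              # announce the span in the target
        target.extend(tokens[start:end])     # then emit the hidden tokens
        cursor = end
    source.extend(tokens[cursor:])
    target.append(f"<extra_id_{len(spans)}>")  # closing sentinel
    return " ".join(source), " ".join(target)


def denoising_template(head, relation_phrase, tail):
    """Hypothetical template: cast a (head, relation, tail) tuple as a
    fill-in task so that fine-tuning resembles the denoising objective."""
    source = f"{head} {relation_phrase} <extra_id_0>"
    target = f"<extra_id_0> {tail} <extra_id_1>"
    return source, target


tokens = "a knife is used for cutting bread".split()
print(span_corrupt(tokens, [(1, 2), (5, 7)]))
print(denoising_template("a knife", "is used for", "cutting bread"))
```

The first call shows unlabeled text turned into a training pair; the second shows a task example written in the same input/target shape, which is the kind of alignment between adaptation data and template that the framework is described as aiming for.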

If this is right

Encoder-decoder models adapted with matched objectives outperform conventional methods by over 120 percent in few-shot regimes for commonsense tasks.
The same matching strategy improves results even when full training datasets are available.
Novel templates aligned to objectives enhance both fine-tuning and prompt-tuning performance.
Guidance emerges for selecting objectives and optimizing soft prompts for specific tasks.
The approach significantly exceeds related works in few-shot settings.
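
For scale, it helps to spell out what a gain of over 120 percent would mean. The abstract does not state how the figure is computed; the sketch below assumes it is a relative improvement over the baseline score, and the scores used are made up for illustration, not taken from the paper.

```python
def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (improved - baseline) / baseline


# Hypothetical scores: a baseline of 20.0 and a matched-objective score of 44.0.
print(relative_gain(20.0, 44.0))
```

Under this reading, a gain above 120 percent means the adapted model scores more than 2.2 times the baseline, which is easier to reach when the few-shot baseline is low.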

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Objective alignment may address a core mismatch that limits current adaptation methods across different model families.
The unsupervised data preparation step could reduce reliance on labeled examples when extending to new domains.
Applying the framework to additional tasks such as summarization or translation would test whether the gains generalize.
Template design might itself be automated by reusing the objective-matching logic.

Load-bearing premise

The assumption that the MTO framework can accurately identify the appropriate pre-training objective for a given task and that the novel templates will align effectively to produce the claimed performance improvements.

What would settle it

A controlled experiment on the same generation and question answering tasks in which applying MTO-selected objectives and aligned templates produces no gain or a performance drop relative to standard fine-tuning and prompt-tuning in few-shot settings.

Original abstract

Prompt-based learning has emerged as a dominant paradigm in natural language processing. This study explores the impact of diverse pre-training objectives on the performance of encoder-decoder pre-trained language models across generation and question answering tasks, with a focus on commonsense knowledge retrieval and completion. We highlight the benefits of incorporating multiple objectives during both pre-training and fine-tuning stages. We introduce the Match Task to Objective (MTO) framework and methods for determining the appropriate objective for a given task. This framework offers automated methods to prepare task-related data for adaptation through unsupervised training, based on the identified objective. In the fine-tuning stage, we design novel templates that align with the objectives of the pre-training and adaptation stages. When aligned with task requirements, these strategies can achieve a performance gain of over 120% compared to conventional methods in few-shot settings. They significantly outperform related works in few-shot settings and exceed the baseline even in full-dataset scenarios. Furthermore, we extend this approach to include prompt-tuning methodologies, providing guidance for more effective soft prompt engineering and optimization. Our strategies significantly enhance prompt-tuning performance as well. These insights hold substantial value, precisely guiding the selection and optimization of models customized for specific tasks. Code is available at https://github.com/puraminy/MTO/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The MTO framework is a practical idea for objective matching, but the large performance claims lack the necessary ablations and validation.

The paper's main contribution is the Match Task to Objective (MTO) framework, which tries to pick the right pre-training objective for a given task and then uses that to guide data preparation and template design for fine-tuning and prompt-tuning.

They apply this to encoder-decoder models on generation and question answering, with a focus on commonsense tasks. The automated unsupervised preparation of task-related data is a practical step, and they show how to align templates with the objectives. Extending the same logic to soft prompt engineering is also reasonable. Releasing the code on GitHub is a plus.

The big claims, gains of over 120% in few-shot settings and outperformance of baselines even with full data, are the part that needs more scrutiny. The abstract describes the methods but does not say how the authors validate that the MTO procedure picks the correct objective, and it describes no ablation that holds the objective fixed and varies only the templates. Without those, it's not clear whether the gains come from the alignment or from other experimental choices.

Overall the thinking is straightforward and engages with the prompt-based learning literature without obvious contradictions.

This kind of work is aimed at practitioners and researchers who adapt large language models for specific tasks with limited data. Someone looking for concrete strategies on objective selection and template design could find useful pointers here.

I would recommend sending it for peer review. The idea has potential and the code makes it checkable, but the experiments will need to address the validation gaps to be convincing.

Referee Report

2 major / 1 minor

Summary. The paper introduces the Match Task to Objective (MTO) framework for encoder-decoder pre-trained language models. It automates unsupervised identification of suitable pre-training objectives for generation and QA tasks (focusing on commonsense knowledge), prepares adaptation data accordingly, and designs aligned templates for fine-tuning and prompt-tuning. The central empirical claim is that these strategies yield over 120% performance gains versus conventional methods in few-shot settings, outperform related work in few-shot regimes, and exceed baselines even with full data; the approach is also extended to soft prompt engineering.

Significance. If the MTO identification procedure and template contributions are shown to be reliable via ablations and statistical tests, the work could supply actionable guidance for objective-aware adaptation of encoder-decoder models in low-data regimes. The public code release at https://github.com/puraminy/MTO/ is a clear strength for reproducibility.

major comments (2)

[Abstract] Abstract: the headline claim of >120% few-shot gain (and outperformance statements) rests on the unverified assumptions that (1) the unsupervised MTO procedure reliably selects the correct pre-training objective for each task and (2) the novel templates, rather than incidental factors, drive the reported improvements. No quantitative measure of identification accuracy, no ablation isolating objective choice versus random/fixed objectives, and no controlled comparison of novel versus standard templates at fixed objective are described.
[Abstract] Abstract and methods description: the experimental setup, baselines, datasets, few-shot sampling protocol, and statistical significance tests are not detailed, making it impossible to assess whether the performance numbers support the central claims about MTO alignment.
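
The controls requested in the major comments can be laid out as a small experiment grid. The sketch below is hypothetical: the condition names, the number of seeds, and the helper names `ablation_grid` and `template_contrasts` are assumptions made for illustration, not the paper's protocol.

```python
from itertools import product

# Assumed condition labels, following the referee's wording.
OBJECTIVES = ("mto_selected", "fixed", "random")
TEMPLATES = ("novel", "standard")


def ablation_grid(objectives=OBJECTIVES, templates=TEMPLATES, seeds=(0, 1, 2)):
    """Enumerate every objective x template x seed run."""
    return [
        {"objective": o, "template": t, "seed": s}
        for o, t, s in product(objectives, templates, seeds)
    ]


def template_contrasts(grid):
    """Pair runs that share objective and seed and differ only in template."""
    by_key = {}
    for run in grid:
        key = (run["objective"], run["seed"])
        by_key.setdefault(key, {})[run["template"]] = run
    return [
        (pair["standard"], pair["novel"])
        for pair in by_key.values()
        if "standard" in pair and "novel" in pair
    ]


grid = ablation_grid()
print(len(grid), "runs,", len(template_contrasts(grid)), "template contrasts")
```

Pairing runs this way isolates the template effect at a fixed objective, while comparing `mto_selected` against `fixed` and `random` at a fixed template isolates the effect of objective selection.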

minor comments (1)

[Abstract] The abstract states that the framework 'offers automated methods to prepare task-related data' but does not specify the unsupervised criteria or any validation of those criteria.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for stronger empirical support and clearer experimental details. We address each major comment below and will revise the manuscript to incorporate the requested clarifications and analyses.

Referee: [Abstract] Abstract: the headline claim of >120% few-shot gain (and outperformance statements) rests on the unverified assumptions that (1) the unsupervised MTO procedure reliably selects the correct pre-training objective for each task and (2) the novel templates, rather than incidental factors, drive the reported improvements. No quantitative measure of identification accuracy, no ablation isolating objective choice versus random/fixed objectives, and no controlled comparison of novel versus standard templates at fixed objective are described.

Authors: We agree that the abstract's performance claims would be strengthened by explicit validation of the MTO procedure and template contributions. The current manuscript presents the MTO framework and reports gains but does not include a quantitative accuracy metric for objective identification or the requested ablations. We will add (1) an evaluation of MTO identification accuracy against held-out task-objective pairs, (2) an ablation comparing MTO-selected objectives against random and fixed-objective baselines, and (3) a controlled comparison of the novel templates versus standard templates while holding the objective fixed. These additions will be placed in a new experimental subsection and referenced from the abstract.
Revision: yes

Referee: [Abstract] Abstract and methods description: the experimental setup, baselines, datasets, few-shot sampling protocol, and statistical significance tests are not detailed, making it impossible to assess whether the performance numbers support the central claims about MTO alignment.

Authors: We acknowledge that the abstract and main text do not provide sufficient detail on these elements. The full manuscript contains some description of datasets and baselines, but the few-shot sampling protocol and statistical tests are indeed underspecified. We will expand the experimental setup section to include: exact dataset citations and splits, the few-shot sampling procedure (including seed and size details), the complete list of baselines with references, and results of statistical significance tests (e.g., paired t-tests or bootstrap confidence intervals) for all reported gains. These details will also be summarized concisely in the abstract where space permits.
Revision: yes
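
One of the tests named in the rebuttal, a bootstrap confidence interval, can be sketched in a few lines. This is a minimal percentile bootstrap over paired per-seed score differences, not the authors' procedure; the function name `paired_bootstrap_ci`, its defaults, and the scores are all hypothetical.

```python
import random
import statistics


def paired_bootstrap_ci(baseline, treated, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference (treated - baseline).

    Returns (mean_difference, lower_bound, upper_bound).
    """
    if len(baseline) != len(treated) or not baseline:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [t - b for b, t in zip(baseline, treated)]
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=len(diffs)))
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(diffs), lower, upper


# Hypothetical few-shot scores from five seeds, same seeds for both systems.
baseline_scores = [10.0, 12.0, 11.0, 9.0, 13.0]
matched_scores = [22.0, 25.0, 24.0, 20.0, 27.0]
print(paired_bootstrap_ci(baseline_scores, matched_scores))
```

An interval that excludes zero would support the claim that the gain is not a seed artifact. With only a handful of seeds the interval is coarse, which is one reason the sampling protocol and the number of runs need to be reported.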

Circularity Check

0 steps flagged

No circularity: empirical evaluation of MTO framework

The paper introduces the MTO framework, automated data preparation, and novel templates, then reports experimental performance gains on generation and QA tasks. All claims rest on empirical comparisons to baselines and related works rather than any derivation, equation, or fitted parameter that reduces to its own inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems, no ansatzes are smuggled, and no predictions are statistically forced from subsets of the same data. The work is self-contained against external benchmarks with code released for reproduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Based solely on the abstract; the paper introduces a new framework but details on parameters or axioms are not provided.

axioms (1)

Domain assumption: standard assumptions in machine learning about model generalization and task alignment.
The framework relies on the idea that pre-training objectives can be matched to downstream tasks effectively.

invented entities (1)

MTO framework (no independent evidence)
Purpose: to determine appropriate objectives for tasks and prepare data accordingly.
Newly proposed in this paper as a method for matching tasks to objectives.

[1] Bosselut A, Harrison A, Anastasopoulos A, et al (2020) Comet-atomic 2020: On symbolic and neural commonsense knowledge graphs. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI 2020), pp 6612–6619, URL https://www.aaai.org/Papers/AAAI/2020GB/AAAI-BosselutA.6612.pdf

[2] Bosselut A, Rashkin H, Sap M, et al (2020) COMET: Commonsense transformers for automatic knowledge graph construction. In: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference. Association for Computational Linguistics, Florence, Italy, pp 4762–4779, https://doi.org/10.18653/v1/p19-1470, URL http...

[3] Brown T, Mann B, Ryder N, et al (2020) Language models are few-shot learners. Advances in Neural Information Processing Systems 33:1877–1901

[4] Cao B, Lin H, Han X, et al (2021) Knowledgeable or educated guess? revisiting language models as knowledge bases. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Online...

[5] Cao B, Lin H, Han X, et al (2024) The life cycle of knowledge in big language models: A survey. Machine Intelligence Research pp 1–22

[6] Da J, Bras RL, Lu X, et al (2021) Analyzing commonsense emergence in few-shot knowledge models. In: Conference on Automated Knowledge Base Construction, URL https://api.semanticscholar.org/CorpusID:235657379

[7] Feldman J, Davison J, Rush AM (2020) Commonsense knowledge mining from pretrained models. In: EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference. Association for Computational Linguistics, Hong Kong, China, pp 1173–11...

[8] Feng Y, Chen X, Lin BY, et al (2020) Scalable multi-hop relational reasoning for knowledge-aware question answering. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Online, pp 1295–1309

[9] Fichtel L, Kalo JC, Balke WT (2021) Prompt tuning or fine-tuning - investigating relational knowledge in pre-trained language models. In: 3rd Conference on Automated Knowledge Base Construction, https://doi.org/10.24432/C5RC75, URL https://openreview.net/forum?id=o7sMlpr9yBW

[10] Gururangan S, Marasović A, Swayamdipta S, et al (2020) Don’t stop pretraining: Adapt language models to domains and tasks. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online, pp 8342–8360

[11] Hase P, Diab M, Celikyilmaz A, et al (2023) Methods for measuring, updating, and visualizing factual beliefs in language models. In: Vlachos A, Augenstein I (eds) Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, Dubrovnik, Croatia, pp 2714–2731, https:...

[12] He M, Fang T, Wang W, et al (2024) Acquiring and modeling abstract commonsense knowledge via conceptualization. Artificial Intelligence p 104149

[13] Huang Y, Li Y, Xu Y, et al (2023) Mvp-tuning: Multi-view knowledge retrieval with prompt tuning for commonsense reasoning. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp 13417–13432

[14] Jiang Z, Xu FF, Araki J, et al (2020) How can we know what language models know? Transactions of the Association for Computational Linguistics 8:423–438. URL https://www.mitpressjournals.org/doi/abs/10.1162/tacl_a_00323

[15] Kang M, Lee S, Baek J, et al (2023) Knowledge-augmented reasoning distillation for small language models in knowledge-intensive tasks. In: Oh A, Naumann T, Globerson A, et al (eds) Advances in Neural Information Processing Systems, vol 36. Curran Associates, Inc., pp 48573–48602, URL https://proceedings.neurips.cc/paper_files/paper/2023/file/97faedc9...

[16] Kazemi M, Mittal S, Ramachandran D (2023) Understanding finetuning for factual knowledge extraction from language models. arXiv e-prints pp arXiv–2301

[17] Khashabi D, Min S, Khot T, et al (2020) Unifiedqa: Crossing format boundaries with a single qa system. In: Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, pp 1896–1907, URL https://aclanthology.org/2020.findings-emnlp.171

[18] Lester B, Al-Rfou R, Constant N (2021) The power of scale for parameter-efficient prompt tuning. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, pp 3045–3059, https://doi.org/10.18653/v1/2021.emnlp-main.243, URL https://aclant...

[19] Li J, Wang C, Chen Y, et al (2023) What events do pre-trained language models learn from text? probing event-based commonsense knowledge by confidence sorting. In: Liu F, Duan N, Xu Q, et al (eds) Natural Language Processing and Chinese Computing. Springer Nature Switzerland, Cham, pp 669–681

[20] Li XL, Liang P (2021) Prefix-tuning: Optimizing continuous prompts for generation. ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference pp 4582–4597. https://doi.org/10.18653/v1/2021.acl-long.353, 2101.00190

[21] Lin CY (2004) Rouge: A package for automatic evaluation of summaries. In: Annual Meeting of the Association for Computational Linguistics, URL https://api.semanticscholar.org/CorpusID:964287

[22] Liu H, Singh P (2004) Conceptnet: A practical commonsense reasoning toolkit. In: BT technology journal, vol 22. Springer, pp 211–226

[23] Liu P, Yuan W, Fu J, et al (2023) Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys 55(9):1–35

[24] Liu X, Zheng Y, Du Z, et al (2021) Gpt understands, too. CoRR abs/2103.10385

[25] Liu Y, Ott M, Goyal N, et al (2019) Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692

[26] Lourie N, Le Bras R, Bhagavatula C, et al (2021) Unicorn on rainbow: A universal commonsense reasoning model on a new multitask benchmark. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 13480–13488

[27] Mihaylov T, Clark P, Khot T, et al (2018) Can a suit of armor conduct electricity? a new dataset for open book question answering. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Brussels, Belgium, pp 2381–2391, https://doi.org/10.18653/v1/D18-1260, URL https://aclanth...

[28] Petroni F, Rocktäschel T, Lewis P, et al (2020) Language models as knowledge bases? In: EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference. Association for Computational Linguistics, Hong Kong, China, pp 2463–2473, https://doi.org/10.18653/v1/d19-1250, URL https://www.aclweb.org/anthology/D19-1250, 1909.01066

[29] Qin G, Eisner J (2021) Learning how to ask: Querying lms with mixtures of soft prompts. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp 5203–5212

[30] Raffel C, Shazeer N, Roberts A, et al (2020) Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research 21(140):1–67. URL http://jmlr.org/papers/v21/20-074.html

[31] Sanh V, Webson A, Raffel C, et al (2022) Multitask prompted training enables zero-shot task generalization. In: ICLR 2022-Tenth International Conference on Learning Representations

[32] Sap M, Le Bras R, Allaway E, et al (2019) ATOMIC: An atlas of machine commonsense for if-then reasoning. In: 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, pp 3027–3035, https:...

[33] Schick T, Schütze H (2021) Few-Shot Text Generation with Natural Language Instructions. EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings pp 390–402. https://doi.org/10.18653/v1/2021.emnlp-main.32

[34] Shazeer N, Stern M (2018) Adafactor: Adaptive learning rates with sublinear memory cost. In: International Conference on Machine Learning, PMLR, pp 4596–4604

[35] Shin T, Razeghi Y, Logan IV RL, et al (2020) Autoprompt: Eliciting knowledge from language models with automatically generated prompts. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, pp 4222–4235, URL https://www.aclweb.org/anthology/2020.emnlp-main.343/

[36] Talmor A, Herzig J, Lourie N, et al (2019) CommonsenseQA: A question answering challenge targeting commonsense knowledge. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minnea...

[37] Wallat J, Singh J, Anand A (2020) Bertnesia: Investigating the capture and forgetting of knowledge in bert. In: Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, pp 174–183

[38] Wang K, Zhang Y, Yang D, et al (2021) Gnn is a counter? revisiting gnn for question answering. In: International Conference on Learning Representations

[39] Wang T, Roberts A, Hesslow D, et al (2022) What language model architecture and pretraining objective works best for zero-shot generalization? In: International Conference on Machine Learning, PMLR, pp 22964–22984

[40] Zhou W, Lee DH, Selvam RK, et al (2021) Pre-training text-to-text transformers for concept-centric common sense. In: International Conference on Learning Representations (ICLR)

[41] West P, Bhagavatula C, Hessel J, et al (2022) Symbolic knowledge distillation: from general language models to commonsense models. In: Carpuat M, de Marneffe MC, Meza Ruiz IV (eds) Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computationa...

[42] Yasunaga M, Ren H, Bosselut A, et al (2021) Qa-gnn: reasoning with language models and knowledge graphs for question answering. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Online, pp 535–546

[43] Yin D, Bansal H, Monajatipoor M, et al (2022) Geomlama: Geo-diverse commonsense probing on multilingual pre-trained language models. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp 2039–2055

[44] Zhang H, Liu X, Pan H, et al (2022) Aser: Towards large-scale commonsense knowledge acquisition via higher-order selectional preference over eventualities. Artificial Intelligence 309:103740

[45] Zhang L, Li R (2024) Knowledge prompting with contrastive learning for unsupervised commonsenseqa. In: Luo B, Cheng L, Wu ZG, et al (eds) Neural Information Processing. Springer Nature Singapore, Singapore, pp 27–38

[46] Zhang T, Kishore V, Wu F, et al (2020) Bertscore: Evaluating text generation with bert. In: International Conference on Learning Representations (ICLR), URL https://openreview.net/forum?id=SkeHuCVFDr

[47] Zhang X, Bosselut A, Yasunaga M, et al (2022) Greaselm: Graph reasoning enhanced language models for question answering. In: International Conference on Learning Representations (ICLR)

[48] Zhao WX, Jiang J, Zhou K, et al (2022) Great truths are always simple: A rather simple knowledge encoder for enhancing the commonsense reasoning capacity of pre-trained models. In: Findings of the North American Chapter of the Association for Computational Linguistics: NAACL 2022, Association for Computational Linguistics

[49] Zhao Z, Wallace E, Feng S, et al (2021) Calibrate before use: Improving few-shot performance of language models. In: International Conference on Machine Learning, PMLR, pp 12697–12706

[50] Zhong Z, Friedman D, Chen D (2021) Factual probing is [mask]: Learning vs. learning to recall. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp 5017–5033

[51] Zhou X, Zhang Y, Cui L, et al (2020) Evaluating commonsense in pre-trained language models. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 9733–9740

Appendix A Templates and Tasks

A.1 Templates for the relations

Table A1 presents a compilation of natural language phrases that have been employed for formatting relation tuples into ...