Supervision versus Demonstration-Based In-Context Learning for Multiword Expression Classification

Sercan Karakaş; Yusuf Şimşek

arxiv: 2606.07479 · v1 · pith:UMBY4Q47new · submitted 2026-06-05 · 💻 cs.CL · cs.AI

Pith reviewed 2026-06-27 22:02 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords light verb constructions · multiword expressions · in-context learning · few-shot prompting · Turkish NLP · idiomaticity detection · LLM evaluation

The pith

Few-shot demonstrations let prompted LLMs match or beat a fine-tuned Turkish BERT on detecting idiomatic light verb constructions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether instruction-tuned large language models can classify Turkish verb-object combinations as idiomatic light verb constructions (LVCs) or literal uses under zero-shot, one-shot, and few-shot prompting, and compares them against a supervised BERTurk baseline on a controlled set of 147 examples. In the zero-shot setting the LLMs handle negatives reliably but miss most LVCs; one-shot prompting boosts LVC recall but introduces model-specific biases toward over- or under-predicting the LVC class; a richer few-shot prompt improves calibration and lets GPT-OSS-20B and Qwen 2.5-14B reach or surpass the supervised baseline on the positive class. The work shows that demonstration choice strongly affects error profiles and that prompt sensitivity remains high even when overall accuracy looks competitive.
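To make the zero-shot pattern concrete (good on negatives, weak on LVCs), here is a minimal sketch of per-class metrics. The function name `per_class_report` and the toy labels are assumptions made for illustration; they are not the paper's code or results.

```python
from typing import Dict, List


def per_class_report(gold: List[int], pred: List[int]) -> Dict[str, float]:
    """Accuracy plus LVC-class (label 1) precision/recall and negative recall."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)

    def safe_div(num: int, den: int) -> float:
        # Avoid ZeroDivisionError when a class is absent.
        return num / den if den else 0.0

    return {
        "accuracy": safe_div(tp + tn, len(gold)),
        "lvc_precision": safe_div(tp, tp + fp),
        "lvc_recall": safe_div(tp, tp + fn),
        "negative_recall": safe_div(tn, tn + fp),
    }


# Toy labels invented for this sketch (1 = LVC, 0 = literal/negative).
gold = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # a model that rarely predicts LVC
report = per_class_report(gold, pred)
```

In this toy case accuracy looks moderate while LVC recall is low, which is why the summary reports class-level behavior rather than accuracy alone.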

Core claim

On a manually constructed, controlled test set of 147 Turkish verb-object sentences, few-shot prompting of instruction-tuned LLMs yields LVC detection performance that matches or exceeds a supervised BERTurk classifier for some models (GPT-OSS-20B and Qwen 2.5-14B), while the zero-shot and one-shot regimes show sharp, model-specific shifts in recall and bias.

What carries the argument

Binary classification of literal versus idiomatic meaning for Turkish verb-object pairs, evaluated across zero-shot, one-shot, and few-shot prompts on a controlled N=147 set containing LVC positives, in-domain literal controls, and out-of-domain random negatives.
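The three prompting regimes differ only in how many labeled demonstrations precede the query. The sketch below shows one plausible way to assemble such prompts; the instruction wording, the label strings, the `build_prompt` helper, and the two demonstration sentences are all assumptions written for this illustration, not the paper's actual prompts or data.

```python
from typing import List, Tuple

# Hypothetical instruction text; the paper's wording is not given in this summary.
INSTRUCTION = (
    "Decide whether the verb-object pair in the Turkish sentence is used "
    "idiomatically (LVC) or literally. Answer with one word: LVC or LITERAL."
)

# Placeholder demonstrations written for this sketch, not items from the paper.
DEMOS: List[Tuple[str, str]] = [
    ("Sonunda karar verdi.", "LVC"),       # "karar vermek": to decide
    ("Bana bir kitap verdi.", "LITERAL"),  # literally "gave me a book"
]


def build_prompt(sentence: str, demos: List[Tuple[str, str]], k: int) -> str:
    """Build a zero-shot (k=0), one-shot (k=1), or few-shot (k>1) prompt."""
    if k < 0 or k > len(demos):
        raise ValueError("k must be between 0 and the number of demonstrations")
    parts = [INSTRUCTION]
    for text, label in demos[:k]:
        parts.append(f"Sentence: {text}\nAnswer: {label}")
    # The query is left open so the model completes the label.
    parts.append(f"Sentence: {sentence}\nAnswer:")
    return "\n\n".join(parts)


zero_shot = build_prompt("Arkadaşına yardım etti.", DEMOS, 0)
few_shot = build_prompt("Arkadaşına yardım etti.", DEMOS, 2)
```

Keeping the instruction and query format fixed while varying only `k` isolates the effect of the demonstrations, which is the comparison the paper's design targets.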

If this is right

Carefully chosen demonstrations can shift LLM error profiles from under-prediction to balanced detection on LVCs.
Model-specific biases appear even with a single demonstration and are reduced only when the prompt is enriched with multiple examples.
The supervised Turkish encoder baseline stays competitive overall, but prompted LLMs can exceed it on the idiomatic class under favorable prompting.
Prompt sensitivity is a dominant factor in metalinguistic classification tasks for Turkish.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If demonstration selection proves critical, automatic methods for choosing or generating examples could further close the gap between ICL and supervised approaches.
The same controlled-set design could be applied to other languages with light-verb or other multiword-expression phenomena to test cross-lingual generality.
Error-profile analysis suggests that hybrid systems combining a few-shot LLM with a lightweight supervised check might stabilize performance without full fine-tuning.

Load-bearing premise

The manually constructed set of 147 matched examples is representative enough to support general claims about relative performance and prompt sensitivity.

What would settle it

Evaluating the same models and prompt variants on a larger, independently sampled collection of Turkish LVC and literal verb-object sentences would show whether the reported few-shot gains and calibration improvements persist.

Figures

Figures reproduced from arXiv: 2606.07479 by Sercan Karakaş, Yusuf Şimşek.

**Figure 1.** Flowchart of the experimental process.

Excerpt from Section 5 (Models) that follows the caption in the source: We fine-tune BERTurk 32K cased and BERTurk 128K cased (Schweter, 2020) by adding a task-specific binary classification head over the final-layer [CLS] representation. We split the data 80/20 into train/test with stratified sampling and set hidden and attention dropout to 0.2 to reduce overfitting. Models are trained with learning rate 2×10⁻⁵, batch size 32,…
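The stratified 80/20 split mentioned in the Models excerpt can be sketched with the standard library alone. This is an illustrative reconstruction, not the authors' code: `stratified_split`, the seed, and the toy label counts are assumptions, and `TRAIN_CONFIG` records only the hyperparameters visible in the excerpt (the rest is truncated in the source).

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hyperparameters stated in the excerpt; further settings are cut off in the source.
TRAIN_CONFIG = {"learning_rate": 2e-5, "batch_size": 32, "dropout": 0.2}


def stratified_split(
    labels: Sequence[int], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), keeping class proportions in both."""
    rng = random.Random(seed)
    by_label: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_label):
        idxs = by_label[label]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


# Toy label distribution invented for this sketch (1 = LVC, 0 = literal).
labels = [1] * 50 + [0] * 100
train_idx, test_idx = stratified_split(labels)
```

Splitting per class keeps the LVC proportion the same in the training and test portions, which matters when the positive class is the one under study.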

**Figure 2.** Experiment 2 (one-shot) success rates by con…

**Figure 3.** Experiment 3 accuracy by condition (few-shot…

Original abstract

Turkish idiomatic light verb constructions (LVCs) are challenging for multiword expression processing because they often share the same surface form as fully literal verb-object combinations while functioning as a single, partially idiomatic predicate. We frame Turkish LVC detection as a binary classification task (literal meaning vs. idiomatic meaning) and evaluate on a manually created controlled set (N=147) with matched negatives: out-of-domain random sentences and in-domain literal controls (NLVC), alongside LVC positives. We compare a supervised Turkish encoder baseline (BERTurk with a classifier head) to three instruction-tuned LLMs from different families under zero-shot, one-shot, and few-shot prompting, and analyze how demonstrations shift error profiles. In zero-shot, LLMs perform well on negatives but show very low LVC recall. One-shot prompting sharply improves LVC detection but can induce strong, model-specific biases, leading models to overpredict or underpredict LVCs. A richer few-shot prompt improves calibration and yields robust overall performance for GPT-OSS-20B and Qwen 2.5-14B. Overall, the results highlight substantial prompt sensitivity in Turkish metalinguistic classification: the supervised baseline remains competitive, while prompted LLMs can match or exceed it on LVCs with carefully constructed demonstrations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Few-shot prompting lets some LLMs match a supervised BERTurk baseline on Turkish LVC detection, but everything rests on 147 hand-curated examples.

This paper shows that richer few-shot prompts improve calibration enough for GPT-OSS-20B and Qwen 2.5-14B to reach or beat the supervised baseline on Turkish light verb construction classification, while zero-shot and one-shot regimes produce clear recall or bias problems. The comparison is direct and the error-profile breakdown is useful.

What the work does well is the controlled setup with matched in-domain literal negatives and out-of-domain random sentences. That design makes the prompt-sensitivity results easier to interpret than a standard benchmark split would. The head-to-head across model families and shot counts is also new for this Turkish task.

The soft spot is scale and validation. N=147 is small for claims about robustness or general superiority, and the abstract gives no statistical tests, bootstrap numbers, or external validation set. If the demonstration examples overlap with the test items, the few-shot gains could be partly artifactual. The supervised baseline stays competitive, which is the honest takeaway.

This is mainly for people working on Turkish or other low-resource NLP and on prompt engineering for classification. It deserves peer review because the question is practical and the design is thoughtful, even if reviewers will want more data or clearer separation between prompts and test items.

Referee Report

2 major / 1 minor

Summary. The paper frames Turkish LVC detection as binary classification (literal vs. idiomatic) and compares a supervised BERTurk baseline against instruction-tuned LLMs (including GPT-OSS-20B and Qwen 2.5-14B) under zero-, one-, and few-shot prompting on a manually constructed controlled test set of N=147 examples with matched in-domain (NLVC) and out-of-domain negatives. The central claim is that zero-shot LLMs show low LVC recall, one-shot prompting induces model-specific biases, and richer few-shot prompts improve calibration and allow some LLMs to match or exceed the supervised baseline on LVCs, while underscoring prompt sensitivity.

Significance. If the results hold, the work provides a controlled empirical comparison of supervision versus demonstration-based ICL for a challenging MWE task in Turkish, with useful analysis of how prompting regimes shift error profiles (recall, over/under-prediction). The matched-negative design and cross-family LLM comparison are strengths that could inform prompting strategies for metalinguistic classification in low-resource settings.

major comments (2)

[Abstract / Dataset] Abstract and dataset description: The central claims about prompt robustness, calibration improvements, and LLM superiority over the supervised baseline rest entirely on a single hand-constructed collection of N=147 items that supplies both the 'carefully constructed demonstrations' and the evaluation instances. No statistical significance tests, bootstrap intervals, or external validation set are referenced, so observed differences in recall and error profiles could reflect idiosyncrasies of the chosen sentences rather than general properties of the methods.
[Abstract / Results] Evaluation design (implied in abstract): Because the same small pool is used for both demonstration selection and testing, any reported gains from richer few-shot prompts (e.g., for GPT-OSS-20B and Qwen 2.5-14B) lack an independent test of generalization; this directly affects the load-bearing claim that prompted LLMs 'can match or exceed' the supervised baseline.
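One way to address the shared-pool concern is to reserve demonstration items before evaluation so that no prompt example is also scored. The sketch below assumes a hypothetical `hold_out_demonstrations` helper and an invented even split of the 147 items across the three categories named in the abstract; the real category sizes are not given in this summary.

```python
import random
from typing import Dict, List, Tuple

Item = Tuple[int, str]  # (item_id, category)


def hold_out_demonstrations(
    items: List[Item], per_class: int, seed: int = 0
) -> Tuple[List[Item], List[Item]]:
    """Reserve `per_class` items per category as demonstrations; test on the rest."""
    rng = random.Random(seed)
    by_label: Dict[str, List[int]] = {}
    for item_id, label in items:
        by_label.setdefault(label, []).append(item_id)

    demo_ids = set()
    for label in sorted(by_label):
        ids = by_label[label]
        if len(ids) <= per_class:
            raise ValueError(f"not enough items in category {label!r}")
        demo_ids.update(rng.sample(ids, per_class))

    demos = [item for item in items if item[0] in demo_ids]
    test = [item for item in items if item[0] not in demo_ids]
    return demos, test


# Invented layout: 49 items per category, 147 in total.
items = [(i, ("LVC", "NLVC", "RANDOM")[i // 49]) for i in range(147)]
demos, test = hold_out_demonstrations(items, per_class=2)
```

With this separation, few-shot gains are measured only on items the model never saw as demonstrations, at the cost of a slightly smaller test set.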

minor comments (1)

[Abstract] Model naming: 'GPT-OSS-20B' is non-standard; clarify the exact model identifier and release used.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our evaluation design. The controlled nature of the N=147 dataset was chosen to enable precise analysis of prompting effects, but we acknowledge the limitations raised and will revise the manuscript to address them explicitly.

Referee: [Abstract / Dataset] Abstract and dataset description: The central claims about prompt robustness, calibration improvements, and LLM superiority over the supervised baseline rest entirely on a single hand-constructed collection of N=147 items that supplies both the 'carefully constructed demonstrations' and the evaluation instances. No statistical significance tests, bootstrap intervals, or external validation set are referenced, so observed differences in recall and error profiles could reflect idiosyncrasies of the chosen sentences rather than general properties of the methods.

Authors: We agree this is a genuine limitation of the current study. The dataset was deliberately hand-constructed with matched in-domain (NLVC) and out-of-domain negatives to isolate the impact of prompting strategies on LVC detection while controlling for surface-form confounds, which is difficult at larger scale in Turkish. We will add bootstrap confidence intervals and paired significance tests for all reported metrics in the revision. We will also revise the abstract and discussion to frame the work as an analysis of prompt sensitivity in a controlled low-resource setting rather than a general claim of LLM superiority. revision: partial
Referee: [Abstract / Results] Evaluation design (implied in abstract): Because the same small pool is used for both demonstration selection and testing, any reported gains from richer few-shot prompts (e.g., for GPT-OSS-20B and Qwen 2.5-14B) lack an independent test of generalization; this directly affects the load-bearing claim that prompted LLMs 'can match or exceed' the supervised baseline.

Authors: The shared pool is intentional to guarantee that demonstrations are high-quality, balanced examples of the exact phenomena under study, enabling direct comparison of zero-, one-, and few-shot regimes on identical items. We accept that this design precludes strong generalization claims. In the revision we will remove or qualify the phrasing 'can match or exceed' in the abstract, add an explicit limitations paragraph on the lack of held-out test data, and suggest larger independent corpora as future work. revision: yes
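The bootstrap intervals and paired tests promised in the first response could look roughly like the following. This is a sketch under stated assumptions: a percentile bootstrap and an exact McNemar test are one reasonable choice, not necessarily what the authors will use, and the helper names and toy predictions are invented for illustration.

```python
import random
from math import comb
from typing import Callable, List, Tuple


def lvc_recall(gold: List[int], pred: List[int]) -> float:
    """Recall on the LVC class (label 1)."""
    hits = [p for g, p in zip(gold, pred) if g == 1]
    return sum(hits) / len(hits) if hits else 0.0


def bootstrap_ci(
    gold: List[int],
    pred: List[int],
    metric: Callable[[List[int], List[int]], float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval: resample items with replacement."""
    rng = random.Random(seed)
    n = len(gold)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([gold[i] for i in idx], [pred[i] for i in idx]))
    stats.sort()
    lo_idx = int(n_resamples * alpha / 2)
    hi_idx = min(n_resamples - 1, int(n_resamples * (1 - alpha / 2)))
    return stats[lo_idx], stats[hi_idx]


def mcnemar_exact(gold: List[int], pred_a: List[int], pred_b: List[int]) -> float:
    """Two-sided exact McNemar p-value on items where exactly one system is right."""
    only_a = sum(1 for g, a, b in zip(gold, pred_a, pred_b) if a == g and b != g)
    only_b = sum(1 for g, a, b in zip(gold, pred_a, pred_b) if a != g and b == g)
    n = only_a + only_b
    if n == 0:
        return 1.0
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy data invented for this sketch: 147 items, 49 of them LVC positives.
gold = [1] * 49 + [0] * 98
pred = [1] * 40 + [0] * 9 + [0] * 98
low, high = bootstrap_ci(gold, pred, lvc_recall)
```

Both procedures operate on per-item predictions, so they can be applied to the existing 147-item outputs without rerunning the models.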

Circularity Check

0 steps flagged

No circularity: purely empirical comparison on fixed test set

The paper reports direct experimental results comparing a supervised BERTurk baseline against zero-shot, one-shot, and few-shot prompting of instruction-tuned LLMs on a manually constructed N=147 Turkish LVC classification dataset. No equations, derivations, fitted parameters presented as predictions, or load-bearing self-citations appear in the provided text. All claims derive from observed metrics (recall, calibration, error profiles) on the fixed evaluation examples rather than from any self-referential reduction. The small dataset size raises external-validity concerns but does not constitute circularity under the defined criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical study; no mathematical axioms, free parameters, or invented entities are invoked. The central claim rests on the representativeness of the N=147 dataset and the fairness of the prompting setups.

Reference graph

Works this paper leans on

192 extracted references · 38 canonical work pages · 2 internal anchors

[1]

Proceedings of the Sixth Conference on Computational Natural Language Learning (CoNLL 2002) , pages =

Baldwin, Timothy and Villavicencio, Aline , title =. Proceedings of the Sixth Conference on Computational Natural Language Learning (CoNLL 2002) , pages =. 2002 , url =

2002
[2]

1997 , url =

Karttunen, Lauri and Gaal, Tamas and Kempe, Andre , title =. 1997 , url =

1997
[3]

2015 , url =

Ramisch, Carlos , title =. 2015 , url =

2015
[5]

Advances in Neural Information Processing Systems , volume =

Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning , author =. Advances in Neural Information Processing Systems , volume =. 2022 , url =

2022
[6]

Language Models for Text Classification: Is In-Context Learning Enough?

Edwards, Aleksandra and Camacho-Collados, Jose. Language Models for Text Classification: Is In-Context Learning Enough?. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). 2024

2024
[7]

Fine-Tuned

Bucher, Martin Juan Jos. Fine-Tuned. 2024 , eprint =

2024
[8]

2024 , eprint =

Fine-Tuning, Prompting, In-Context Learning and Instruction-Tuning: How Many Labelled Samples Do We Need? , author =. 2024 , eprint =

2024
[9]

Proceedings of the 38th International Conference on Machine Learning , pages =

Calibrate Before Use: Improving Few-shot Performance of Language Models , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , editor =

2021
[13]

Flaw or Artifact? Rethinking Prompt Sensitivity in Evaluating LLMs

Hua, Andong and Tang, Kenan and Gu, Chenhe and Gu, Jindong and Wong, Eric and Qin, Yao. Flaw or Artifact? Rethinking Prompt Sensitivity in Evaluating LLMs. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025. doi:10.18653/v1/2025.emnlp-main.1006

work page doi:10.18653/v1/2025.emnlp-main.1006 2025
[14]

Making Pre-trained Language Models Better Few-shot Learners

Gao, Tianyu and Fisch, Adam and Chen, Danqi. Making Pre-trained Language Models Better Few-shot Learners. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. 2021. doi:10.18653/v1/2021.acl-long.295

work page doi:10.18653/v1/2021.acl-long.295 2021
[16]

Language Teaching , year =

Siyanova-Chanturia, Anna and Sonbul, Suhad , title =. Language Teaching , year =
[17]

, title =

Arnon, Inbal and Clark, Eve V. , title =. Language Learning and Development , volume =. 2011 , doi =

2011
[18]

2020 , howpublished =

Schweter, Stefan , title =. 2020 , howpublished =

2020
[19]

Sentence Encoders on

Phang, Jason and F. Sentence Encoders on. 2018 , journal =

2018
[20]

, title =

Pruksachatkun, Yada and Phang, Jason and Liu, Haokun and Htut, Phu Mon and Zhang, Xiaoyi and Pang, Richard Yuanzhe and Vania, Clara and Kann, Katharina and Bowman, Samuel R. , title =. 2020 , journal =

2020
[22]

Journal of Language and Linguistic Studies , volume =

On. Journal of Language and Linguistic Studies , volume =. 2010 , url =

2010
[24]

Universal Dependencies v1: A Multilingual Treebank Collection , booktitle =

Nivre, Joakim and de Marneffe, Marie. Universal Dependencies v1: A Multilingual Treebank Collection , booktitle =. 2016 , url =

2016
[25]

Universal Dependencies:
[26]

Enhancements to the

Mar. Enhancements to the. arXiv preprint arXiv:2207.11782 , year =

arXiv
[27]

Resources for

T. Resources for. 2022 , month = mar, journal =

2022
[28]

Building the

Mar. Building the. Proceedings of the 11th Global Wordnet Conference , month = jan, year =. doi:10.18653/v1/2021.gwc-1.14 , pages =

work page doi:10.18653/v1/2021.gwc-1.14 2021
[29]

Proceedings of the 14th Workshop on Treebanks and Linguistic Theories (TLT 14) , year =

A grammar-book treebank of Turkish , author =. Proceedings of the 14th Workshop on Treebanks and Linguistic Theories (TLT 14) , year =
[30]

Turkish Journal of Electrical Engineering & Computer Sciences , year =

Implementing Universal Dependency, Morphology and Multiword Expression Annotation Standards for Turkish Language Processing , author =. Turkish Journal of Electrical Engineering & Computer Sciences , year =
[31]

Proceedings of

Sulubacak, Umut and Gokirmak, Memduh and Tyers, Francis and. Proceedings of. 2016 , address =

2016
[32]

Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies , year =

Zeman, Daniel and Popel, Martin and Straka, Milan and Haji. Proceedings of the. 2017 , address =. doi:10.18653/v1/K17-3001 , pages =

work page doi:10.18653/v1/k17-3001 2017
[33]

Mechanistic Interpretability for

Bereska, Leonard and Gavves, Efstratios , journal =. Mechanistic Interpretability for. 2024 , url =

2024
[34]

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP) , year =

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? , author =. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP) , year =

2022
[36]

Ali and Fincan, Ali Arda and G

Bayram, M. Ali and Fincan, Ali Arda and G. Tokenization Standards and Evaluation in Natural Language Processing: A Comparative Analysis of Large Language Models on. 2025 33rd Signal Processing and Communications Applications Conference (. 2025 , month = jun, pages =

2025
[37]

2021 , eprint =

Calibrate Before Use: Improving Few-Shot Performance of Language Models , author =. 2021 , eprint =

2021
[38]

Journal of Machine Learning Research , year =

Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability , author =. Journal of Machine Learning Research , year =
[39]

Chi and Quoc V

Jason Wei and Xuezhi Wang and Dale Schuurmans and Maarten Bosma and Brian Ichter and Fei Xia and Ed H. Chi and Quoc V. Le and Denny Zhou , title =. Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022 (. 2022 , url =

2022
[40]

Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022 (

Takeshi Kojima and Shixiang Shane Gu and Machel Reid and Yutaka Matsuo and Yusuke Iwasawa , title =. Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022 (

2022
[41]

Le and Ed H

Xuezhi Wang and Jason Wei and Dale Schuurmans and Quoc V. Le and Ed H. Chi and Sharan Narang and Aakanksha Chowdhery and Denny Zhou , title =. The Eleventh International Conference on Learning Representations (. 2023 , url =

2023
[42]

Least-to-Most Prompting Enables Complex Reasoning in Large Language Models , booktitle =

Denny Zhou and Nathanael Sch. Least-to-Most Prompting Enables Complex Reasoning in Large Language Models , booktitle =. 2023 , url =

2023
[43]

arXiv preprint arXiv:2112.00114 , year =

Maxwell Nye and Anders Johan Andreassen and Guy Gur-Ari and Henryk Michalewski and Jacob Austin and David Bieber and David Dohan and Aitor Lewkowycz and Maarten Bosma and David Luan and Charles Sutton and Augustus Odena , title =. arXiv preprint arXiv:2112.00114 , year =

Pith/arXiv arXiv
[44]

Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection , booktitle =

Nivre, Joakim and de Marneffe, Marie. Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection , booktitle =. 2020 , url =

2020
[45]

Siyanova-Chanturia, Anna and Conklin, Kathy and van Heuven, Walter J. B. , title =. Journal of Experimental Psychology: Learning, Memory, and Cognition , volume =. 2011 , doi =

2011
[47]

ACM Computing Surveys , year=

Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing , author=. ACM Computing Surveys , year=
[48]

2024 , eprint=

In-context language learning: Architectures and algorithms , author=. 2024 , eprint=

2024
[49]

2021 , eprint=

Calibrate Before Use: Improving Few-shot Performance of Language Models , author=. 2021 , eprint=

2021
[50]

2022 , eprint=

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? , author=. 2022 , eprint=

2022
[51]

Beyond Accuracy: Behavioral Testing of

Ribeiro, Marco Tulio and Wu, Tongshuang and Guestrin, Carlos and Singh, Sameer , booktitle =. Beyond Accuracy: Behavioral Testing of. 2020 , address =

2020
[52]

Findings of the Association for Computational Linguistics:

Evaluating Models' Local Decision Boundaries via Contrast Sets , author =. Findings of the Association for Computational Linguistics:. 2020 , address =

2020
[55]

Proceedings of the 29th International Conference on Computational Linguistics , month = oct, year =

Yang, Guanqun and Haque, Mirazul and Song, Qiaochu and Yang, Wei and Liu, Xueqing , editor =. Proceedings of the 29th International Conference on Computational Linguistics , month = oct, year =
[58]

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , month = nov, year =

Layer-wise Minimal Pair Probing Reveals Contextual Grammatical-Conceptual Hierarchy in Speech Representations , author =. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , month = nov, year =

2025
[59]

and Delaney, Eoin D

Mayne, Harry and Kearns, Ryan Othniel and Yang, Yushi and Bean, Andrew M. and Delaney, Eoin D. and Russell, Chris and Mahdi, Adam , booktitle =. 2025 , address =

2025
[61]

A Structural Probe for Finding Syntax in Word Representations , author =. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) , month = jun, year =

2019
[62]

Transactions of the Association for Computational Linguistics , year=

A Primer in BERTology: What We Know About How BERT Works , author=. Transactions of the Association for Computational Linguistics , year=
[63]

Integrating Morphology with Multi-word Expression Processing in

Oflazer, Kemal and. Integrating Morphology with Multi-word Expression Processing in. Proceedings of the Workshop on Multiword Expressions: Integrating Processing , month = jul, year =
[64]

Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages , month = oct, year =

Multiword Expressions in Statistical Dependency Parsing , author =. Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages , month = oct, year =
[66]

Enhancing the

Öztürk, Yağmur and Hadj Mohamed, Najet and Lion-Bouton, Adam and Savary, Agata , editor =. Enhancing the. Proceedings of the 18th Workshop on Multiword Expressions @LREC2022 , month = jun, year =
[67]

Proceedings of the Sixth Conference on Computational Natural Language Learning (CoNLL 2002) , year =

Extracting the Unextractable: A Case Study on Verb-Particles , author =. Proceedings of the Sixth Conference on Computational Natural Language Learning (CoNLL 2002) , year =

2002
[68]

Xerox Finite-State Tool , author =
[69]

Language Teaching , year =

The Processing of Multi-word Expressions: A Research Agenda for the Next 10 Years , author =. Language Teaching , year =
[70]

Journal of Experimental Psychology: Learning, Memory, and Cognition , volume =

Seeing a Phrase ``Time and Again'' Matters: The Role of Phrasal Frequency in the Processing of Multiword Sequences , author =. Journal of Experimental Psychology: Learning, Memory, and Cognition , volume =. 2011 , pages =

2011
[71]

Cognition , volume =

`Clap your hands' or `take your hands'? One-year-olds distinguish between frequent and infrequent multiword phrases , author =. Cognition , volume =. 2021 , pages =

2021
[72]

The Function of Word Order in Turkish Grammar , year =

Erguvanl. The Function of Word Order in Turkish Grammar , year =
[73]

Word Order and Scrambling , editor =

Kornfilt, Jaklin , title =. Word Order and Scrambling , editor =. 2003 , publisher =

2003
[74]

Linguistic Inquiry , year =

Kural, Murat , title =. Linguistic Inquiry , year =
[75]

1992 , note =

Kural, Murat , title =. 1992 , note =

1992
[76]

2003 , volume =

Information Structure in Turkish: The Word Order--Prosody Interface , journal =. 2003 , volume =

2003
[77]

2006 , number =

Bare Object NPs and Scrambling in Turkish , journal =. 2006 , number =

2006
[78]

Natural Language & Linguistic Theory , year =

Massam, Diane , title =. Natural Language & Linguistic Theory , year =
[79]

Case, Referentiality and Phrase Structure , year =
[80]

2009 , volume =

Incorporating Agents , journal =. 2009 , volume =

2009
[81]

Linguistic Inquiry , year =

Grimshaw, Jane and Mester, Armin , title =. Linguistic Inquiry , year =
[82]

Complex Predicates: Cross-linguistic Perspectives on Event Structure , editor =

Butt, Miriam , title =. Complex Predicates: Cross-linguistic Perspectives on Event Structure , editor =. 2010 , doi =

2010
[83]

Light Verb Constructions in

U. Light Verb Constructions in. Dil ve Edebiyat Dergisi / Journal of Linguistics and Literature , year =
[84]

The Function of Word Order in Turkish Grammar , publisher =

Erguvanl. The Function of Word Order in Turkish Grammar , publisher =. 1984 , series =

1984
[85]

Word Order and Scrambling , editor =

Kornfilt, Jaklin , title =. Word Order and Scrambling , editor =. 2003 , pages =

2003
[86]

2003 , volume =

Information structure in Turkish: the word order--prosody interface , journal =. 2003 , volume =

2003
[87]

Case, Referentiality and Phrase Structure , publisher =

\". Case, Referentiality and Phrase Structure , publisher =. 2005 , series =

2005
[88]

Proceedings of the Sixth Conference of the European Chapter of the Association for Computational Linguistics (EACL) , year =

Oflazer, Kemal , title =. Proceedings of the Sixth Conference of the European Chapter of the Association for Computational Linguistics (EACL) , year =
[90]

and Baldwin, Timothy and Bond, Francis and Copestake, Ann and Flickinger, Dan , title =

Sag, Ivan A. and Baldwin, Timothy and Bond, Francis and Copestake, Ann and Flickinger, Dan , title =. Computational Linguistics and Intelligent Text Processing (CICLing) , year =
[91]

Handbook of Natural Language Processing , edition =

Baldwin, Timothy and Kim, Su Nam , title =. Handbook of Natural Language Processing , edition =. 2010 , url =

2010
[92]

Complex Predicates: Cross-Linguistic Perspectives on Event Structure , editor =

Butt, Miriam , title =. Complex Predicates: Cross-Linguistic Perspectives on Event Structure , editor =
[93]

Advances in Neural Information Processing Systems , volume =

Language Models are Few-Shot Learners , author =. Advances in Neural Information Processing Systems , volume =. 2020 , publisher =

2020
[96]

Integrating Morphology with Multi-Word Expression Processing in Turkish

Oflazer, Kemal and Çetinoğlu, Özlem and Say, Bilge. Integrating Morphology with Multi-Word Expression Processing in Turkish. Proceedings of the Workshop on Multiword Expressions: Integrating Processing. 2004

2004
[97]

Proceedings of the LREC 2022

Enhancing the Turkish Verbal Multiword Expressions Corpus. Proceedings of the LREC 2022. 2022

2022
[98]

and Baldwin, Timothy and Bond, Francis and Copestake, Ann and Flickinger, Dan

Sag, Ivan A. and Baldwin, Timothy and Bond, Francis and Copestake, Ann and Flickinger, Dan. Multiword Expressions: A Pain in the Neck for NLP. Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics (CICLing). 2002

2002


Proceedings of

Sulubacak, Umut and Gokirmak, Memduh and Tyers, Francis and. Proceedings of. 2016 , address =

2016

[25] [32]

Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies , year =

Zeman, Daniel and Popel, Martin and Straka, Milan and Haji. Proceedings of the. 2017 , address =. doi:10.18653/v1/K17-3001 , pages =

work page doi:10.18653/v1/k17-3001 2017

[26] [33]

Mechanistic Interpretability for

Bereska, Leonard and Gavves, Efstratios , journal =. Mechanistic Interpretability for. 2024 , url =

2024

[27] [34]

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP) , year =

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? , author =. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP) , year =

2022

[28] [36]

Ali and Fincan, Ali Arda and G

Bayram, M. Ali and Fincan, Ali Arda and G. Tokenization Standards and Evaluation in Natural Language Processing: A Comparative Analysis of Large Language Models on. 2025 33rd Signal Processing and Communications Applications Conference (. 2025 , month = jun, pages =

2025

[29] [37]

2021 , eprint =

Calibrate Before Use: Improving Few-Shot Performance of Language Models , author =. 2021 , eprint =

2021

[30] [38]

Journal of Machine Learning Research , year =

Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability , author =. Journal of Machine Learning Research , year =

[31] [39]

Chi and Quoc V

Jason Wei and Xuezhi Wang and Dale Schuurmans and Maarten Bosma and Brian Ichter and Fei Xia and Ed H. Chi and Quoc V. Le and Denny Zhou , title =. Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022 (. 2022 , url =

2022

[32] [40]

Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022 (

Takeshi Kojima and Shixiang Shane Gu and Machel Reid and Yutaka Matsuo and Yusuke Iwasawa , title =. Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022 (

2022

[33] [41]

Le and Ed H

Xuezhi Wang and Jason Wei and Dale Schuurmans and Quoc V. Le and Ed H. Chi and Sharan Narang and Aakanksha Chowdhery and Denny Zhou , title =. The Eleventh International Conference on Learning Representations (. 2023 , url =

2023

[34] [42]

Least-to-Most Prompting Enables Complex Reasoning in Large Language Models , booktitle =

Denny Zhou and Nathanael Sch. Least-to-Most Prompting Enables Complex Reasoning in Large Language Models , booktitle =. 2023 , url =

2023

[35] [43]

arXiv preprint arXiv:2112.00114 , year =

Maxwell Nye and Anders Johan Andreassen and Guy Gur-Ari and Henryk Michalewski and Jacob Austin and David Bieber and David Dohan and Aitor Lewkowycz and Maarten Bosma and David Luan and Charles Sutton and Augustus Odena , title =. arXiv preprint arXiv:2112.00114 , year =

Pith/arXiv arXiv

[36] [44]

Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection , booktitle =

Nivre, Joakim and de Marneffe, Marie. Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection , booktitle =. 2020 , url =

2020

[37] [45]

Siyanova-Chanturia, Anna and Conklin, Kathy and van Heuven, Walter J. B. , title =. Journal of Experimental Psychology: Learning, Memory, and Cognition , volume =. 2011 , doi =

2011

[38] [47]

ACM Computing Surveys , year=

Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing , author=. ACM Computing Surveys , year=

[39] [48]

2024 , eprint=

In-context language learning: Architectures and algorithms , author=. 2024 , eprint=

2024

[40] [49]

2021 , eprint=

Calibrate Before Use: Improving Few-shot Performance of Language Models , author=. 2021 , eprint=

2021

[41] [50]

2022 , eprint=

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? , author=. 2022 , eprint=

2022

[42] [51]

Beyond Accuracy: Behavioral Testing of

Ribeiro, Marco Tulio and Wu, Tongshuang and Guestrin, Carlos and Singh, Sameer , booktitle =. Beyond Accuracy: Behavioral Testing of. 2020 , address =

2020

[43] [52]

Findings of the Association for Computational Linguistics:

Evaluating Models' Local Decision Boundaries via Contrast Sets , author =. Findings of the Association for Computational Linguistics:. 2020 , address =

2020

[44] [55]

Proceedings of the 29th International Conference on Computational Linguistics , month = oct, year =

Yang, Guanqun and Haque, Mirazul and Song, Qiaochu and Yang, Wei and Liu, Xueqing , editor =. Proceedings of the 29th International Conference on Computational Linguistics , month = oct, year =

[45] [58]

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , month = nov, year =

Layer-wise Minimal Pair Probing Reveals Contextual Grammatical-Conceptual Hierarchy in Speech Representations , author =. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , month = nov, year =

2025

[46] [59]

and Delaney, Eoin D

Mayne, Harry and Kearns, Ryan Othniel and Yang, Yushi and Bean, Andrew M. and Delaney, Eoin D. and Russell, Chris and Mahdi, Adam , booktitle =. 2025 , address =

2025

[47] [61]

A Structural Probe for Finding Syntax in Word Representations , author =. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) , month = jun, year =

2019

[48] [62]

Transactions of the Association for Computational Linguistics , year=

A Primer in BERTology: What We Know About How BERT Works , author=. Transactions of the Association for Computational Linguistics , year=

[49] [63]

Integrating Morphology with Multi-word Expression Processing in

Oflazer, Kemal and. Integrating Morphology with Multi-word Expression Processing in. Proceedings of the Workshop on Multiword Expressions: Integrating Processing , month = jul, year =

[50] [64]

Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages , month = oct, year =

Multiword Expressions in Statistical Dependency Parsing , author =. Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages , month = oct, year =

[51] [66]

Enhancing the

Öztürk, Yağmur and Hadj Mohamed, Najet and Lion-Bouton, Adam and Savary, Agata , editor =. Enhancing the. Proceedings of the 18th Workshop on Multiword Expressions @LREC2022 , month = jun, year =

[52] [67]

Proceedings of the Sixth Conference on Computational Natural Language Learning (CoNLL 2002) , year =

Extracting the Unextractable: A Case Study on Verb-Particles , author =. Proceedings of the Sixth Conference on Computational Natural Language Learning (CoNLL 2002) , year =

2002

[53] [68]

Xerox Finite-State Tool , author =

[54] [69]

Language Teaching , year =

The Processing of Multi-word Expressions: A Research Agenda for the Next 10 Years , author =. Language Teaching , year =

[55] [70]

Journal of Experimental Psychology: Learning, Memory, and Cognition , volume =

Seeing a Phrase ``Time and Again'' Matters: The Role of Phrasal Frequency in the Processing of Multiword Sequences , author =. Journal of Experimental Psychology: Learning, Memory, and Cognition , volume =. 2011 , pages =

2011

[56] [71]

Cognition , volume =

`Clap your hands' or `take your hands'? One-year-olds distinguish between frequent and infrequent multiword phrases , author =. Cognition , volume =. 2021 , pages =

2021

[57] [72]

The Function of Word Order in Turkish Grammar , year =

Erguvanl. The Function of Word Order in Turkish Grammar , year =

[58] [73]

Word Order and Scrambling , editor =

Kornfilt, Jaklin , title =. Word Order and Scrambling , editor =. 2003 , publisher =

2003

[59] [74]

Linguistic Inquiry , year =

Kural, Murat , title =. Linguistic Inquiry , year =

[60] [75]

1992 , note =

Kural, Murat , title =. 1992 , note =

1992

[61] [76]

2003 , volume =

Information Structure in Turkish: The Word Order--Prosody Interface , journal =. 2003 , volume =

2003

[62] [77]

2006 , number =

Bare Object NPs and Scrambling in Turkish , journal =. 2006 , number =

2006

[63] [78]

Natural Language & Linguistic Theory , year =

Massam, Diane , title =. Natural Language & Linguistic Theory , year =

[64] [79]

Case, Referentiality and Phrase Structure , year =

[65] [80]

2009 , volume =

Incorporating Agents , journal =. 2009 , volume =

2009

[66] [81]

Linguistic Inquiry , year =

Grimshaw, Jane and Mester, Armin , title =. Linguistic Inquiry , year =

[67] [82]

Complex Predicates: Cross-linguistic Perspectives on Event Structure , editor =

Butt, Miriam , title =. Complex Predicates: Cross-linguistic Perspectives on Event Structure , editor =. 2010 , doi =

2010

[68] [83]

Light Verb Constructions in

U. Light Verb Constructions in. Dil ve Edebiyat Dergisi / Journal of Linguistics and Literature , year =

[69] [84]

The Function of Word Order in Turkish Grammar , publisher =

Erguvanl. The Function of Word Order in Turkish Grammar , publisher =. 1984 , series =

1984

[70] [85]

Word Order and Scrambling , editor =

Kornfilt, Jaklin , title =. Word Order and Scrambling , editor =. 2003 , pages =

2003

[71] [86]

2003 , volume =

Information structure in Turkish: the word order--prosody interface , journal =. 2003 , volume =

2003

[72] [87]

Case, Referentiality and Phrase Structure , publisher =

\". Case, Referentiality and Phrase Structure , publisher =. 2005 , series =

2005

[73] [88]

Proceedings of the Sixth Conference of the European Chapter of the Association for Computational Linguistics (EACL) , year =

Oflazer, Kemal , title =. Proceedings of the Sixth Conference of the European Chapter of the Association for Computational Linguistics (EACL) , year =

[74] [90]

and Baldwin, Timothy and Bond, Francis and Copestake, Ann and Flickinger, Dan , title =

Sag, Ivan A. and Baldwin, Timothy and Bond, Francis and Copestake, Ann and Flickinger, Dan , title =. Computational Linguistics and Intelligent Text Processing (CICLing) , year =

[75] [91]

Handbook of Natural Language Processing , edition =

Baldwin, Timothy and Kim, Su Nam , title =. Handbook of Natural Language Processing , edition =. 2010 , url =

2010

[76] [92]

Complex Predicates: Cross-Linguistic Perspectives on Event Structure , editor =

Butt, Miriam , title =. Complex Predicates: Cross-Linguistic Perspectives on Event Structure , editor =

[77] [93]

Advances in Neural Information Processing Systems , volume =

Language Models are Few-Shot Learners , author =. Advances in Neural Information Processing Systems , volume =. 2020 , publisher =

2020

[78] [96]

Integrating Morphology with Multi-Word Expression Processing in T urkish

Oflazer, Kemal and C etino g lu, \"O zlem and Say, Bilge. Integrating Morphology with Multi-Word Expression Processing in T urkish. Proceedings of the Workshop on Multiword Expressions: Integrating Processing. 2004

2004

[79] [97]

Proceedings of the LREC 2022

Enhancing the T urkish Verbal Multiword Expressions Corpus. Proceedings of the LREC 2022. 2022

2022

[80] [98]

and Baldwin, Timothy and Bond, Francis and Copestake, Ann and Flickinger, Dan

Sag, Ivan A. and Baldwin, Timothy and Bond, Francis and Copestake, Ann and Flickinger, Dan. Multiword Expressions: A Pain in the Neck for NLP. Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics (CICLing). 2002

2002