arxiv: 2509.13196 · v1 · pith:MESO4A3Unew · submitted 2025-09-16 · 💻 cs.CL

The Few-shot Dilemma: Over-prompting Large Language Models

classification 💻 cs.CL
keywords few-shot, examples, llms, over-prompting, performance, dilemma, excessive, language

Over-prompting, a phenomenon where excessive examples in prompts lead to diminished performance in Large Language Models (LLMs), challenges the conventional wisdom about in-context few-shot learning. To investigate this few-shot dilemma, we outline a prompting framework that leverages three standard few-shot selection methods - random sampling, semantic embedding, and TF-IDF vectors - and evaluate these methods across multiple LLMs, including GPT-4o, GPT-3.5-turbo, DeepSeek-V3, Gemma-3, LLaMA-3.1, LLaMA-3.2, and Mistral. Our experimental results reveal that incorporating excessive domain-specific examples into prompts can paradoxically degrade performance in certain LLMs, which contradicts the prior empirical conclusion that more relevant few-shot examples universally benefit LLMs. Given the trend of LLM-assisted software engineering and requirement analysis, we experiment with two real-world software requirement classification datasets. By gradually increasing the number of TF-IDF-selected and stratified few-shot examples, we identify their optimal quantity for each LLM. This combined approach achieves superior performance with fewer examples, avoiding the over-prompting problem, thus surpassing the state-of-the-art by 1% in classifying functional and non-functional requirements.
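The abstract does not spell out how the TF-IDF-selected, stratified examples are chosen, so the following is only a minimal sketch of one plausible reading: rank a labeled pool by TF-IDF cosine similarity to the query requirement and keep the top few per class. The function names (`tokenize`, `tfidf_vectors`, `select_few_shot`), the smoothed IDF formula, the interpretation of "stratified" as an equal number of examples per label, and the sample requirements are all assumptions made for illustration, not details taken from the paper.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for docs and return them with the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF (an assumed, commonly used variant).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_few_shot(query, pool, k_per_label):
    """Pick the k_per_label most similar (text, label) pairs from pool for each label."""
    vectors, idf = tfidf_vectors([text for text, _ in pool])
    # Tokens unseen in the pool carry no weight in the query vector.
    q_vec = {
        t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf
    }
    by_label = defaultdict(list)
    for (text, label), vec in zip(pool, vectors):
        by_label[label].append((cosine(q_vec, vec), text))
    selected = []
    for label in sorted(by_label):  # stratify: same budget for every class
        ranked = sorted(by_label[label], key=lambda p: p[0], reverse=True)
        selected.extend((text, label) for _, text in ranked[:k_per_label])
    return selected


# Hypothetical requirement pool: "F" = functional, "NF" = non-functional.
pool = [
    ("The system shall allow users to search products by name.", "F"),
    ("The system shall let administrators delete user accounts.", "F"),
    ("Search results shall be returned within 2 seconds.", "NF"),
    ("The interface shall use the corporate color scheme.", "NF"),
]
query = "The system shall respond to search queries within 2 seconds."
examples = select_few_shot(query, pool, k_per_label=1)
```

Under this sketch, the over-prompting question becomes a sweep over `k_per_label`: build the prompt from `examples` for increasing values, score each model on a held-out set, and stop at the value where accuracy peaks rather than assuming more examples always help.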


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Sustainability Analysis of Prompt Strategies for SLM-based Automated Test Generation

    cs.SE 2026-04 unverdicted novelty 6.0

    Prompt strategies for SLM-based automated test generation vary widely in energy consumption and carbon emissions, with simpler strategies delivering competitive coverage at markedly lower environmental cost.

  2. Agent4cs: A Multi-agent System for Code Summarization in Large Hierarchical Codebases

    cs.AI 2026-07 unverdicted novelty 5.0

    Agent4cs deploys summarization, keyword-extraction, and quality-assurance agents in a bottom-up pipeline that raises semantic consistency by 8% and normalized keyword coverage by up to 38% over structured prompting ba...

  3. The PICCO Framework for Large Language Model Prompting: A Taxonomy and Reference Architecture for Prompt Structure

    cs.CL 2026-04 accept novelty 5.0

    PICCO is a five-element reference architecture (Persona, Instructions, Context, Constraints, Output) for structuring LLM prompts, derived from synthesizing prior frameworks along with a taxonomy distinguishing prompt ...