A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications

Pranab Sahoo, Ayush Kumar Singh, Sriparna Saha, Vinija Jain, Samrat Mondal, Aman Chadha · 2024 · cs.AI · arXiv 2402.07927

Canonical reference. 66 Pith papers cite this work; 82% of classified citations cite it as background.

abstract

Prompt engineering has emerged as an indispensable technique for extending the capabilities of large language models (LLMs) and vision-language models (VLMs). This approach leverages task-specific instructions, known as prompts, to enhance model efficacy without modifying the core model parameters. Rather than updating the model parameters, prompts allow seamless integration of pre-trained models into downstream tasks by eliciting desired model behaviors solely based on the given prompt. Prompts can be natural language instructions that provide context to guide the model or learned vector representations that activate relevant knowledge. This burgeoning field has enabled success across various applications, from question-answering to commonsense reasoning. However, there remains a lack of systematic organization and understanding of the diverse prompt engineering methods and techniques. This survey paper addresses the gap by providing a structured overview of recent advancements in prompt engineering, categorized by application area. For each prompting approach, we provide a summary detailing the prompting methodology, its applications, the models involved, and the datasets utilized. We also delve into the strengths and limitations of each approach and include a taxonomy diagram and table summarizing datasets, models, and critical points of each prompting technique. This systematic analysis enables a better understanding of this rapidly developing field and facilitates future research by illuminating open challenges and opportunities for prompt engineering.

citation-role summary

background: 10 · method: 1

citation-polarity summary

background: 9 · support: 1 · use method: 1

representative citing papers

Almieyar-Oryx-BloomBench: A Bilingual Multimodal Benchmark for Cognitively Informed Evaluation of Vision-Language Models

cs.CV · 2026-06-04 · unverdicted · novelty 7.0

BloomBench reveals that state-of-the-art VLMs perform well on semantic understanding but struggle with factual recall and creative synthesis, while also showing large English-Arabic performance gaps.

TADI: Tool-Augmented Drilling Intelligence via Agentic LLM Orchestration over Heterogeneous Wellsite Data

cs.AI · 2026-04-30 · unverdicted · novelty 7.0

TADI shows that domain-specialized tools orchestrated by an LLM over dual structured and semantic databases can convert heterogeneous wellsite data into evidence-grounded drilling intelligence, with tool design mattering more than model scale.

Incisor: Ex Ante Cloud Instance Selection for HPC Jobs

cs.DC · 2026-04-27 · unverdicted · novelty 7.0

Incisor uses program analysis and frontier LLMs to select working AWS EC2 instances ex ante for 100% of first-time HPC runs of C/C++/Fortran and Python codes, cutting runtime 54% and costs 44% versus an expert-constrained SkyPilot baseline.

Dynamic Cyber Ranges

cs.CR · 2026-04-27 · unverdicted · novelty 7.0

Dynamic Cyber Ranges with LLM defender agents reduce attacker success to 0-55% and preserve evaluation headroom as models advance by using comparable capabilities on both sides.

Atropos: Improving Cost-Benefit Trade-off of LLM-based Agents under Self-Consistency with Early Termination and Model Hotswap

cs.SE · 2026-04-16 · unverdicted · novelty 7.0

Atropos uses GCN on inference graphs for early failure prediction and hotswaps to larger LLMs, achieving 74% of large-model performance at 24% cost.

Human-Centric Topic Modeling with Goal-Prompted Contrastive Learning and Optimal Transport

cs.AI · 2026-04-14 · unverdicted · novelty 7.0

GCTM-OT extracts goal candidates with an LLM, then uses goal-prompted contrastive learning and optimal transport to discover topics that are more coherent, diverse, and aligned with human intent than prior methods on subreddit data.

Figures as Interfaces: Toward LLM-Native Artifacts for Scientific Discovery

cs.HC · 2026-04-09 · unverdicted · novelty 7.0

LLM-native figures embed provenance and enable direct LLM interaction with scientific visualizations to accelerate discovery and improve reproducibility.

RubberDuckBench: A Benchmark for AI Coding Assistants

cs.SE · 2026-01-23 · unverdicted · novelty 7.0

RubberDuckBench shows top AI models score around 68% on real GitHub coding questions, rarely answer completely correctly, and hallucinate in 58% of responses on average.

PIAST: Rapid Prompting with In-context Augmentation for Scarce Training data

cs.CL · 2025-12-11 · conditional · novelty 7.0

PIAST iteratively optimizes few-shot examples in prompts via Monte Carlo Shapley value estimation, outperforming prior automatic prompting methods and setting new SOTA on classification, simplification, and GSM8K with modest compute.

From Task to Tutorial: An Automated GUI Framework for Excel Tutorial Document and Video Creation

cs.SE · 2025-09-26 · unverdicted · novelty 7.0

An AI framework automates Excel tutorial and video creation from task descriptions via an Execution Agent, achieving 8.5% higher task success and 1/20th the authoring time of experts.

The PIMMUR Principles: Ensuring Validity in Collective Behavior of LLM Societies

cs.CL · 2025-09-22 · conditional · novelty 7.0

A systematic audit of LLM-based AI societies finds that 89.7% of 39 studies violate at least one of six PIMMUR validity principles, with reproductions showing that many claimed collective behaviors disappear when controls are tightened.

LLM-Orchestrated Conformance Checking in Stroke Care Without Computer-Interpretable Guidelines

cs.AI · 2026-06-08 · unverdicted · novelty 6.0

An LLM-orchestrated framework enables conformance checking in stroke care from unstructured texts, achieving over 86% conformance in hospital data.

CRAFT: Cost-aware Refinement And Front-aware Tuning of Prompts

cs.CL · 2026-06-03 · unverdicted · novelty 6.0

CRAFT is a Pareto-front prompt optimizer that allocates scarce LLM validation calls to candidates near the current front using accuracy- and cost-oriented generators plus NSGA-II retention.

Enhancing Reliability in LLM-Based Secure Code Generation

cs.CR · 2026-05-22 · conditional · novelty 6.0

MA-CoT prompting reduces security findings in LLM-generated code by 57.6% on a 200-task dataset and 94.5% on LLMSecEval across C, Java, and Python, outperforming vanilla, zero-shot, and standard CoT strategies.

Reflective Prompt Tuning through Language Model Function-Calling

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

Reflective Prompt Tuning uses LLM function calling and diagnostic reports to iteratively optimize prompts, yielding up to 12.9 point gains on reasoning tasks while improving calibration.

TCARD: Nearly Balanced Two-Level Designs with Treatment Cardinality Constraints with an Application to LLM Prompt Engineering

stat.ME · 2026-05-20 · unverdicted · novelty 6.0

Proposes nearly balanced TCARDs that minimize the first two generalized word-length pattern components, defines Φ_BCD criterion linked to classical optimality, and constructs designs via coordinate exchange with simulation-calibrated weights for LLM prompt engineering.

Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits

cs.LG · 2026-05-14 · unverdicted · novelty 6.0

Adapting multi-objective pure-exploration bandits enables efficient Pareto prompt set recovery and best feasible prompt identification for LLMs, with linear-case guarantees and empirical gains over baselines.

Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

LLMs perform in-context learning as trajectories through a structured low-dimensional conceptual belief space, with the structure visible in both behavior and internal representations and causally manipulable via interventions.

VISOR: A Vision-Language Model-based Test Oracle for Testing Robots

cs.SE · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

VISOR is a VLM-based automated test oracle that evaluates robot task correctness and quality from videos while reporting its own uncertainty, tested on GPT and Gemini across four tasks and over 1000 videos with Gemini showing higher recall and GPT higher precision but low uncertainty-correctness tie

Black-box model classification under the discriminative factorization

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Discriminative factorization distinguishes high-quality query sets for black-box model classification, with chance-level error decaying exponentially in query budget and parameters predicting empirical decay rates on auditing tasks.

GRaSp: Automatic Example Optimization for In-Context Learning in Low-Data Tasks

cs.CL · 2026-05-08 · unverdicted · novelty 6.0

GRaSp optimizes in-context examples for LLMs via synthetic generation, clustering, dimensionality reduction, and genetic algorithms with diversity-adaptive mutation, reaching 45.84% micro-F1 on financial NER with real data and outperforming zero-shot and random few-shot baselines.

Query-efficient model evaluation using cached responses

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

DKPS-based methods predict new model benchmark scores using cached responses, matching baseline mean absolute error with substantially fewer queries and an offline query selection approach.

PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts

cs.CR · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

PragLocker generates function-preserving but non-portable prompts for LLM agents via code-symbol semantic anchoring followed by target-model feedback noise injection.

Tailored Prompts, Targeted Protection: Vulnerability-Specific LLM Analysis for Smart Contracts

cs.CR · 2026-05-05 · unverdicted · novelty 6.0

An LLM framework with tailored prompts and a new dataset of 31,165 annotated instances achieves 0.92 positive recall and 0.85 negative recall for detecting 13 smart contract vulnerability categories.

citing papers explorer

Showing 16 of 66 citing papers.

PRL: Prompts from Reinforcement Learning

cs.AI · 2025-05-20 · unverdicted · none · ref 4 · internal anchor

PRL is a reinforcement learning method that generates novel prompts and achieves state-of-the-art results on text classification, simplification, and summarization benchmarks, outperforming APE and EvoPrompt.

Artificial Intelligence in Number Theory: LLMs for Algorithm Generation and Ensemble Methods for Conjecture Verification

math.NT · 2025-04-28 · conditional · none · ref 42 · internal anchor

LLM reaches >=0.95 accuracy on 60 number theory problems with optimal hints; LightGBM classifier empirically supports Dirichlet conductor conjecture via zero features at 93.9% test accuracy for small q.

Improving Language Models with Intentional Analysis

cs.CL · 2025-02-07 · unverdicted · none · ref 10 · internal anchor

Intentional Analysis improves language model task performance by explicitly adding intent-aware analysis and reasoning, outperforming Chain-of-Thought and working synergistically with it even on frontier models.

MiCU: End-to-End Smart Home Command Understanding with Large Language Model

cs.CL · 2026-05-31 · unverdicted · none · ref 22 · internal anchor

MiCU is a domain-adapted LLM for smart-home command understanding that reports 20% average accuracy gains over baselines and is deployed in the Xiaomi Home app.

LegalGraphRAG: Multi-Agent Graph Retrieval-Augmented Generation for Reliable Legal Reasoning

cs.CL · 2026-05-27 · unverdicted · none · ref 8 · internal anchor

LegalGraphRAG adds hierarchical organization to legal knowledge graphs and a multi-agent verification loop to reach claimed state-of-the-art accuracy and trustworthiness on legal reasoning benchmarks.

Hierarchical Prompt-Domain Control and Learning for Resource-Constrained Agentic Language Models

cs.AI · 2026-05-26 · unverdicted · none · ref 5 · internal anchor

Hierarchical prompt-domain control framework separates schema distillation from online semantic adaptation in agentic LLMs using an oracle loop, evaluated on a Multi-Fidelity Bayesian Optimization testbed.

Analyzing Chain of Thought (CoT) Approaches in Control Flow Code Deobfuscation Tasks

cs.SE · 2026-04-16 · unverdicted · none · ref 16 · internal anchor

CoT prompting improves LLM performance on control-flow deobfuscation of C benchmarks, yielding ~16% better CFG reconstruction and ~20.5% better semantic preservation for GPT5 versus zero-shot prompting.

Combining Static Code Analysis and Large Language Models Improves Correctness and Performance of Algorithm Recognition

cs.SE · 2026-04-03 · conditional · none · ref 63 · internal anchor

Hybrid LLM plus static analysis for algorithm recognition in code cuts required model calls by 72-97% and lifts F1-scores by as much as 12 points.

Toward a Safe Internet of Agents

cs.MA · 2025-11-29 · unverdicted · none · ref 21 · internal anchor

The paper proposes a bottom-up framework for safe agentic AI systems that treats each component as a dual-use interface where added capabilities also expand attack surfaces across single agents, multi-agent systems, and interoperable ecosystems.

Foundational Design Principles and Patterns for Building Robust and Adaptive GenAI-Native Systems

cs.SE · 2025-08-21 · unverdicted · none · ref 50 · internal anchor

Proposes five foundational pillars and architectural patterns for building robust GenAI-native systems by combining AI with software engineering principles.

AI, Meet Human: Learning Paradigms for Hybrid Decision Making Systems

cs.LG · 2024-02-09 · unverdicted · none · ref 131 · internal anchor

Proposes a taxonomy of Hybrid Decision Making Systems as a conceptual and technical framework for modeling human-machine interaction in machine learning literature.

LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods

cs.CL · 2024-12-07 · accept · none · ref 195 · internal anchor

A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.

BIT.UA-AAUBS at ArchEHR-QA 2026: Evaluating Open-Source and Proprietary LLMs via Prompting in Low-Resource QA

cs.CL · 2026-05-05 · unverdicted · none · ref 13 · internal anchor

Prompt-based LLM evaluation without training data secured top rankings in the ArchEHR-QA 2026 shared task on clinical QA.

Bridging Language Models and Financial Analysis

q-fin.ST · 2025-03-14 · unverdicted · none · ref 81 · internal anchor

A survey synthesizing recent LLM research and assessing its applicability to financial data analysis.

Harness Engineering for Agentic AI Coding Tools: An Exploratory Study

cs.SE · 2026-02-16 · unreviewed · ref 21 · internal anchor

From Fragments to Facts: A Curriculum-Driven DPO Approach for Generating Hindi News Veracity Explanations

cs.CL · 2025-07-07 · unreviewed · ref 34 · internal anchor
