A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications
28 Pith papers cite this work. Polarity classification is still indexing.
abstract
Prompt engineering has emerged as an indispensable technique for extending the capabilities of large language models (LLMs) and vision-language models (VLMs). This approach leverages task-specific instructions, known as prompts, to enhance model efficacy without modifying the core model parameters: rather than requiring parameter updates, prompts allow seamless integration of pre-trained models into downstream tasks by eliciting desired model behaviors from the given prompt alone. Prompts can be natural language instructions that provide context to guide the model or learned vector representations that activate relevant knowledge. This burgeoning field has enabled success across various applications, from question answering to commonsense reasoning. However, there remains a lack of systematic organization and understanding of the diverse prompt engineering methods and techniques. This survey addresses that gap by providing a structured overview of recent advancements in prompt engineering, categorized by application area. For each prompting approach, we provide a summary detailing the methodology, its applications, the models involved, and the datasets utilized. We also delve into the strengths and limitations of each approach and include a taxonomy diagram and a table summarizing the datasets, models, and critical points of each prompting technique. This systematic analysis enables a better understanding of this rapidly developing field and facilitates future research by illuminating open challenges and opportunities in prompt engineering.
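To make the first category the abstract mentions concrete, a minimal sketch of two hand-written prompt styles; the sentiment task and the example reviews are illustrative, not drawn from the survey:

# Two natural-language prompt styles (as opposed to learned vectors);
# task and examples are invented for illustration.
zero_shot = (
    "Classify the sentiment of the review as positive or negative.\n"
    "Review: The battery died after two days.\n"
    "Sentiment:"
)
few_shot = (
    "Review: Great screen, fast shipping. Sentiment: positive\n"
    "Review: Arrived broken; support never replied. Sentiment: negative\n"
    "Review: The battery died after two days. Sentiment:"
)

Both strings steer the same pre-trained model toward the same task with no parameter updates; the few-shot variant simply prepends worked examples.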
citing papers explorer
-
PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts
PragLocker protects agent prompts as intellectual property by building non-portable obfuscated versions that function only on the intended LLM, via code-symbol semantic anchoring followed by noise injection guided by target-model feedback.
-
TADI: Tool-Augmented Drilling Intelligence via Agentic LLM Orchestration over Heterogeneous Wellsite Data
TADI shows that domain-specialized tools orchestrated by an LLM over dual structured and semantic databases can convert heterogeneous wellsite data into evidence-grounded drilling intelligence, with tool design mattering more than model scale.
-
Incisor: Ex Ante Cloud Instance Selection for HPC Jobs
Incisor uses program analysis and frontier LLMs to select, ex ante, AWS EC2 instances on which 100% of first-time HPC runs of C/C++/Fortran and Python codes succeed, cutting runtime by 54% and costs by 44% versus an expert-constrained SkyPilot baseline.
-
Dynamic Cyber Ranges
Dynamic Cyber Ranges with LLM defender agents reduce attacker success to 0-55% and preserve evaluation headroom as models advance by using comparable capabilities on both sides.
-
Atropos: Improving Cost-Benefit Trade-off of LLM-based Agents under Self-Consistency with Early Termination and Model Hotswap
Atropos uses a GCN over inference graphs to predict failures early and hotswap to larger LLMs, achieving 74% of large-model performance at 24% of the cost.
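A minimal sketch of the control loop that summary implies, under heavy assumptions: the paper's GCN predictor is collapsed into an opaque predict_failure callable, small_llm and large_llm are stand-in functions, and the vote and risk thresholds are invented knobs.

from collections import Counter

def answer_with_hotswap(prompt, small_llm, large_llm, predict_failure,
                        k=8, agree=0.75, risk=0.9):
    # Sample up to k answers from the small model (self-consistency).
    votes = Counter()
    for _ in range(k):
        votes[small_llm(prompt)] += 1
        top, n = votes.most_common(1)[0]
        if n / k >= agree:                  # early termination: vote already decided
            return top
        if predict_failure(votes) >= risk:  # predicted failure: hotswap to big model
            return large_llm(prompt)
    return votes.most_common(1)[0][0]       # fall back to majority vote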
-
Human-Centric Topic Modeling with Goal-Prompted Contrastive Learning and Optimal Transport
GCTM-OT extracts goal candidates with an LLM, then uses goal-prompted contrastive learning and optimal transport to discover topics that are more coherent, diverse, and aligned with human intent than prior methods on subreddit data.
-
Figures as Interfaces: Toward LLM-Native Artifacts for Scientific Discovery
LLM-native figures embed provenance and enable direct LLM interaction with scientific visualizations to accelerate discovery and improve reproducibility.
-
Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space
LLMs perform in-context learning as trajectories through a structured low-dimensional conceptual belief space, with the structure visible in both behavior and internal representations and causally manipulable via interventions.
-
VISOR: A Vision-Language Model-based Test Oracle for Testing Robot
VISOR applies VLMs to automate robot test oracles for correctness and quality assessment while reporting uncertainty, with evaluation on GPT and Gemini showing trade-offs in precision and recall but poor uncertainty calibration.
-
Black-box model classification under the discriminative factorization
Discriminative factorization distinguishes high-quality query sets for black-box model classification, with chance-level error decaying exponentially in the query budget and with parameters that predict the empirical decay rates on auditing tasks.
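As a worked form of that exponential-decay claim (my notation, not the paper's): with query budget $n$, the misclassification probability starting from chance level can be written as

\[
  P_{\mathrm{err}}(n) \;\le\; C\, e^{-\lambda n}, \qquad \lambda = \lambda(\theta),
\]

so a query set is high-quality when its factorization parameters $\theta$ yield a large decay rate $\lambda$; the claim is that this predicted $\lambda$ tracks the decay rates measured empirically on auditing tasks.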
-
GRaSp: Automatic Example Optimization for In-Context Learning in Low-Data Tasks
GRaSp optimizes in-context examples for LLMs via synthetic generation, clustering, dimensionality reduction, and genetic algorithms with diversity-adaptive mutation, reaching 45.84% micro-F1 on financial NER with real data and outperforming zero-shot and random few-shot baselines.
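Of GRaSp's pipeline, only the genetic-search stage lends itself to a short sketch; below, fitness is assumed to score a candidate example set (e.g. dev-set micro-F1), candidates is the pool of synthetic examples, and the population size, generation count, and mutation schedule are invented knobs.

import random

def genetic_example_search(candidates, fitness, pop=20, gens=30, k=8):
    # Population of k-example subsets drawn from the candidate pool.
    popn = [random.sample(candidates, k) for _ in range(pop)]
    for _ in range(gens):
        popn.sort(key=fitness, reverse=True)
        # Diversity-adaptive mutation: mutate more as the population converges.
        diversity = len({e for subset in popn for e in subset}) / len(candidates)
        mut = 0.05 + 0.3 * (1.0 - diversity)
        survivors = popn[: pop // 2]            # elitist selection
        children = []
        while len(survivors) + len(children) < pop:
            a, b = random.sample(survivors, 2)
            child = random.sample(list(set(a) | set(b)), k)   # crossover
            if random.random() < mut:           # point mutation (unguarded; a
                child[random.randrange(k)] = random.choice(candidates)  # real impl would dedupe)
            children.append(child)
        popn = survivors + children
    return max(popn, key=fitness)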
-
Tailored Prompts, Targeted Protection: Vulnerability-Specific LLM Analysis for Smart Contracts
An LLM framework with tailored prompts and a new dataset of 31,165 annotated instances achieves 0.92 positive recall and 0.85 negative recall for detecting 13 smart contract vulnerability categories.
-
Leveraging LLMs for Multi-File DSL Code Generation: An Industrial Case Study
Fine-tuning 7B code LLMs on a custom multi-file DSL dataset achieves structural fidelity of 1.00, high exact-match accuracy, and practical utility validated by expert survey and execution checks.
-
Understanding the Mechanism of Altruism in Large Language Models
A small set of sparse autoencoder features in LLMs drives shifts between generous and selfish allocations in dictator games, with causal patching and steering confirming their role and generalization to other social games.
-
From Craft to Kernel: A Governance-First Execution Architecture and Semantic ISA for Agentic Computers
Arbiter-K is a new execution architecture that treats LLMs as probabilistic processors inside a neuro-symbolic kernel with a semantic ISA to enable deterministic security enforcement and unsafe trajectory interdiction in agentic AI.
-
When LLMs Lag Behind: Knowledge Conflicts from Evolving APIs in Code Generation
LLMs produce executable code only 42.55% of the time under API evolution without full documentation, improving to 66.36% with structured documentation and by a further 11% with reasoning strategies, yet outdated API usage patterns persist.
-
Beyond Single Reports: Evaluating Automated ATT&CK Technique Extraction in Multi-Report Campaign Settings
Aggregating multiple CTI reports improves ATT&CK technique extraction F1 by about 26% over single-report baselines, with saturation after 5-15 reports and maximum F1 scores of 78.6% and 54.9% across the tested campaigns.
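The summary does not state the aggregation rule; a minimal sketch under the assumption that it resembles a cross-report vote over extracted technique IDs:

from collections import Counter

def aggregate_techniques(per_report_ids, min_reports=2):
    # Keep a technique ID if it was extracted from at least `min_reports`
    # of the campaign's reports (the paper's actual rule may differ).
    counts = Counter(t for ids in per_report_ids for t in set(ids))
    return {t for t, c in counts.items() if c >= min_reports}

# e.g. aggregate_techniques([["T1059", "T1566"], ["T1566"], ["T1566", "T1027"]])
# -> {"T1566"}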
-
Context-Value-Action Architecture for Value-Driven Large Language Model Agents
The Context-Value-Action architecture decouples reasoning from action in LLM agents via a human-data-trained Value Verifier, mitigating polarization and outperforming prompt-based methods on a large real-world benchmark.
-
VIP-COP: Context Optimization for Tabular Foundation Models
VIP-COP is a black-box method that optimizes context for tabular foundation models by ranking and selecting high-value samples and features via online KernelSHAP regression, outperforming baselines on large high-dimensional data.
-
User Reviews as a Source for Usability Requirements: A Precursor Study on Using Large Language Models
LLMs can detect usability content in user reviews with F-scores comparable to humans, though performance depends strongly on prompt design.
-
Jailbreaking Large Language Models with Morality Attacks
Morality-specific jailbreak attacks expose critical vulnerabilities in both large language models and guardrail systems when handling pluralistic values.
-
Cross-Lingual Attention Distillation with Personality-Informed Generative Augmentation for Multilingual Personality Recognition
ADAM uses personality-guided LLM augmentation and cross-lingual attention distillation to raise balanced accuracy on multilingual personality recognition to 0.6332 on Essays and 0.7448 on Kaggle, outperforming standard BCE loss.
-
From Incomplete Architecture to Quantified Risk: Multimodal LLM-Driven Security Assessment for Cyber-Physical Systems
ASTRAL applies multimodal LLMs with prompt chaining and few-shot learning to synthesize CPS architectures from disparate sources, enabling adaptive threat identification and quantitative risk estimation, as supported by ablation studies and feedback from 14 cybersecurity practitioners.
-
The PICCO Framework for Large Language Model Prompting: A Taxonomy and Reference Architecture for Prompt Structure
PICCO is a five-element reference architecture (Persona, Instructions, Context, Constraints, Output) for structuring LLM prompts, derived by synthesizing prior frameworks, together with a taxonomy that distinguishes prompt concepts.
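The five element names come from the paper; everything else below (the task, the field wording, the join order) is illustrative:

# Assembling a prompt from PICCO's five elements.
picco = {
    "persona": "You are a senior Python code reviewer.",
    "instructions": "Review the diff and list concrete defects.",
    "context": "The project targets Python 3.11 and uses asyncio.",
    "constraints": "Cite line numbers; do not suggest style-only changes.",
    "output": "Return a numbered list, most severe defect first.",
}
prompt = "\n".join(picco[k] for k in
                   ("persona", "instructions", "context", "constraints", "output"))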
-
Analyzing Chain of Thought (CoT) Approaches in Control Flow Code Deobfuscation Tasks
CoT prompting improves LLM performance on control-flow deobfuscation of C benchmarks, yielding ~16% better CFG reconstruction and ~20.5% better semantic preservation for GPT-5 versus zero-shot prompting.
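A sketch of what such a chain-of-thought prompt might look like; the wording and the flattened-C input file are mine, not the paper's:

# Illustrative CoT prompt for control-flow deobfuscation.
obfuscated_c = open("flattened.c").read()  # hypothetical input: a flattened C function
cot_prompt = (
    "The following C function has been control-flow flattened:\n\n"
    + obfuscated_c +
    "\n\nThink step by step: (1) find the dispatcher variable, "
    "(2) map each state constant to its basic block, "
    "(3) recover the original loop/branch structure, "
    "then output the deobfuscated function."
)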
-
Combining Static Code Analysis and Large Language Models Improves Correctness and Performance of Algorithm Recognition
Combining LLMs with static code analysis for algorithm recognition cuts required model calls by 72-97% and lifts F1 scores by as much as 12 points.
-
LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods
A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.
-
BIT.UA-AAUBS at ArchEHR-QA 2026: Evaluating Open-Source and Proprietary LLMs via Prompting in Low-Resource QA
Prompt-based LLM evaluation without training data secured top rankings in the ArchEHR-QA 2026 shared task on clinical QA.