Empirical study of 41k+ AI agent skills finds reuse is mostly one-time verbatim copying with 53% never modified afterward and maintenance focused on additive local adaptations.
A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT
49 Pith papers cite this work.
abstract
Prompt engineering is an increasingly important skill set needed to converse effectively with large language models (LLMs), such as ChatGPT. Prompts are instructions given to an LLM to enforce rules, automate processes, and ensure specific qualities (and quantities) of generated output. Prompts are also a form of programming that can customize the outputs and interactions with an LLM. This paper describes a catalog of prompt engineering techniques presented in pattern form that have been applied to solve common problems when conversing with LLMs. Prompt patterns are a knowledge transfer method analogous to software patterns since they provide reusable solutions to common problems faced in a particular context, i.e., output generation and interaction when working with LLMs. This paper provides the following contributions to research on prompt engineering that applies LLMs to automate software development tasks. First, it provides a framework for documenting patterns for structuring prompts to solve a range of problems so that they can be adapted to different domains. Second, it presents a catalog of patterns that have been applied successfully to improve the outputs of LLM conversations. Third, it explains how prompts can be built from multiple patterns and illustrates prompt patterns that benefit from combination with other prompt patterns.
citation-role summary
roles: background 4
citation-polarity summary
polarities: background 4
representative citing papers
NAVI-Orbital performed what is claimed to be the first in-orbit demonstration of a zero-shot vision-language model for autonomous Earth observation, using Gemma 3 on April 16, 2026.
Introduces the Grounded Observer framework that applies robotics-inspired formal constructs for runtime constraint enforcement on foundation model interaction trajectories in socially sensitive domains.
CA-SQL achieves 51.72% execution accuracy on the challenging tier of the BIRD benchmark using GPT-4o-mini by scaling exploration breadth to estimated task difficulty and using evolutionary prompt seeding and candidate voting.
Structurally rich task descriptions make LLMs robust to prompt under-specification, and under-specification can enhance code correctness by disrupting misleading lexical or structural cues.
LLM-native figures embed provenance and enable direct LLM interaction with scientific visualizations to accelerate discovery and improve reproducibility.
AI coding agents perform vibe architecting by making prompt-driven architectural choices that produce structurally different systems for identical tasks.
Users treat human delegation for long tasks as a flexible compass but AI delegation as rigid railway tracks due to perceived AI limitations in inference and judgment.
Meta Agent Search uses a meta-agent to iteratively program novel agentic systems in code, producing agents that outperform state-of-the-art hand-designed ones across coding, science, and math while transferring across domains and models.
A multi-LLM council scores predictive processing papers on an expert ontology, maps results in 3D hypothesis space, and introduces a dispersion metric showing greater spread in global versus local oddball paradigms.
LLMs perform in-context learning as trajectories through a structured low-dimensional conceptual belief space, with the structure visible in both behavior and internal representations and causally manipulable via interventions.
Open-weight LLMs reach 81-91% success generating formally verified Dafny code for complex algorithmic problems when given structural signatures and self-healing verifier feedback.
The paper systematizes agentic skills beyond tool use, providing design pattern and representation-scope taxonomies plus security analysis of malicious skill infiltration in agent marketplaces.
LLMs diverge from human goal selection in self-directed learning by exploiting single solutions with low variability across instances.
GPT-4o identified only 21.2% of the usability issues found by human experts in heuristic evaluation while discovering 27 additional issues; it struggled with certain heuristics and generated false positives.
Empirical analysis of 338 PRs with self-admitted ChatGPT usage shows low full integration (median 25%), selective adaptation patterns, and broader influence on developer reasoning during reviews.
UVM^2 is an LLM-driven system that generates and refines UVM testbenches for RTL verification, reporting substantial time savings and average code/function coverage of 87.44%/89.58% on designs up to 1.6K lines, outperforming prior methods.
LLMs are highly sensitive to prompt formatting in few-shot settings, with accuracy varying by up to 76 points across formats; FormatSpread samples formats to report performance intervals without model weights.
AC3S adds a self-supervised visual prompt modulator to ControlNet diffusion and a multi-agent VLM prompt composer to generate photorealistic images with accurate 2D/3D annotations while avoiding over-conditioning.
A taxonomy that consolidates prompt patterns from prior surveys into 30 unique canonical forms organized by two dimensions.
UniD³ applies KG-RAG with Llama 3.3-70B to build six knowledge graphs and generate large validated datasets for drug-disease matching, effectiveness assessment, and target analysis from biomedical literature.
Introduces Augment Engineering as a six-phase multi-tool orchestration methodology, supported by exploratory statistics from a single-practitioner case study across seven domains.
A participatory design study with two K-12 students iteratively refined a generative AI Python tutor toward Socratic questioning, reflection prompts, and incremental hints, with preliminary observations of better clarity and engagement when combined with human guidance.
Hermes uses multi-agent LLMs to detect 2450 documentation and REST smells across 600 OpenAPI endpoints, demonstrating that structurally valid microservice APIs are often not semantically ready for agent consumption.
citing papers explorer
- From Concept to Practice: an Automated LLM-aided UVM Machine for RTL Verification
UVM^2 is an LLM-driven system that generates and refines UVM testbenches for RTL verification, reporting substantial time savings and average code/function coverage of 87.44%/89.58% on designs up to 1.6K lines, outperforming prior methods.