Recognition: 2 theorem links · Lean Theorem
The PICCO Framework for Large Language Model Prompting: A Taxonomy and Reference Architecture for Prompt Structure
Pith reviewed 2026-05-13 19:56 UTC · model grok-4.3
The pith
PICCO provides a five-element reference architecture for structuring prompts to large language models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The analysis yields a taxonomy distinguishing prompt frameworks from prompt elements, prompt generation, prompting techniques, and prompt engineering. It then derives a five-element reference architecture for prompt generation: Persona, Instructions, Context, Constraints, and Output. For each element the paper defines function, scope, and interrelationships, with the explicit goal of improving conceptual clarity and supporting systematic prompt design without claiming empirical performance gains.
What carries the argument
The PICCO reference architecture, which decomposes prompt generation into five named elements—Persona, Instructions, Context, Constraints, and Output—to supply a common structure for specification and comparison.
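To make the decomposition concrete, a minimal sketch of a PICCO-shaped prompt specification in Python follows. The field names track the five elements, but the dataclass, the example values, and the serialization order are illustrative assumptions, not the paper's prescribed format.

```python
from dataclasses import dataclass

@dataclass
class PiccoPrompt:
    """One field per PICCO element; all wording here is illustrative."""
    persona: str       # who the model should act as
    instructions: str  # the task to perform
    context: str       # background material the task depends on
    constraints: str   # limits on scope, style, or sources
    output: str        # required shape of the response

    def render(self) -> str:
        # Serialize the five elements into one labeled prompt string.
        return "\n\n".join([
            f"Persona: {self.persona}",
            f"Instructions: {self.instructions}",
            f"Context: {self.context}",
            f"Constraints: {self.constraints}",
            f"Output: {self.output}",
        ])

prompt = PiccoPrompt(
    persona="You are an experienced medical educator.",
    instructions="Summarize the attached curriculum proposal for a faculty committee.",
    context="The proposal adds AI literacy modules to year-one coursework.",
    constraints="Use only the material provided; do not speculate about costs.",
    output="Three bullet points followed by one open question.",
)
print(prompt.render())
```

Because each element lives in its own field, any one of them can be revised without touching the others, which is the property the points below rely on.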
If this is right
- Prompts become describable and comparable using a shared five-part vocabulary rather than free-form text.
- Each element can be refined independently during iterative prompt engineering.
- Standard techniques such as zero-shot, few-shot, chain-of-thought, and self-critique map onto specific PICCO slots (a sketch follows this list).
- Responsible prompting practices around bias, privacy, and security can be applied element by element.
- Future work can extend the architecture to new domains while preserving the same five-part skeleton.
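As a sketch of how techniques might occupy PICCO slots, the mapping below is a hedged illustration: the slot assignments and the wording of each addition are assumptions made for this example, not a table from the paper.

```python
# Hypothetical mapping from common prompting techniques to PICCO slots.
TECHNIQUE_SLOTS = {
    "zero-shot":        {"instructions": "State the task directly, with no exemplars."},
    "few-shot":         {"context": "Include two to five worked input/output exemplars."},
    "chain-of-thought": {"instructions": "Ask for step-by-step reasoning before the final answer."},
    "self-critique":    {"output": "Require a draft, a critique of the draft, then a revision."},
}

def apply_technique(base: dict, technique: str) -> dict:
    """Return a copy of a PICCO-style prompt dict with the relevant slot amended."""
    amended = dict(base)
    for slot, addition in TECHNIQUE_SLOTS[technique].items():
        amended[slot] = (amended.get(slot, "") + " " + addition).strip()
    return amended

base = {
    "persona": "You are a careful data analyst.",
    "instructions": "Classify each support ticket as billing, technical, or other.",
    "context": "",
    "constraints": "Do not invent new categories.",
    "output": "One label per ticket.",
}
print(apply_technique(base, "chain-of-thought"))
```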
Where Pith is reading between the lines
- Automated prompt generators could be built to populate each PICCO slot from a task description (a sketch follows this list).
- The structure might reveal gaps when applied to multimodal or agentic prompts that current frameworks overlook.
- Teams could adopt PICCO as an internal standard to reduce variance in prompt quality across different engineers.
- Security reviews could focus on the Constraints element to surface hidden risks more systematically.
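One way to read the first point is as a meta-prompt that asks a model to fill each slot from a task description. The sketch below assumes a generic `complete` callable standing in for any text-generation call; the meta-prompt wording and the stub are hypothetical.

```python
import json

META_PROMPT = """Fill in a prompt skeleton for the task below.
Return JSON with exactly these keys: persona, instructions, context, constraints, output.

Task: {task}"""

def generate_picco_prompt(task: str, complete) -> dict:
    """Ask a model to populate each PICCO slot, then check that none is missing."""
    slots = json.loads(complete(META_PROMPT.format(task=task)))
    missing = {"persona", "instructions", "context", "constraints", "output"} - slots.keys()
    if missing:
        raise ValueError(f"Generator left slots unfilled: {sorted(missing)}")
    return slots

# Stubbed completion so the sketch runs without model access.
def fake_complete(prompt: str) -> str:
    return json.dumps({
        "persona": "You are a release manager.",
        "instructions": "Draft release notes from the merged pull requests.",
        "context": "Pull request titles and labels are listed below.",
        "constraints": "Mention only user-visible changes.",
        "output": "A Markdown list grouped into features, fixes, and chores.",
    })

print(generate_picco_prompt("Write release notes for version 2.4", fake_complete))
```

A security review in the sense of the last point could then inspect the generated constraints field in isolation, for example by checking that it names the permitted sources.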
Load-bearing premise
A synthesis of eleven published prompting frameworks is sufficient to produce a general reference architecture that improves clarity for all users without requiring separate empirical validation.
What would settle it
A controlled study that measures output consistency or task success rates for prompts written with explicit PICCO elements versus unstructured prompts of similar length would directly test whether the architecture delivers the claimed clarity.
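As a rough sketch of such a study's consistency arm, the snippet below samples each prompt several times and scores agreement with the modal answer. The scoring rule and the stand-in model are assumptions; a real study would also need matched prompt lengths, multiple tasks, and a task-success metric scored against references.

```python
import random
from collections import Counter

def consistency(prompt: str, complete, runs: int = 10) -> float:
    """Share of sampled answers that agree with the most common answer."""
    answers = [complete(prompt).strip() for _ in range(runs)]
    modal_count = Counter(answers).most_common(1)[0][1]
    return modal_count / runs

def compare(picco_prompt: str, unstructured_prompt: str, complete) -> dict:
    """Head-to-head consistency for a structured versus an unstructured prompt."""
    return {
        "picco": consistency(picco_prompt, complete),
        "unstructured": consistency(unstructured_prompt, complete),
    }

# Stand-in model that is noisier on the unstructured prompt, purely for illustration.
def fake_complete(prompt: str) -> str:
    noise = 0.1 if prompt.startswith("Persona:") else 0.4
    return "billing" if random.random() > noise else "technical"

print(compare("Persona: a support triage agent ... Output: one label",
              "sort this ticket", fake_complete))
```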
Original abstract
Large language model (LLM) performance depends heavily on prompt design, yet prompt construction is often described and applied inconsistently. Our purpose was to derive a reference framework for structuring LLM prompts. This paper presents PICCO, a framework derived through a rigorous synthesis of 11 previously published prompting frameworks identified through a multi-database search. The analysis yields two main contributions. First, it proposes a taxonomy that distinguishes prompt frameworks, prompt elements, prompt generation, prompting techniques, and prompt engineering as related but non-equivalent concepts. Second, it derives a five-element reference architecture for prompt generation: Persona, Instructions, Context, Constraints, and Output (PICCO). For each element, we define its function, scope, and relationship to other elements, with the goal of improving conceptual clarity and supporting more systematic prompt design. Finally, to support application of the framework, we outline key concepts relevant to implementation, including prompting techniques (e.g., zero-shot, few-shot, chain-of-thought, ensembling, decomposition, and self-critique, with selected variants), human and automated approaches to iterative prompt engineering, responsible prompting considerations such as security, privacy, bias, and trust, and priorities for future research. This work is a conceptual and methodological contribution: it formalizes a common structure for prompt specification and comparison, but does not claim empirical validation of PICCO as an optimization method.
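Of the techniques the abstract names, self-critique is the least self-explanatory; one common reading is a three-pass draft, critique, revise loop. The sketch below follows that reading as an assumption; the prompt wording and the `complete` callable are placeholders, not the paper's specification.

```python
def self_critique(task: str, complete) -> str:
    """Draft, critique, then revise, each as a separate model call."""
    draft = complete(f"Task: {task}\nWrite a first answer.")
    critique = complete(
        f"Task: {task}\nDraft answer:\n{draft}\n"
        "List concrete problems with the draft: errors, omissions, unclear steps."
    )
    return complete(
        f"Task: {task}\nDraft answer:\n{draft}\nCritique:\n{critique}\n"
        "Rewrite the answer so that every problem in the critique is addressed."
    )
```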
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes the PICCO framework for structuring prompts in large language models. It derives a taxonomy that differentiates prompt frameworks, elements, generation, techniques, and engineering as related but non-equivalent concepts. From a synthesis of 11 prior frameworks identified via multi-database search, it presents a five-element reference architecture: Persona, Instructions, Context, Constraints, and Output (PICCO), defining each element's function, scope, and interrelationships. The work also outlines implementation concepts including prompting techniques (zero-shot, few-shot, chain-of-thought, etc.), iterative human/automated prompt engineering, responsible considerations (security, privacy, bias), and future research priorities, explicitly positioning the contribution as conceptual without empirical validation of performance gains.
Significance. If the synthesis holds, the taxonomy and PICCO architecture would provide a valuable standardized reference for prompt specification and comparison in the LLM literature, where terminology remains inconsistent. The explicit scoping as non-empirical synthesis, combined with the transparent derivation from prior frameworks, supports its utility for systematic prompt design and future empirical work. This is a methodological contribution that formalizes common structure without overclaiming optimization results.
major comments (1)
- [Methods] Methods section: The multi-database search and selection process for the 11 frameworks is described at a high level; to substantiate the central claim of a 'rigorous synthesis' yielding the PICCO architecture, explicit inclusion/exclusion criteria, search strings, and the mapping procedure from source elements to the five PICCO components should be provided (e.g., in a supplementary table or appendix).
minor comments (3)
- [Figure 1] Figure 1 (taxonomy diagram): The visual relationships among prompt frameworks, elements, generation, techniques, and engineering would benefit from explicit edge labels or a legend to clarify distinctions.
- [Section 4] Section 4 (PICCO elements): While definitions are provided, adding one concrete prompt example per element (or a combined example) would improve accessibility without altering the conceptual scope.
- [Discussion] Discussion of prompting techniques: The list of techniques (zero-shot, few-shot, chain-of-thought, ensembling, etc.) is useful but would be strengthened by a brief comparison table of their alignment with specific PICCO elements.
Simulated Author's Rebuttal
We thank the referee for the positive assessment and constructive suggestion. We agree that greater transparency in the methods will strengthen the claim of rigorous synthesis and will revise the manuscript to include the requested details.
Point-by-point responses
- Referee: [Methods] Methods section: The multi-database search and selection process for the 11 frameworks is described at a high level; to substantiate the central claim of a 'rigorous synthesis' yielding the PICCO architecture, explicit inclusion/exclusion criteria, search strings, and the mapping procedure from source elements to the five PICCO components should be provided (e.g., in a supplementary table or appendix).
  Authors: We accept this point. In the revised manuscript we will expand the Methods section to report the exact search strings used across the databases, the full inclusion/exclusion criteria applied to candidate frameworks, and a supplementary table that maps each source framework's elements to the five PICCO components, including the rationale for any consolidation or re-labeling decisions. These additions will be placed in a new Appendix A and referenced from the main text.
  Revision: yes
Circularity Check
No significant circularity
Full rationale
The paper is a non-empirical conceptual synthesis that identifies 11 external prompting frameworks via a multi-database search and integrates them into a taxonomy plus the PICCO reference architecture (Persona, Instructions, Context, Constraints, Output). No equations, fitted parameters, or derivations are present. The central claims rest on transparent aggregation of prior published work by other authors; there are no self-citation chains, self-definitional loops, or renamings that reduce the output to the paper's own inputs. The work explicitly disclaims empirical validation and presents the result as an organizational contribution, so the derivation does not depend on benchmarks of the paper's own construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The 11 previously published prompting frameworks identified through a multi-database search represent a sufficient basis for deriving a general reference architecture.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  The relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "derives a five-element reference architecture for prompt generation: Persona, Instructions, Context, Constraints, and Output (PICCO)"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
  The relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "taxonomy that distinguishes prompt frameworks, prompt elements, prompt generation, prompting techniques, and prompt engineering"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.