iPOE: Interpretable Prompt Optimization via Explanations

Jiahui Li; Roman Klinger; Sean Papay; Yarik Menchaca Resendiz

arxiv: 2605.18113 · v2 · pith:UUGP5UEUnew · submitted 2026-05-18 · 💻 cs.CL

Pith reviewed 2026-05-20 10:49 UTC · model grok-4.3

keywords: prompt optimization · interpretable prompts · annotation guidelines · explanations · large language models · classification tasks · performance improvement

The pith

Guidelines derived from annotation explanations optimize prompts and improve LLM performance by up to 35 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces iPOE, a strategy for optimizing prompts that incorporates automatically created guidelines derived from explanations of annotation decisions. These guidelines are refined through operations like removing, adding, shuffling, and merging to create transparent instructions for the language model. The approach bridges prompt optimization with human annotation practices by making the decision process clear and understandable. A sympathetic reader would care because it addresses the opacity in current prompt engineering methods, enabling better performance and accessibility for non-experts in specialized domains. Experiments across four datasets demonstrate improvements of up to 31 percent over prompts without guidelines and 35 percent over those with random guidelines, while showing that LLM-generated explanations can substitute for human ones.

Core claim

The central claim is that guiding prompt optimization with guidelines automatically derived from explanations of annotation decisions, refined by a series of operations including removing, adding, shuffling, and merging, results in prompts that are both interpretable and higher-performing for LLMs on classification tasks.

What carries the argument

The iPOE method, which generates a set of guidelines from explanations of annotation decisions and optimizes them via remove, add, shuffle, and merge operations to produce transparent annotation instructions for the LLM.
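One way to read this description is as a greedy search over guideline sets. The sketch below is a toy under stated assumptions, not the authors' implementation: the names `propose_candidates`, `optimize`, and `toy_score` are hypothetical, the merge step is a plain string join where a real system would presumably ask an LLM to rewrite two guidelines into one, and `toy_score` stands in for the performance metric F(·) named in the Figure 2 caption.

```python
import random
from typing import Callable, List, Sequence, Tuple

Guidelines = List[str]


def propose_candidates(
    current: Guidelines, pool: Sequence[str], rng: random.Random
) -> List[Guidelines]:
    """Apply each of the four operations once to get candidate sets."""
    candidates: List[Guidelines] = []
    unused = [g for g in pool if g not in current]
    if unused:  # add: append one guideline that is not in the set yet
        candidates.append(current + [rng.choice(unused)])
    if current:  # remove: drop one guideline
        i = rng.randrange(len(current))
        candidates.append(current[:i] + current[i + 1:])
    if len(current) > 1:
        shuffled = current[:]  # shuffle: change the order only
        rng.shuffle(shuffled)
        candidates.append(shuffled)
        # merge: fuse two guidelines (assumption: simple concatenation)
        i, j = sorted(rng.sample(range(len(current)), 2))
        rest = [g for k, g in enumerate(current) if k not in (i, j)]
        candidates.append(rest + [current[i] + " " + current[j]])
    return candidates


def optimize(
    pool: Sequence[str],
    score: Callable[[Guidelines], float],
    steps: int = 20,
    seed: int = 0,
) -> Tuple[Guidelines, float]:
    """Greedy loop: start from the empty set G, keep a candidate G' only
    if it scores strictly higher than the current set."""
    rng = random.Random(seed)
    best: Guidelines = []
    best_score = score(best)
    for _ in range(steps):
        candidates = propose_candidates(best, pool, rng)
        if not candidates:
            break
        scored = [(score(c), c) for c in candidates]
        top_score, top = max(scored, key=lambda pair: pair[0])
        if top_score > best_score:
            best, best_score = top, top_score
    return best, best_score


def toy_score(guidelines: Guidelines) -> float:
    """Stand-in for F(.): rewards mentioning 'anger' and 'joy',
    with a small penalty per guideline."""
    text = " ".join(guidelines)
    return sum(w in text for w in ("anger", "joy")) - 0.01 * len(guidelines)
```

Calling `optimize(pool, toy_score)` with a list of candidate guideline strings returns the selected set and its score. In a real setting the scoring function would run the LLM with the guidelines in the prompt and compute a metric on held-out data, which is where almost all of the cost lies.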

Load-bearing premise

That guidelines automatically derived from explanations of annotation decisions, when refined by the listed operations, will produce prompts that are both more transparent and measurably higher-performing for LLMs on the target tasks.

What would settle it

A direct comparison on the same four datasets where iPOE guidelines yield no accuracy gain or no increase in transparency relative to prompts without guidelines or with random guidelines.

Figures

Figures reproduced from arXiv: 2605.18113 by Jiahui Li, Roman Klinger, Sean Papay, Yarik Menchaca Resendiz.

**Figure 2.** The conceptual workflow of our iPOE approach. G refers to the current guideline set, which is an empty set at the very beginning, and G′ is the updated guideline set from the operations that achieves the best performance. F(·) refers to a performance metric. Our approach also aims to exploit such additional information either provided by humans or LLMs, but instead to generate rules and guidelines by learn…

**Figure 3.** Learning curves of F1 scores on the training and validation sets for our iPOE approach. Each plot…

Original abstract

Prompt optimization has often been framed as a discrete search problem to find high-performing and robust instructions for an LLM. However, the search result might not make it transparent why and where specific prompt changes lead to performance gains. This is in contrast to how humans are instructed for annotation tasks. Here, researchers carefully design annotation guidelines, leading to enhanced annotation consistency. Our paper aims at joining these two approaches and introduces iPOE, a novel interpretable prompt optimization strategy via explanations. We guide the prompt optimization process by automatically created guidelines from explanations of annotation decisions (either automatically generated or from humans). This set of guidelines is furthermore optimized by as series of operations, including removing, adding, shuffling, and merging. The resulting prompt includes guidelines that instruct the annotation, making the decision process of the LLM and the optimization transparent. It therefore supports also laypeople in the area of prompt optimization, particularly in challenging domains requiring expertise. In our experiments on four datasets, we find that iPOE can improves over the evaluated baselines by up to 39% and LLM explanations can replace human explanations in the proposed method. Moreover, our interpretability validation study demonstrates that humans and LLMs can substantially agree on which guidelines contribute to their annotations, achieving a Cohen's kappa score of up to 0.65.
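For readers unfamiliar with the agreement statistic quoted at the end of the abstract, here is a minimal sketch of Cohen's kappa for two raters. The function name `cohens_kappa` is an illustrative choice, and the framing in the usage note (one binary label per guideline, one rater human and one an LLM) is an assumption about how such a study could be set up, not a description of the paper's protocol.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who labelled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)
```

With hypothetical labels `[1, 1, 1, 0]` from one rater and `[1, 1, 0, 0]` from the other (1 meaning "this guideline contributed to my annotation"), observed agreement is 0.75 and chance agreement is 0.5, so kappa is 0.5.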

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

iPOE turns explanations into refined guidelines for prompts and reports solid gains, but the improvements may trace to prompt length or structure rather than the specific derivation process.

iPOE's core contribution is deriving guidelines from explanations of annotation decisions (either from LLMs or humans) and then applying remove, add, shuffle, and merge operations to refine them into prompts that are both higher-performing and more transparent. It does well in framing prompt optimization as similar to creating annotation guidelines for humans, which is a nice parallel. The pipeline is explicit, and testing that LLM explanations can stand in for human ones is a practical angle that could save effort in some settings. If the results hold, it gives a method that keeps the optimization process understandable.

The soft spot is in the evaluation. The abstract reports improvements of up to 31% over no-guideline prompts and 35% over random guidelines across four datasets. However, there's no indication that the baselines were matched for total instruction length or number of guidelines. The gains could simply reflect providing more detailed or structured instructions rather than the value of explanation-sourced content or the refinement operations. Without details on statistical testing, dataset scales, or baseline construction, it's difficult to assess how robust the performance claims are.

This paper would interest people working on prompt engineering for classification or annotation tasks where both performance and interpretability matter. It could be useful for practitioners who want prompts that laypeople can understand. I think it deserves a serious referee to examine the methods section for proper controls and to verify the novelty against existing prompt search techniques.

Recommendation: Send it for peer review, but flag the need for length-controlled baselines.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces iPOE, a method for interpretable prompt optimization that automatically derives annotation guidelines from explanations of decisions (human or LLM-generated), refines this set via operations including removal, addition, shuffling, and merging, and incorporates the guidelines into LLM prompts. Experiments across four datasets demonstrate that iPOE prompts outperform those without guidelines by up to 31% and those with randomly selected guidelines by up to 35%, while also showing that LLM explanations can effectively replace human explanations.

Significance. If the empirical results hold after addressing potential confounds in the experimental design, this work would meaningfully advance prompt optimization research by aligning it with human annotation guideline practices. It offers a route to more transparent and accessible prompt engineering, with practical benefits for non-expert users in specialized domains and the demonstrated feasibility of substituting LLM-generated explanations for human ones.

major comments (2)

[Experimental section (and abstract)] The reported performance gains of up to 31% and 35% are presented without details on dataset sizes, baseline prompt construction procedures, statistical testing, or any controls that match total instruction length, number of guidelines, or structural complexity between the iPOE condition and the no-guideline/random-guideline baselines. This omission leaves open the possibility that gains arise from added prompt elaboration rather than from the explanation-derived guideline content or the listed refinement operations, which is load-bearing for the central claim that the proposed derivation process drives the improvements.

[§3 (Method)] The description of the guideline refinement operations (remove, add, shuffle, merge) does not specify selection criteria, ordering, or iteration limits, nor does it include an ablation isolating their contribution from the baseline effect of simply providing more detailed instructions. Without this, it is difficult to confirm that the transparency and performance benefits are attributable to the iPOE process rather than generic prompt expansion.
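The length control asked for in the first comment could be approximated as follows. This is a rough sketch under assumptions: `length_matched_sample` and `word_count` are hypothetical helpers, length is measured in whitespace-separated words rather than model tokens, and matching is done by repeated random draws rather than an exact search.

```python
import random
from typing import List, Sequence


def word_count(guidelines: Sequence[str]) -> int:
    """Total number of whitespace-separated words across guidelines."""
    return sum(len(g.split()) for g in guidelines)


def length_matched_sample(
    target: Sequence[str], pool: Sequence[str], tries: int = 200, seed: int = 0
) -> List[str]:
    """Draw len(target) guidelines from pool so that their total word
    count is as close as possible to target's, over `tries` random draws."""
    if len(pool) < len(target):
        raise ValueError("pool is smaller than the target guideline set")
    rng = random.Random(seed)
    goal = word_count(target)
    best = rng.sample(list(pool), len(target))
    for _ in range(tries - 1):
        cand = rng.sample(list(pool), len(target))
        # Keep the draw whose total length is closest to the target's.
        if abs(word_count(cand) - goal) < abs(word_count(best) - goal):
            best = cand
    return best
```

A baseline built this way holds both the number of guidelines and their approximate total length fixed, so any remaining gap to the optimized set is harder to attribute to prompt elaboration alone.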

minor comments (1)

[Abstract] 'as series of operations' should read 'a series of operations'; 'can improves' should read 'can improve'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, providing clarifications where possible and committing to revisions that strengthen the experimental rigor and methodological transparency of the manuscript.

Referee: [Experimental section (and abstract)] The reported performance gains of up to 31% and 35% are presented without details on dataset sizes, baseline prompt construction procedures, statistical testing, or any controls that match total instruction length, number of guidelines, or structural complexity between the iPOE condition and the no-guideline/random-guideline baselines. This omission leaves open the possibility that gains arise from added prompt elaboration rather than from the explanation-derived guideline content or the listed refinement operations, which is load-bearing for the central claim that the proposed derivation process drives the improvements.

Authors: We agree that additional experimental details are necessary to rule out confounds from prompt length or elaboration. In the revised manuscript we will report exact dataset sizes and splits for all four datasets, provide a precise description of baseline prompt construction (including how no-guideline and random-guideline prompts were generated), include statistical significance tests (paired t-tests or McNemar’s test across multiple seeds), and add length- and complexity-matched controls. While the existing random-guideline baseline already holds the number of guidelines constant, we will introduce an explicit length-matched baseline that adds generic elaboration without explanation-derived content. These changes will directly address whether the observed gains are attributable to the iPOE derivation and refinement process.

Revision: yes

Referee: [§3 (Method)] The description of the guideline refinement operations (remove, add, shuffle, merge) does not specify selection criteria, ordering, or iteration limits, nor does it include an ablation isolating their contribution from the baseline effect of simply providing more detailed instructions. Without this, it is difficult to confirm that the transparency and performance benefits are attributable to the iPOE process rather than generic prompt expansion.

Authors: We acknowledge that §3 requires greater specificity. The revised manuscript will expand the description of each operation with explicit selection criteria (e.g., removal of redundant or low-impact guidelines based on validation-set performance, addition of new guidelines derived from remaining explanations, shuffling to test robustness, and merging for conciseness), the sequence in which operations are applied, and iteration limits (e.g., until validation performance stabilizes). We will also add an ablation that compares the full iPOE pipeline against a control condition receiving an equivalent volume of additional instructions generated without the explanation-derived guidelines or the listed refinement operations. This ablation will help isolate the contribution of the iPOE process from generic prompt expansion.

Revision: yes
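Of the significance tests named in the first response, McNemar's test is the one suited to comparing two prompts on the same test items. Below is a minimal sketch of its exact (binomial) two-sided form; the function name `mcnemar_exact` and its argument layout are assumptions made for illustration, and nothing here reflects how the authors will actually run their tests.

```python
from math import comb
from typing import Sequence


def mcnemar_exact(
    gold: Sequence, pred_a: Sequence, pred_b: Sequence
) -> float:
    """Two-sided exact McNemar p-value for two systems scored on the
    same items. Only items where exactly one system is right count."""
    if not (len(gold) == len(pred_a) == len(pred_b)):
        raise ValueError("all three sequences must have the same length")
    triples = list(zip(gold, pred_a, pred_b))
    # b: system A right, system B wrong; c: the reverse.
    b = sum(pa == g and pb != g for g, pa, pb in triples)
    c = sum(pa != g and pb == g for g, pa, pb in triples)
    n = b + c
    if n == 0:
        return 1.0  # the systems never disagree in correctness
    k = min(b, c)
    # Under the null, discordant items split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

As a worked case: if the two prompts disagree in correctness on five items and the second prompt is right on all five, the one-sided tail is 1/32 and the two-sided p-value is 0.0625, which would not clear a 0.05 threshold despite the lopsided split.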

Circularity Check

0 steps flagged

No significant circularity; empirical method evaluated against external baselines

The paper describes a procedural algorithm: guidelines are automatically derived from annotation explanations (human or LLM-generated), then refined via explicit operations (remove, add, shuffle, merge) before insertion into the prompt. Performance is assessed via direct empirical comparisons on four datasets against two external baselines (prompts without guidelines; prompts with randomly selected guidelines), reporting relative gains of up to 31% and 35%. No equations, fitted parameters, or self-referential definitions appear in the provided text. The central claims rest on measurable task accuracy rather than reducing to construction by definition, self-citation chains, or renaming of prior results. The derivation chain is therefore self-contained as an engineering procedure whose validity is tested externally.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated. The approach implicitly assumes that LLM-generated explanations are sufficiently reliable to serve as guideline sources and that the listed editing operations improve guideline quality.

pith-pipeline@v0.9.0 · 5735 in / 1155 out tokens · 56582 ms · 2026-05-20T10:49:57.191586+00:00 · methodology
