PromptSuite: A Task-Agnostic Framework for Multi-Prompt Generation
Evaluating LLMs with a single prompt has proven unreliable, with small changes leading to significant performance differences. However, generating the prompt variations needed for a more robust multi-prompt evaluation is challenging, limiting its adoption in practice. To address this, we introduce PromptSuite, a framework that enables the automatic generation of various prompts. PromptSuite is flexible, working out of the box on a wide range of tasks and benchmarks. It follows a modular prompt design, allowing controlled perturbations to each component, and is extensible, supporting the addition of new components and perturbation types. Through a series of case studies, we show that PromptSuite provides meaningful variations to support strong evaluation practices. All resources, including the Python API, source code, user-friendly web interface, and demonstration video, are available at: https://eliyahabba.github.io/PromptSuite/.
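The modular design the abstract describes can be pictured as a prompt split into components, each with its own set of controlled perturbations, whose combinations yield many prompt variants. The sketch below is purely illustrative and does not use PromptSuite's actual API; all names (`COMPONENTS`, `generate_variants`) are hypothetical.

```python
from itertools import product

# Hypothetical illustration of modular multi-prompt generation.
# NOT PromptSuite's real API: each prompt component carries a list of
# controlled perturbations, and the Cartesian product of those lists
# produces the full set of prompt variants for evaluation.

COMPONENTS = {
    "instruction": [
        "Answer the following question.",
        "Please answer the question below.",
    ],
    "separator": ["\n", "\n\n"],
    "question": ["What is the capital of France?"],
}

def generate_variants(components):
    """Yield one full prompt per combination of component perturbations."""
    keys = list(components)
    for combo in product(*(components[k] for k in keys)):
        parts = dict(zip(keys, combo))
        yield parts["instruction"] + parts["separator"] + parts["question"]

variants = list(generate_variants(COMPONENTS))
print(len(variants))  # 2 instructions x 2 separators x 1 question = 4
```

Keeping perturbations attached to named components, rather than rewriting whole prompts, is what makes the variation controlled: each variant differs from the others along identifiable axes.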
Forward citations
Cited by 1 Pith paper
From Words to Widgets for Controllable LLM Generation
Malleable Prompting reifies subjective preferences from natural language into GUI widgets and modulates LLM token probabilities during decoding to enable controllable generation, with a user study showing improved pre...