needles”) hidden within multi-turn conversations. Inspired by Gemini’s MRCR, it embeds 2, 4, or 8 duplicate prompts (e.g., “Write a poem about tapirs
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2025 (1)
Verdicts: ACCEPT (1)
Representative citing papers:
Large Language Model Prompt Datasets: An In-depth Analysis and Insights
Compilation and linguistic analysis of 129 LLM prompt datasets identifies distinguishing features, with syntactic distributions enabling high-accuracy lightweight routing and quality prediction in three downstream tasks.