Compilation and linguistic analysis of 129 LLM prompt datasets identifies distinguishing features, with syntactic distributions enabling high-accuracy lightweight routing and quality prediction in three downstream tasks.
1M • Description: Firefly is a Chinese instruction-tuning dataset comprising 1.15 million high-quality examples drawn from 23 common Chinese natural language processing datasets.
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.LG (1)
years: 2025 (1)
verdicts: ACCEPT (1)
representative citing papers
- Large Language Model Prompt Datasets: An In-depth Analysis and Insights