LLM annotations for social science tasks vary substantially with prompt wording in interpretive cases but become more stable when majority voting is applied across multiple equivalent prompts.
As emphasized by Krippendorff (2018), the scientific value of a label lies not in its existence, but in its ability to facilitate stable social inference.
Cited by 1 Pith paper (field: cs.CY; year: 2026):
What Is Actually Being Annotated? Inter-Prompt Reliability as a Measurement Problem in LLM-Based Social Science Labeling
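
The summary's two observations, that labels vary across equivalent prompts and that majority voting stabilizes them, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the label values, the item names, the tie-breaking rule, and the use of raw pairwise agreement as a stand-in for inter-prompt reliability. It is not the cited paper's method, and raw agreement is not a chance-corrected coefficient such as Krippendorff's alpha.

```python
from collections import Counter
from itertools import combinations


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    if not labels:
        raise ValueError("need at least one label to vote")
    counts = Counter(labels)
    top = max(counts.values())
    for label in labels:
        if counts[label] == top:
            return label


def pairwise_agreement(labels):
    """Fraction of prompt pairs that assign the same label to one item."""
    pairs = list(combinations(labels, 2))
    if not pairs:
        return 1.0  # a single prompt trivially agrees with itself
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical data: each item was labeled once per prompt variant.
annotations = {
    "item_1": ["hostile", "hostile", "hostile"],     # clear-cut case
    "item_2": ["hostile", "neutral", "hostile"],     # mild disagreement
    "item_3": ["neutral", "hostile", "supportive"],  # interpretive case
}

# One aggregated label per item, plus a per-item stability score.
aggregated = {item: majority_vote(labels) for item, labels in annotations.items()}
agreement = {item: pairwise_agreement(labels) for item, labels in annotations.items()}

for item in annotations:
    print(item, aggregated[item], round(agreement[item], 2))
```

Reporting the per-item agreement next to the aggregated label keeps low-stability items visible, so an interpretive case like the hypothetical `item_3` is not silently resolved by the tie-break.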