Multiple LLMs reach human-comparable reliability for coding humanitarian needs when using structured prompts and reasoning models, but vary in detecting indirect expressions, out-of-category needs, and protection concerns.
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers:
Can Large Language Models Reliably Code Qualitative Humanitarian Data? A Benchmark Study Against Human Expert Adjudication