Continued pre-training on web data and LLM-ensemble synthetic labels improve multilingual hate speech detection, with gains up to 11% for small models in low-resource settings.
While raw web text can be collected at scale, annotating it remains costly (Ross et al., 2017), and human annotators inevitably introduce subjective biases (Caselli et al., 2021).
1 Pith paper cites this work (field: cs.CL; year: 2026; verdict: UNVERDICTED).
Representative citing paper:
Toward Generalized Cross-Lingual Hateful Language Detection with Web-Scale Data and Ensemble LLM Annotations