While raw web text can be collected at scale, annotating it remains costly (Ross et al., 2017) and human annotators inevitably introduce subjective biases (Caselli et al., 2021)

Introduction A central bottleneck in building robust detectors for hateful, offensive language is the scarcity of high-quality labelled training data (Vidgen · 2020

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Toward Generalized Cross-Lingual Hateful Language Detection with Web-Scale Data and Ensemble LLM Annotations

cs.CL · 2026-03-18 · unverdicted · novelty 4.0

Continued pre-training on web data and LLM-ensemble synthetic labels improve multilingual hate speech detection, with gains up to 11% for small models in low-resource settings.
