arxiv: 1701.08118 · v1 · submitted 2017-01-27 · 💻 cs.CL

Measuring the Reliability of Hate Speech Annotations: The Case of the European Refugee Crisis

classification 💻 cs.CL
keywords: hate, speech, definition, reliability, users, whether, annotations, groups

Some users of social media are spreading racist, sexist, and otherwise hateful content. For the purpose of training a hate speech detection system, the reliability of the annotations is crucial, but there is no universally agreed-upon definition of hate speech. We collected potentially hateful messages and asked two groups of internet users to determine whether they were hate speech or not, whether they should be banned or not, and to rate their degree of offensiveness. One of the groups was shown a definition prior to completing the survey. We aimed to assess whether hate speech can be annotated reliably, and the extent to which existing definitions are in accordance with subjective ratings. Our results indicate that showing users a definition caused them to partially align their own opinion with the definition but did not improve reliability, which was very low overall. We conclude that the presence of hate speech should perhaps not be considered a binary yes-or-no decision, and that raters need more detailed instructions for the annotation.
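
To make "reliability" concrete, the following is a minimal sketch of one way agreement between raters on a yes/no hate speech question could be quantified. The abstract does not name the statistic used, so the choice of Krippendorff's alpha for nominal labels is an assumption, as are the function name `krippendorff_alpha_nominal` and the toy ratings; none of this is taken from the paper's data.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels (illustrative sketch).

    `units` is a list of lists; each inner list holds the labels that
    different raters gave to one message (hypothetical input format).
    """
    # Build the coincidence matrix: every ordered pair of ratings within
    # an item counts with weight 1 / (ratings on that item - 1).
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single rating carries no agreement information
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and overall number of pairable ratings.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one item with two or more ratings")

    # Observed and expected disagreement (both scaled by n, which cancels).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1.0 - observed / expected


# Toy data (made up): 1 = "hate speech", 0 = "not hate speech",
# two raters per message.
toy_ratings = [[1, 1], [1, 0], [0, 0], [0, 0]]
alpha = krippendorff_alpha_nominal(toy_ratings)
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement. Computing such a statistic separately for the group that saw a definition and the group that did not would be one way to compare their reliability.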

This paper has not been read by Pith yet.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Do We Still Need Humans in the Loop? Comparing Human and LLM Annotation in Active Learning for Hostility Detection

    cs.CL 2026-04 conditional novelty 7.0

    LLM annotation can replace human labels for hostility detection with comparable F1 at much lower cost, but active learning adds little value and error structures differ systematically.