ToxSyn-PT: A Synthetic Fine-Grained Dataset of Minority-Targeted Toxic Language in Portuguese

Arlindo R. Galvão Filho; Diogo Fernandes; Fernanda Bufon Farber; Iago Alves Brito; Julia Soares Dollis

arxiv: 2506.10245 · v3 · pith:DNIDRGUOnew · submitted 2025-06-11 · 💻 cs.CL · cs.AI



keywords: hate, detection, non-toxic, toxic, capture, counterexamples, data, datasets


The development of robust hate speech detection systems remains limited by the lack of large-scale, fine-grained training data, especially for languages beyond English. Existing corpora typically rely on simplistic binary toxic/non-toxic labels, and the few that capture hate directed at specific minority groups lack the positive counterexamples required to distinguish genuine hate from mere discussion. In this work, we introduce ToxSyn-PT, the first large-scale Portuguese corpus explicitly designed for multi-label hate speech detection across nine protected minority groups, including the non-toxic counterexamples absent in all other public datasets. Generated via a controllable four-stage pipeline, ToxSyn-PT contains discourse-type annotations that capture the rhetorical strategies of toxic and non-toxic language, such as sarcasm, dehumanization, and cultural appreciation. Our experiments reveal a catastrophic, mutual generalization failure between ToxSyn-PT and existing social-media datasets: models trained on social media struggle to generalize to minority-specific contexts, and vice versa. This finding indicates that the two are distinct tasks and shows that summary metrics such as Macro F1 can be unreliable indicators of true model behavior, because they can completely mask model failure. We publicly release ToxSyn-PT on HuggingFace to support reproducible research on synthetic data generation and to benchmark progress in hate-speech detection for low- and mid-resource languages.
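The point about Macro F1 masking failure can be illustrated with a toy multi-label example. This is a minimal sketch and not the paper's experiment: the label names (`group_a`, `group_b`, `group_c`), the helper functions, and all data are hypothetical and chosen only to show how an averaged score can look moderate while one group is never detected.

```python
# Toy illustration (hypothetical data): per-label F1 vs. Macro F1
# in a multi-label setting with three made-up target groups.

def f1(y_true, y_pred):
    """Binary F1 for one label; returns 0.0 when there is nothing to score."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def per_label_f1(Y_true, Y_pred, labels):
    """Compute F1 separately for each column of a multi-hot label matrix."""
    return {
        label: f1([row[i] for row in Y_true], [row[i] for row in Y_pred])
        for i, label in enumerate(labels)
    }

def macro_f1(scores):
    """Unweighted mean of the per-label F1 scores."""
    return sum(scores.values()) / len(scores)

labels = ["group_a", "group_b", "group_c"]  # hypothetical group names

# Multi-hot gold labels: each row is one text, each column one group.
Y_true = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
]

# A model that is perfect on group_a and group_b but never predicts group_c.
Y_pred = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

scores = per_label_f1(Y_true, Y_pred, labels)
macro = macro_f1(scores)

for label, score in scores.items():
    print(f"{label}: F1 = {score:.2f}")
print(f"Macro F1 = {macro:.2f}")
```

In this toy setup the model scores 1.00 on two groups and 0.00 on the third, yet the Macro F1 is about 0.67. With nine groups instead of three, a single total failure would shift the average even less, which is why reporting per-group scores alongside the aggregate is a reasonable safeguard.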


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Safety Is Not Universal: The Selective Safety Trap in LLM Alignment
cs.CL 2026-01 conditional novelty 7.0

Safety alignment in LLMs is not uniform but forms a demographic hierarchy, with defense rates varying by up to 42% across groups; a new benchmark and DPO method demonstrate transferable safety.