Hate Speech Detection in Turkish and Arabic Languages: A Comprehensive Study

Arzucan Özgür; Berrin Yanikoglu; Elif Erol; Gökçe Uludoğan; Mehmet Umut Şen; Somaiyeh Dehghan

arxiv: 2607.00143 · v1 · pith:JAOML7XPnew · submitted 2026-06-30 · 💻 cs.CL · cs.AI

Pith reviewed 2026-07-02 19:22 UTC · model grok-4.3

keywords: hate speech detection · Turkish · Arabic · BERT · dataset · multi-task learning · span detection · content moderation

The pith

A new hate speech dataset covering five topics in Turkish and one topic in Arabic (refugees) comes with BERT models for category, intensity, target, and span tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a dataset drawn from Turkish social media posts on refugees, the Israel-Palestine conflict, anti-Greek sentiment, ethnic and religious communities, and LGBTI+ issues, together with Arabic examples focused on refugees. It pairs this data with fine-tuned BERT models that perform four linked tasks at once: assigning hate categories, scoring intensity, naming the target group, and marking the exact hateful spans in text. A sympathetic reader would see value in moving beyond simple yes-no detection toward finer-grained tools that could support more precise content decisions on platforms. The work treats the combination of topic-specific collection and multi-output modeling as the route to better coverage of hate speech in these languages.

Core claim

We introduce a comprehensive hate speech dataset covering five distinct topics in Turkish: refugees, the Israel-Palestine conflict, anti-Greek sentiment in Turkey, ethnic or religious communities (Alevis, Armenians, Arabs, Jews, and Kurds), and LGBTI+, alongside one topic in Arabic (refugees). In addition, we develop state-of-the-art BERT-based models to address multiple dimensions of hate speech analysis, including hate category classification, hate intensity prediction, target identification, and hate speech span detection, enabling a comprehensive understanding of hateful content in online discourse.

What carries the argument

BERT-based models fine-tuned on the new multi-topic dataset to output category labels, intensity scores, target groups, and character-level hateful spans in a single pass.
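
To make the four outputs concrete, the sketch below shows one possible record for the predictions on a single post. It is an illustration only: the `PostPrediction` class, its field names, the label strings, and the 0 to 1 intensity range are all assumptions, since the abstract does not specify label sets, scales, or an output schema. Spans are assumed to be half-open `[start, end)` character offsets.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PostPrediction:
    """Hypothetical container for the four task outputs on one post."""
    text: str
    category: str      # assumed: a single category label string
    intensity: float   # assumed: a score in [0, 1]
    target: str        # assumed: a single target-group label
    spans: List[Tuple[int, int]] = field(default_factory=list)  # [start, end) offsets

    def span_texts(self) -> List[str]:
        """Return the marked substrings in order of appearance."""
        return [self.text[start:end] for start, end in sorted(self.spans)]


def merge_spans(spans):
    """Merge overlapping or touching [start, end) character spans."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Extend the previous span instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Placeholder text stands in for a real post.
example = PostPrediction(
    text="intro FLAGGED outro",
    category="category_x",
    intensity=0.7,
    target="group_y",
    spans=[(6, 13)],
)
```

Keeping spans as offsets into the original text, rather than as copied strings, lets a moderation interface highlight the exact characters even when the same phrase occurs more than once in a post.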

If this is right

Moderation systems can move from binary flags to outputs that also report how intense the hate is and which exact phrases trigger it.
Analysis becomes possible for hate patterns tied to specific Turkish topics such as ethnic communities or LGBTI+ discussions.
The same modeling approach can be applied to additional low-resource language settings where only topic-labeled text is available.
Links between online content and offline incidents can be studied with finer target and intensity labels attached to each post.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The dataset could support experiments testing whether intensity or span signals transfer across the Turkish topics better than category labels alone.
Deployment on live streams might expose cases where context outside the post changes whether the same text counts as hate.
Span detection outputs could be used to generate training examples for simpler keyword-based filters in the same languages.
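
The last extension above can be sketched in a few lines. This is a minimal illustration, not something the paper describes: the function name `build_keyword_candidates`, the `(text, spans)` input format, and the `min_count` threshold are assumptions. Note that Python's `str.lower()` is not Turkish-aware (it maps `I` to `i` rather than `ı`), so a real pipeline for Turkish would need locale-specific case handling.

```python
from collections import Counter


def build_keyword_candidates(examples, min_count=2):
    """Count span strings and keep those seen at least min_count times.

    examples: iterable of (text, spans) pairs, where spans are assumed to be
    [start, end) character offsets into text.
    Returns (phrase, count) pairs sorted by descending count.
    """
    counts = Counter()
    for text, spans in examples:
        for start, end in spans:
            # Caveat: lower() is not Turkish-aware; shown here for simplicity.
            phrase = text[start:end].strip().lower()
            if phrase:
                counts[phrase] += 1
    return [(p, c) for p, c in counts.most_common() if c >= min_count]


# Placeholder strings stand in for annotated posts.
demo_examples = [
    ("xx PHRASE_A yy", [(3, 11)]),
    ("PHRASE_A zz", [(0, 8)]),
    ("qq PHRASE_B", [(3, 11)]),
]
candidates = build_keyword_candidates(demo_examples)
```

Any list produced this way would still need human review, because a phrase that is hateful inside one span can be neutral in another context.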

Load-bearing premise

The posts gathered for the dataset reflect typical real-world hate speech in Turkish and Arabic on the listed topics, and the fine-tuned models will maintain reliable performance without major annotation errors or shifts in language use.

What would settle it

Fresh posts collected from the same platforms and topics produce substantially lower accuracy on any of the four tasks, or repeated annotation rounds show low agreement on labels and spans.
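
The agreement half of this test can be made measurable. The sketch below computes Cohen's kappa for post-level labels from two annotators and a character-overlap F1 for their spans. Both metric choices are assumptions made for illustration; the abstract does not say which agreement measures, if any, the authors report. The function names are likewise hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def span_char_f1(spans_a, spans_b):
    """Character-overlap F1 between two sets of [start, end) spans."""
    chars_a = {i for s, e in spans_a for i in range(s, e)}
    chars_b = {i for s, e in spans_b for i in range(s, e)}
    if not chars_a and not chars_b:
        return 1.0  # neither annotator marked anything: treated as agreement
    overlap = len(chars_a & chars_b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(chars_b)
    recall = overlap / len(chars_a)
    return 2 * precision * recall / (precision + recall)


kappa = cohen_kappa(["hate", "none", "hate", "none"],
                    ["hate", "none", "none", "none"])
span_f1 = span_char_f1([(0, 4)], [(2, 6)])
```

In this toy case the annotators agree on three of four posts, yet kappa is only 0.5, because half of that agreement is expected by chance given their label frequencies.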

Original abstract

Online hate speech has been linked to a global rise in violence against minorities, including incidents such as mass shootings, lynchings, and ethnic cleansing. Societies grappling with this issue, particularly when hate speech targets specific groups based on religion, race, ethnicity, culture, nationality, or migration status, face the challenge of balancing freedom of expression with the need for effective content moderation on widely used online platforms. In response to this challenge, we introduce a comprehensive hate speech dataset covering five distinct topics in Turkish: refugees, the Israel-Palestine conflict, anti-Greek sentiment in Turkey, ethnic or religious communities (Alevis, Armenians, Arabs, Jews, and Kurds), and LGBTI+, alongside one topic in Arabic (refugees). In addition, we develop state-of-the-art BERT-based models to address multiple dimensions of hate speech analysis, including hate category classification, hate intensity prediction, target identification, and hate speech span detection, enabling a comprehensive understanding of hateful content in online discourse.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

New Turkish and Arabic hate speech datasets on specific topics plus BERT models for four tasks, but the abstract supplies zero metrics or collection details so the SOTA claim can't be checked yet.

The main takeaway is that the authors built new annotated datasets for hate speech in Turkish across five topics (refugees, Israel-Palestine, anti-Greek sentiment, ethnic/religious groups, LGBTI+) and one topic in Arabic (refugees), then fine-tuned BERT for hate category classification, intensity prediction, target identification, and span detection.

They fill a genuine gap by targeting languages with little prior work and picking timely, high-stakes topics. Resource papers like this can matter for moderation tools in those regions if the data turns out usable.

The soft spots are straightforward. The abstract asserts state-of-the-art models but gives no numbers, no baselines, no data sizes, no sources, and no annotation agreement scores. Without those, it's impossible to tell whether the models actually work or whether the data collection avoided obvious biases. The weakest assumption is that the collected examples represent real-world hate speech and that standard BERT fine-tuning will deliver reliable results across tasks.

This is for NLP researchers working on low-resource languages or social media analysis who need fresh data. Someone building moderation systems for Turkish or Arabic platforms might find the datasets useful if they get released with clear documentation.

It deserves peer review because new datasets in these languages are worth checking even if the modeling side needs more evidence. The work looks like honest dataset construction rather than overclaimed theory.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces a comprehensive hate speech dataset covering five topics in Turkish (refugees, Israel-Palestine conflict, anti-Greek sentiment, ethnic/religious communities including Alevis/Armenians/Arabs/Jews/Kurds, and LGBTI+) and one topic in Arabic (refugees). It additionally develops BERT-based models for multiple tasks: hate category classification, hate intensity prediction, target identification, and hate speech span detection.

Significance. If the dataset collection, annotation quality (including IAA), and model evaluations are rigorously documented and the models demonstrably outperform baselines, the work would provide valuable resources for hate speech detection in under-resourced languages, supporting content moderation research.

major comments (1)

[Abstract] The claim of developing 'state-of-the-art BERT-based models' for the listed tasks is unsupported by any reported metrics, baselines, data splits, or performance numbers, making it impossible to evaluate whether the central empirical claims hold.
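
As an illustration of the kind of evidence this comment asks for, the sketch below scores predictions with macro-averaged F1 and compares them against a majority-class baseline. It is a hedged example, not the authors' evaluation protocol: macro-F1 as the metric, the majority baseline, and the function names are all assumptions, and the labels are toy values.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over classes seen in gold or pred."""
    classes = sorted(set(gold) | set(pred))
    scores = []
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def majority_baseline(train_labels, n_test):
    """Predict the most frequent training label for every test item."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return [most_common_label] * n_test


# Toy labels only; no real results are implied.
gold = ["a", "a", "b", "b"]
model_pred = ["a", "b", "b", "b"]
baseline_pred = majority_baseline(["a", "a", "b"], len(gold))

model_score = macro_f1(gold, model_pred)
baseline_score = macro_f1(gold, baseline_pred)
```

Reporting a model score next to a trivial baseline on the same held-out split is what lets a reader judge how much of the performance comes from the model rather than from class imbalance.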

minor comments (1)

[Abstract] No information is provided on data sources, collection methodology, annotation protocol, or inter-annotator agreement, which are standard requirements for dataset papers.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and the recommendation for major revision. We address the single major comment below.

Referee: [Abstract] The claim of developing 'state-of-the-art BERT-based models' for the listed tasks is unsupported by any reported metrics, baselines, data splits, or performance numbers, making it impossible to evaluate whether the central empirical claims hold.

Authors: We agree that the abstract's phrasing is problematic. The manuscript body contains the full experimental results, baselines, data splits, and metrics for the BERT models on the four tasks, but the abstract does not report any of these numbers and therefore cannot support the 'state-of-the-art' claim. In the revised version we will remove the phrase 'state-of-the-art' from the abstract and replace it with a neutral description of the models developed, ensuring the abstract makes no unsubstantiated performance claims.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity

This is an empirical paper focused on dataset construction for hate speech detection across Turkish and Arabic topics plus standard BERT fine-tuning for classification, intensity prediction, target ID, and span detection. No derivation chain, equations, fitted parameters presented as predictions, or self-citation load-bearing steps exist in the provided text. Central claims rest on data collection protocols and model performance metrics that are externally verifiable rather than reducing to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical NLP study focused on dataset creation and model training; the abstract introduces no free parameters, mathematical axioms, or invented entities.

