Cultural Targets, Structural Frames, Binding Morals: A Cross-Lingual Audit of Online Hate in Multicultural Singapore

Emilio Ferrara

arxiv: 2606.21996 · v1 · pith:KQWR653Gnew · submitted 2026-06-20 · 💻 cs.SI · cs.CY

Cultural Targets, Structural Frames, Binding Morals: A Cross-Lingual Audit of Online Hate in Multicultural Singapore

Emilio Ferrara This is my paper

Pith reviewed 2026-06-26 11:00 UTC · model grok-4.3

classification 💻 cs.SI cs.CY

keywords online hatecross-lingualSingaporemoral foundationsthreat framessocial mediacontent moderationmulticulturalism

0 comments

The pith

Online hate in Singapore shows language-specific targets but shared moral and emotional structures across communities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies online hate discussions in English, Chinese, and Malay within Singapore's multicultural setting using millions of social media comments. It claims that the specific out-groups targeted vary by language, but the threat frames, moral foundations, and emotions expressed in that hate are much more uniform. This indicates a layered structure to hate where cultural differences are stronger at the level of targets than at the level of underlying grammar. The analysis relies on LLM annotations validated against human judgments to compare these layers. Understanding this could help in designing moderation that addresses common patterns rather than isolated language differences.

Core claim

The paper's core claim is that cross-lingual divergence in online hate decreases as analysis moves from targets to structures: language-by-target association is V=0.25, but drops to V=0.08 for moral foundations and V=0.07 for emotion, with binding morals of sanctity and loyalty prominent at 55-75% and hate being contempt-driven with anti-immigration focus.

What carries the argument

Layered cultural contingency, which describes how divergence falls monotonically from what is hated to how and why it is hated.

If this is right

Out-group targets for hate are specific to each language public.
Threat frames and binding morals like sanctity and loyalty are shared across languages.
Hateful comments overall receive less amplification than neutral ones, but anti-immigrant hate is amplified more.
Hate expresses out-group grievances rather than anti-system ones.
Absolute hate rates are hard to define due to low inter-model agreement, so relative structures are emphasized.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Moderation policies might benefit from targeting shared structural elements like moral language instead of specific topics.
Patterns observed here could be tested in other multilingual societies with parallel language communities.
Further human validation per language could test the reliability of the LLM-based cross-lingual comparisons.
Volume of hate not tracking events suggests it's driven by ongoing social dynamics rather than temporary triggers.

Load-bearing premise

The LLM chosen for annotation accurately reflects human perceptions of hate, frames, and morals in all three languages.

What would settle it

Human raters from each language group rating the same comments differently from the LLM on moral foundations or emotions at levels inconsistent with the high agreement scores reported.

Figures

Figures reproduced from arXiv: 2606.21996 by Emilio Ferrara.

**Figure 2.** Figure 2: Target salience is language-specific. (a) Divergence of each language from the cross-lingual mean salience per target (percentage points): targets that are uniformly salient wash to neutral, while the culturally divergent ones stand out— English over-indexes on foreign nationals, Chinese on the PRC/local-Chinese axis, and Malay on religion and LGBTQ. (b) Each language’s targets ranked by share of its confi… view at source ↗

**Figure 4.** Figure 4: The moral grammar of hate is binding-based (sanc [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 6.** Figure 6: Weekly confirmed-hate volume across 2025 (blue) [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗

**Figure 5.** Figure 5: Production ̸= resonance: for each target, share of hate produced (grey) vs share amplified (received engagement, colored). Targets whose amplified share exceeds their produced share are selectively amplified. Migrant/foreign-national (nativist) targets (orange) gain, while religious/identity targets (blue) lose. Sorted by amplification gap. most expect to mobilize out-group hostility: the February detent… view at source ↗

read the original abstract

Multicultural Singapore hosts overlapping language publics (English, Chinese, and Malay) that discuss the same out-groups in parallel, a natural setting to ask whether online hate shares a structure across languages and whether what a community $\textit{produces}$ is what it $\textit{amplifies}$. From a Singapore-centric 2025 Facebook, Reddit, and YouTube corpus (31.0M items; 1.76M comments mentioning eleven identity groups), we benchmark eight open large language models as hate annotators against a human-adjudicated gold set, adopt the best (Phi-4: accuracy 0.95, Cohen's $\kappa$=0.91, recall 1.00 on an independent manual check), and replicate every finding under a second model. The results converge on one thesis, $\textit{layered cultural contingency}$: cross-lingual divergence falls monotonically as one moves from what a community hates to how and why it hates. Which out-groups are targeted is culturally specific (language $\times$ target $V$=0.25), but the threat frames and the binding moral grammar of hate (sanctity and loyalty, $55-75\%$, not fairness) are far more shared across languages, with divergence dropping to $V$=0.08 for moral foundations and 0.07 for emotion. Hate is contempt-driven and voices an out-group, anti-immigration grievance rather than an anti-system one. Reception is selectively nativist: hateful comments are amplified less than neutral mentions overall, yet anti-immigrant hate is preferentially amplified while religious and anti-LGBTQ hate is not, and volume does not track 2025 Singapore key events. We further show that absolute hate prevalence is not well defined at the LLM-annotator level, with agreement ceilings at $\kappa\approx0.42$ across models, so we report relative structure as primary. The findings bear directly on cross-lingual content moderation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Singapore multilingual hate shows targets varying by language while moral frames converge, but the structural labels rest on unvalidated LLM output beyond binary detection.

read the letter

The paper's main result is that cross-lingual divergence in online hate drops steadily from targets (V=0.25) to moral foundations and emotions (V=0.08 and 0.07) in Singapore's English-Chinese-Malay setting. Targets are more culturally specific, but the binding morals (sanctity, loyalty) and contempt-driven frames are shared, with hate often anti-immigrant rather than anti-system.

The work does a few things right. The corpus is large and timely, drawn from 2025 Facebook, Reddit, and YouTube. Benchmarking eight LLMs against a human gold set for hate detection, then replicating with a second model, gives some grounding. Reporting relative structures instead of absolute prevalence is sensible given the kappa ceiling around 0.42.

The soft spot is the validation gap the stress test flags. Human adjudication covers only binary hate detection; there is no equivalent check described for the threat frames or moral foundation labels that carry the layered-contingency claim. If Phi-4 introduces language-specific artifacts in Chinese or Malay on those finer categories, the observed convergence could be an artifact rather than a real pattern. The abstract does not report cross-lingual human agreement on the structural annotations.

This is for computational social scientists and moderation researchers working on multilingual platforms. The Singapore case is a clean natural experiment and the data effort is real. It deserves peer review so referees can examine the annotation pipeline for the moral and frame layers, even if the binary detection step looks reasonable.

Referee Report

2 major / 2 minor

Summary. The manuscript analyzes a 2025 Singapore corpus (31M items, 1.76M comments on 11 identity groups) across English, Chinese, and Malay to test cross-lingual structure in online hate. After benchmarking eight LLMs against a human gold set and selecting Phi-4 (accuracy 0.95, κ=0.91), it reports that target selection is culturally specific (language × target V=0.25) while threat frames and binding moral foundations (sanctity/loyalty dominant, V=0.08/0.07) converge, supporting a 'layered cultural contingency' thesis; hate is contempt-driven and anti-immigrant, with selective amplification and no event linkage. Absolute prevalence is treated as ill-defined due to model disagreement (κ≈0.42).

Significance. If the structural annotations are reliable, the monotonic decline in divergence from targets to morals provides a testable, empirically grounded distinction between culturally variable and shared components of hate speech, with direct implications for multilingual moderation. The human gold set for binary detection, replication under a second model, and focus on relative structure rather than absolute rates are positive features.

major comments (2)

[Methods (LLM benchmarking and annotation pipeline)] The reported validation (accuracy 0.95, κ=0.91 on independent manual check) applies exclusively to binary hate detection against the human gold set. No parallel human cross-lingual adjudication is described for the target, threat-frame, or moral-foundation labels whose divergence statistics (V=0.25 → 0.08/0.07) carry the central layered-contingency claim.
[Results (model replication and structural comparisons)] Post-selection of Phi-4 as the primary annotator, combined with the acknowledged low inter-model agreement on prevalence (κ≈0.42), leaves open the possibility that model-specific artifacts in non-English languages drive the observed convergence in frames and morals even if human judgments diverge.

minor comments (2)

[Methods] Clarify whether the second-model replication applied the identical prompt templates and label schemas used for Phi-4 or introduced any adjustments.
[Results] The abstract states that 'volume does not track 2025 Singapore key events'; provide the exact event list and statistical test used for this null result.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and for noting the strengths of the human gold set for binary detection, the model replication, and the focus on relative structure. We respond to each major comment below.

read point-by-point responses

Referee: [Methods (LLM benchmarking and annotation pipeline)] The reported validation (accuracy 0.95, κ=0.91 on independent manual check) applies exclusively to binary hate detection against the human gold set. No parallel human cross-lingual adjudication is described for the target, threat-frame, or moral-foundation labels whose divergence statistics (V=0.25 → 0.08/0.07) carry the central layered-contingency claim.

Authors: The referee correctly identifies that human validation was performed only for binary hate detection. Target, threat-frame, and moral-foundation annotations were produced by the selected LLM and subjected to full replication under the second model; the monotonic decline in divergence (V=0.25 to 0.08/0.07) is reproduced in both. We will add an explicit limitations paragraph acknowledging the lack of human cross-lingual adjudication for the structural labels and will clarify how cross-model consistency functions as the primary robustness check for those annotations. revision: partial
Referee: [Results (model replication and structural comparisons)] Post-selection of Phi-4 as the primary annotator, combined with the acknowledged low inter-model agreement on prevalence (κ≈0.42), leaves open the possibility that model-specific artifacts in non-English languages drive the observed convergence in frames and morals even if human judgments diverge.

Authors: All structural comparisons were replicated under the second model, and the convergence patterns in threat frames and moral foundations remain unchanged. Because the low inter-model κ on prevalence is already acknowledged, the manuscript centers on relative structure; divergent model artifacts would be expected to produce inconsistent structural signals across models, which is not observed. We will expand the methods and results sections to report the replication statistics specifically for the non-binary labels and to state that this replication directly tests against language-specific model biases. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical statistics from annotated corpus

full rationale

The paper's claims rest on direct computation of association measures (Cramer's V) from LLM-annotated corpus data after benchmarking the annotator on a human gold set for binary hate detection. No equations, fitted parameters, or self-citations are used to derive the layered-contingency thesis; the monotonic decline in V (0.25 to 0.08/0.07) is an observed empirical pattern, not a constructed equivalence. The analysis is self-contained against external benchmarks (human labels, multiple models) and does not reduce to its inputs by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on two domain assumptions about annotation validity and corpus coverage; no free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption LLM annotations validated on a human gold set can be treated as reliable for measuring cross-lingual differences in hate structure
Basis for selecting Phi-4 and reporting structural findings
domain assumption The 2025 Singapore Facebook, Reddit, and YouTube corpus adequately represents parallel language publics discussing the same out-groups
Required to interpret language-by-target and language-by-moral associations as cultural contingency

pith-pipeline@v0.9.1-grok · 5891 in / 1330 out tokens · 56301 ms · 2026-06-26T11:00:47.916738+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

42 extracted references · 35 canonical work pages

[1]

Cassola, M

Mohammad Atari, Jesse Graham, and Morteza De- hghani. Foundations of morality in iran.Evolution and Human Behavior, 41(5):367–384, 2020. doi: 10.1016/j. evolhumbehav.2020.07.014

work page doi:10.1016/j 2020
[2]

The effects of binding moral foundations on prejudiced attitudes to- ward migrants: The mediation role of perceived realistic and symbolic threats.Genealogy, 7(3):65, 2023

Federica Bianco and Ankica Kosic. The effects of binding moral foundations on prejudiced attitudes to- ward migrants: The mediation role of perceived realistic and symbolic threats.Genealogy, 7(3):65, 2023. doi: 10.3390/genealogy7030065

work page doi:10.3390/genealogy7030065 2023
[3]

Brady, Julian A

William J. Brady, Julian A. Wills, John T. Jost, Joshua A. Tucker, and Jay J. Van Bavel. Emotion shapes the diffu- sion of moralized content in social networks.Proceed- ings of the National Academy of Sciences, 114(28):7313– 7318, 2017. doi: 10.1073/pnas.1618923114

work page doi:10.1073/pnas.1618923114 2017
[4]

Brady, Killian McLoughlin, Tuan N

William J. Brady, Killian McLoughlin, Tuan N. Doan, and Molly J. Crockett. How social learning amplifies moral outrage expression in online social networks.Sci- ence Advances, 7(33):eabe5641, 2021. doi: 10.1126/ sciadv.abe5641

2021
[5]

Dealing with Disagreements: Looking Beyond the Majority Vote in Subjective Annotations

Aida Mostafazadeh Davani, Mark Díaz, and Vinodkumar Prabhakaran. Dealing with disagreements: Looking be- yond the majority vote in subjective annotations.Trans- actions of the Association for Computational Linguistics, 10:92–110, 2022. doi: 10.1162/tacl_a_00449

work page doi:10.1162/tacl_a_00449 2022
[6]

Automated hate speech detection and the problem of offensive language

Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber. Automated hate speech detection and the problem of offensive language. InProceedings of ICWSM 2017, volume 11, pages 512–515, 2017. doi: 10.1609/icwsm.v11i1.14955

work page doi:10.1609/icwsm.v11i1.14955 2017
[7]

COLD: A bench- mark for chinese offensive language detection

Jiawen Deng, Jingyan Zhou, Hao Sun, Chujie Zheng, Fei Mi, Helen Meng, and Minlie Huang. COLD: A bench- mark for chinese offensive language detection. InPro- ceedings of EMNLP 2022, pages 11580–11599, 2022. doi: 10.18653/v1/2022.emnlp-main.796

work page doi:10.18653/v1/2022.emnlp-main.796 2022
[8]

Can LLMs express person- ality across cultures? introducing CulturalPersonas for evaluating trait alignment

Priyanka Dey, Aayush Bothra, Yugal Khanter, Jieyu Zhao, and Emilio Ferrara. Can LLMs express person- ality across cultures? introducing CulturalPersonas for evaluating trait alignment. InFindings of the Association for Computational Linguistics: EMNLP 2025, 2025. doi: 10.18653/v1/2025.findings-emnlp.1101

work page doi:10.18653/v1/2025.findings-emnlp.1101 2025
[9]

Latent Hatred: A Benchmark for Understanding Implicit Hate Speech

Mai ElSherief, Caleb Ziems, David Nguyen, Soumya Roy, Diba Wang, and Diyi Yang. Latent hatred: A bench- mark for understanding implicit hate speech. InPro- ceedings of EMNLP 2021, pages 345–363, 2021. doi: 10.18653/v1/2021.emnlp-main.29

work page doi:10.18653/v1/2021.emnlp-main.29 2021
[10]

Im- pact and dynamics of hate and counter speech online

Joshua Garland, Keyan Ghazi-Zahedi, Jean-Gabriel Young, Laurent Hébert-Dufresne, and Mirta Galesic. Im- pact and dynamics of hate and counter speech online. EPJ Data Science, 11:3, 2022. doi: 10.1140/epjds/ s13688-021-00314-6

work page doi:10.1140/epjds/ 2022
[12]

Jesse Graham, Jonathan Haidt, and Brian A. Nosek. Lib- erals and conservatives rely on different sets of moral foundations.Journal of Personality and Social Psychol- ogy, 96(5):1029–1046, 2009. doi: 10.1037/a0015141

work page doi:10.1037/a0015141 2009
[13]

Nosek, Jonathan Haidt, Ravi Iyer, Spassena Koleva, and Peter H

Jesse Graham, Brian A. Nosek, Jonathan Haidt, Ravi Iyer, Spassena Koleva, and Peter H. Ditto. Mapping the moral domain.Journal of Personality and Social Psy- chology, 101(2):366–385, 2011. doi: 10.1037/a0021847

work page doi:10.1037/a0021847 2011
[14]

Detecting social me- dia manipulation in low-resource languages

Samar Haider, Luca Luceri, Ashok Deb, Adam Badawy, Nanyun Peng, and Emilio Ferrara. Detecting social me- dia manipulation in low-resource languages. InCom- panion Proceedings of the ACM Web Conference 2023 (WWW ’23 Companion), 2023. doi: 10.1145/3543873. 3587615

work page doi:10.1145/3543873 2023
[15]

Empathy-based counterspeech can reduce racist hate speech in a social media field experiment.Proceed- ings of the National Academy of Sciences, 118(50): e2116310118, 2021

Dominik Hangartner, Gloria Gennaro, Sary Alasiri, et al. Empathy-based counterspeech can reduce racist hate speech in a social media field experiment.Proceed- ings of the National Academy of Sciences, 118(50): e2116310118, 2021. doi: 10.1073/pnas.2116310118

work page doi:10.1073/pnas.2116310118 2021
[16]

Dehumanization: An integrative review

Nick Haslam. Dehumanization: An integrative review. Personality and Social Psychology Review, 10(3):252– 264, 2006. doi: 10.1207/s15327957pspr1003_4

work page doi:10.1207/s15327957pspr1003_4 2006
[17]

great replacement

Maik Herold. Who believes in the “great replacement”? political attitudes and democratic alienation among sup- porters of immigration-related conspiracy theories in eu- rope.Social Science Quarterly, 2025. doi: 10.1111/ssqu. 13481

work page doi:10.1111/ssqu 2025
[18]

Interpersonal disgust, ideological orientations, and dehumanization as predictors of intergroup attitudes.Psychological Science, 18(8):691–698, 2007

Gordon Hodson and Kimberly Costello. Interpersonal disgust, ideological orientations, and dehumanization as predictors of intergroup attitudes.Psychological Science, 18(8):691–698, 2007. doi: 10.1111/j.1467-9280.2007. 01962.x

work page doi:10.1111/j.1467-9280.2007 2007
[19]

Hopp, Jacob T

Frederic R. Hopp, Jacob T. Fisher, Devin Cornell, Richard Huskey, and René Weber. The extended moral foundations dictionary (eMFD): Development and appli- cations of a crowd-sourced approach to extracting moral intuitions from text.Behavior Research Methods, 53: 232–246, 2021. doi: 10.3758/s13428-020-01433-0

work page doi:10.3758/s13428-020-01433-0 2021
[20]

Multi-label hate speech and abusive language detection in indonesian twitter

Muhammad Okky Ibrohim and Indra Budi. Multi-label hate speech and abusive language detection in indonesian twitter. InProceedings of the Third Workshop on Abusive Language Online (ALW3), pages 46–57, 2019. doi: 10. 18653/v1/W19-3506. 9

2019
[21]

Walther, and Emilio Ferrara

Julie Jiang, Luca Luceri, Joseph B. Walther, and Emilio Ferrara. Social approval and network ho- mophily as motivators of online toxicity. arXiv preprint arXiv:2310.07779, 2024

arXiv 2024
[22]

Moral values underpinning COVID-19 online communication patterns

Julie Jiang, Luca Luceri, and Emilio Ferrara. Moral values underpinning COVID-19 online communication patterns. InCompanion Proceedings of the ACM Web Conference 2025 (WWW ’25 Companion), 2025. doi: 10.1145/3701716.3717538

work page doi:10.1145/3701716.3717538 2025
[23]

N. F. Johnson, R. Leahy, N. J. Restrepo, N. Velasquez, M. Zheng, P. Manrique, P. Devkota, and S. Wuchty. Hid- den resilience and adaptive dynamics of the global online hate ecology.Nature, 573(7773):261–265, 2019. doi: 10.1038/s41586-019-1494-7

work page doi:10.1038/s41586-019-1494-7 2019
[24]

Jost, and Sharareh Noor- baloochi

Matthew Kugler, John T. Jost, and Sharareh Noor- baloochi. Another look at moral foundations theory: Do authoritarianism and social dominance orientation explain liberal-conservative differences in “moral” intu- itions?Social Justice Research, 27(4):413–431, 2014. doi: 10.1007/s11211-014-0223-5

work page doi:10.1007/s11211-014-0223-5 2014
[25]

EPJ Data Science URL:https://link.springer.com/article/10.1140/epjds/s13688-025-00575-5, doi:10.1140/ epjds/s13688-025-00575-5

Kristina Lerman, Minh Duc Chu, Charles Bickham, Luca Luceri, and Emilio Ferrara. Safe spaces or toxic places? content moderation and social dynamics of online eating disorder communities.EPJ Data Science, 14(1), 2025. doi: 10.1140/epjds/s13688-025-00575-5

work page doi:10.1140/epjds/s13688-025-00575-5 2025
[26]

Facilitating fine-grained detection of chinese toxic language: Hierarchical taxon- omy, resources, and benchmarks

Junyu Lu, Bo Xu, Xiaokun Zhang, Changrong Min, Liang Yang, and Hongfei Lin. Facilitating fine-grained detection of chinese toxic language: Hierarchical taxon- omy, resources, and benchmarks. InProceedings of ACL 2023 (Long Papers), pages 16235–16250, 2023. doi: 10.18653/v1/2023.acl-long.898

work page doi:10.18653/v1/2023.acl-long.898 2023
[27]

Mackie, Thierry Devos, and Eliot R

Diane M. Mackie, Thierry Devos, and Eliot R. Smith. Intergroup emotions: Explaining offensive action ten- dencies in an intergroup context.Journal of Personal- ity and Social Psychology, 79(4):602–616, 2000. doi: 10.1037/0022-3514.79.4.602

work page doi:10.1037/0022-3514.79.4.602 2000
[28]

Thou shalt not hate: Countering online hate speech

Binny Mathew, Punyajoy Saha, Hardik Tharad, Sub- ham Rajgaria, Prajwal Singhania, Suman Kalyan Maity, Pawan Goyal, and Animesh Mukherjee. Thou shalt not hate: Countering online hate speech. InProceedings of ICWSM 2019, volume 13, pages 369–380, 2019. doi: 10.1609/icwsm.v13i01.3237

work page doi:10.1609/icwsm.v13i01.3237 2019
[29]

Modeling framing in immigration discourse on social media

Julia Mendelsohn, Ceren Budak, and David Jurgens. Modeling framing in immigration discourse on social media. InProceedings of NAACL 2021, pages 2219– 2263, 2021. doi: 10.18653/v1/2021.naacl-main.179

work page doi:10.18653/v1/2021.naacl-main.179 2021
[30]

SGH ate C heck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of S ingapore

Ri Chi Ng, Nirmalendu Prakash, Ming Shan Hee, Kenny Tsu Wei Choo, and Roy Ka-Wei Lee. SGHateCheck: Functional tests for detecting hate speech in low-resource languages of singapore. InProceedings of the 8th Work- shop on Online Abuse and Harms (WOAH 2024), pages 312–331, 2024. doi: 10.18653/v1/2024.woah-1.24

work page doi:10.18653/v1/2024.woah-1.24 2024
[31]

Moral exclusion and injustice: An intro- duction.Journal of Social Issues, 46(1):1–20, 1990

Susan Opotow. Moral exclusion and injustice: An intro- duction.Journal of Social Issues, 46(1):1–20, 1990. doi: 10.1111/j.1540-4560.1990.tb00268.x

work page doi:10.1111/j.1540-4560.1990.tb00268.x 1990
[32]

Pennebaker, Ryan L

James W. Pennebaker, Ryan L. Boyd, Kayla Jordan, and Kate Blackburn. The development and psychometric properties of LIWC2015. Technical report, University of Texas at Austin, Austin, TX, 2015

2015
[33]

2406783121

Steve Rathje, Jay J. Van Bavel, and Sander van der Lin- den. Out-group animosity drives engagement on social media.Proceedings of the National Academy of Sci- ences, 118(26):e2024292118, 2021. doi: 10.1073/pnas. 2024292118

work page doi:10.1073/pnas 2021
[34]

H ate C heck: Functional Tests for Hate Speech Detection Models

Paul Röttger, Bertie Vidgen, Dong Nguyen, Zeerak Waseem, Helen Margetts, and Janet Pierrehumbert. Hat- eCheck: Functional tests for hate speech detection mod- els. InProceedings of ACL-IJCNLP 2021 (Long Papers), pages 41–58, 2021. doi: 10.18653/v1/2021.acl-long.4

work page doi:10.18653/v1/2021.acl-long.4 2021
[35]

Multilingual HateCheck: Func- tional tests for multilingual hate speech detection mod- els

Paul Röttger, Haitham Seelawi, Debora Nozza, Zeerak Talat, and Bertie Vidgen. Multilingual HateCheck: Func- tional tests for multilingual hate speech detection mod- els. InProceedings of WOAH 2022, pages 154–169,

2022
[36]

doi: 10.18653/v1/2022.woah-1.15

work page doi:10.18653/v1/2022.woah-1.15 2022
[37]

Maarten Sap, Swabha Swayamdipta, Laura Vianna, Xuhui Zhou, Yejin Choi, and Noah A. Smith. Annota- tors with attitudes: How annotator beliefs and identities bias toxic language detection. InProceedings of NAACL 2022, pages 5884–5906, 2022. doi: 10.18653/v1/2022. naacl-main.431

work page doi:10.18653/v1/2022 2022
[38]

Moralized lan- guage predicts hate speech on social media.PNAS Nexus, 1(5):pgac281, 2022

Kirill Solovev and Nicolas Pröllochs. Moralized lan- guage predicts hate speech on social media.PNAS Nexus, 1(5):pgac281, 2022. doi: 10.1093/pnasnexus/pgac281

work page doi:10.1093/pnasnexus/pgac281 2022
[39]

Stephan and Cookie White Stephan

Walter G. Stephan and Cookie White Stephan. An integrated threat theory of prejudice. In Stuart Os- kamp, editor,Reducing Prejudice and Discrimination, pages 23–45. Lawrence Erlbaum, 2000. doi: 10.4324/ 9781410605634-7

2000
[40]

Stephan and Cookie White Stephan

Walter G. Stephan and Cookie White Stephan. Inter- group threats. In Chris G. Sibley and Fiona Kate Barlow, editors,The Cambridge Handbook of the Psychology of Prejudice, pages 131–148. Cambridge University Press,
[41]

doi: 10.1017/9781316161579.007

work page doi:10.1017/9781316161579.007
[42]

Henri Tajfel and John C. Turner. An integrative theory of intergroup conflict. InThe Social Psychology of Inter- group Relations, pages 33–47. Brooks/Cole, 1979. doi: 10.4324/9780203505984-16

work page doi:10.4324/9780203505984-16 1979
[43]

Detecting east asian prejudice on social media

Bertie Vidgen, Austin Botelho, David Broniatowski, Ella Guest, Matthew Hall, Helen Margetts, Rebekah Tromble, Zeerak Waseem, and Scott Hale. Detecting east asian prejudice on social media. InProceedings of the 4th Workshop on Online Abuse and Harms (WOAH), pages 162–172, 2020. doi: 10.18653/v1/2020.alw-1.19. 10

work page doi:10.18653/v1/2020.alw-1.19 2020

[1] [1]

Cassola, M

Mohammad Atari, Jesse Graham, and Morteza De- hghani. Foundations of morality in iran.Evolution and Human Behavior, 41(5):367–384, 2020. doi: 10.1016/j. evolhumbehav.2020.07.014

work page doi:10.1016/j 2020

[2] [2]

The effects of binding moral foundations on prejudiced attitudes to- ward migrants: The mediation role of perceived realistic and symbolic threats.Genealogy, 7(3):65, 2023

Federica Bianco and Ankica Kosic. The effects of binding moral foundations on prejudiced attitudes to- ward migrants: The mediation role of perceived realistic and symbolic threats.Genealogy, 7(3):65, 2023. doi: 10.3390/genealogy7030065

work page doi:10.3390/genealogy7030065 2023

[3] [3]

Brady, Julian A

William J. Brady, Julian A. Wills, John T. Jost, Joshua A. Tucker, and Jay J. Van Bavel. Emotion shapes the diffu- sion of moralized content in social networks.Proceed- ings of the National Academy of Sciences, 114(28):7313– 7318, 2017. doi: 10.1073/pnas.1618923114

work page doi:10.1073/pnas.1618923114 2017

[4] [4]

Brady, Killian McLoughlin, Tuan N

William J. Brady, Killian McLoughlin, Tuan N. Doan, and Molly J. Crockett. How social learning amplifies moral outrage expression in online social networks.Sci- ence Advances, 7(33):eabe5641, 2021. doi: 10.1126/ sciadv.abe5641

2021

[5] [5]

Dealing with Disagreements: Looking Beyond the Majority Vote in Subjective Annotations

Aida Mostafazadeh Davani, Mark Díaz, and Vinodkumar Prabhakaran. Dealing with disagreements: Looking be- yond the majority vote in subjective annotations.Trans- actions of the Association for Computational Linguistics, 10:92–110, 2022. doi: 10.1162/tacl_a_00449

work page doi:10.1162/tacl_a_00449 2022

[6] [6]

Automated hate speech detection and the problem of offensive language

Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber. Automated hate speech detection and the problem of offensive language. InProceedings of ICWSM 2017, volume 11, pages 512–515, 2017. doi: 10.1609/icwsm.v11i1.14955

work page doi:10.1609/icwsm.v11i1.14955 2017

[7] [7]

COLD: A bench- mark for chinese offensive language detection

Jiawen Deng, Jingyan Zhou, Hao Sun, Chujie Zheng, Fei Mi, Helen Meng, and Minlie Huang. COLD: A bench- mark for chinese offensive language detection. InPro- ceedings of EMNLP 2022, pages 11580–11599, 2022. doi: 10.18653/v1/2022.emnlp-main.796

work page doi:10.18653/v1/2022.emnlp-main.796 2022

[8] [8]

Can LLMs express person- ality across cultures? introducing CulturalPersonas for evaluating trait alignment

Priyanka Dey, Aayush Bothra, Yugal Khanter, Jieyu Zhao, and Emilio Ferrara. Can LLMs express person- ality across cultures? introducing CulturalPersonas for evaluating trait alignment. InFindings of the Association for Computational Linguistics: EMNLP 2025, 2025. doi: 10.18653/v1/2025.findings-emnlp.1101

work page doi:10.18653/v1/2025.findings-emnlp.1101 2025

[9] [9]

Latent Hatred: A Benchmark for Understanding Implicit Hate Speech

Mai ElSherief, Caleb Ziems, David Nguyen, Soumya Roy, Diba Wang, and Diyi Yang. Latent hatred: A bench- mark for understanding implicit hate speech. InPro- ceedings of EMNLP 2021, pages 345–363, 2021. doi: 10.18653/v1/2021.emnlp-main.29

work page doi:10.18653/v1/2021.emnlp-main.29 2021

[10] [10]

Im- pact and dynamics of hate and counter speech online

Joshua Garland, Keyan Ghazi-Zahedi, Jean-Gabriel Young, Laurent Hébert-Dufresne, and Mirta Galesic. Im- pact and dynamics of hate and counter speech online. EPJ Data Science, 11:3, 2022. doi: 10.1140/epjds/ s13688-021-00314-6

work page doi:10.1140/epjds/ 2022

[11] [12]

Jesse Graham, Jonathan Haidt, and Brian A. Nosek. Lib- erals and conservatives rely on different sets of moral foundations.Journal of Personality and Social Psychol- ogy, 96(5):1029–1046, 2009. doi: 10.1037/a0015141

work page doi:10.1037/a0015141 2009

[12] [13]

Nosek, Jonathan Haidt, Ravi Iyer, Spassena Koleva, and Peter H

Jesse Graham, Brian A. Nosek, Jonathan Haidt, Ravi Iyer, Spassena Koleva, and Peter H. Ditto. Mapping the moral domain.Journal of Personality and Social Psy- chology, 101(2):366–385, 2011. doi: 10.1037/a0021847

work page doi:10.1037/a0021847 2011

[13] [14]

Detecting social me- dia manipulation in low-resource languages

Samar Haider, Luca Luceri, Ashok Deb, Adam Badawy, Nanyun Peng, and Emilio Ferrara. Detecting social me- dia manipulation in low-resource languages. InCom- panion Proceedings of the ACM Web Conference 2023 (WWW ’23 Companion), 2023. doi: 10.1145/3543873. 3587615

work page doi:10.1145/3543873 2023

[14] [15]

Empathy-based counterspeech can reduce racist hate speech in a social media field experiment.Proceed- ings of the National Academy of Sciences, 118(50): e2116310118, 2021

Dominik Hangartner, Gloria Gennaro, Sary Alasiri, et al. Empathy-based counterspeech can reduce racist hate speech in a social media field experiment.Proceed- ings of the National Academy of Sciences, 118(50): e2116310118, 2021. doi: 10.1073/pnas.2116310118

work page doi:10.1073/pnas.2116310118 2021

[15] [16]

Dehumanization: An integrative review

Nick Haslam. Dehumanization: An integrative review. Personality and Social Psychology Review, 10(3):252– 264, 2006. doi: 10.1207/s15327957pspr1003_4

work page doi:10.1207/s15327957pspr1003_4 2006

[16] [17]

great replacement

Maik Herold. Who believes in the “great replacement”? political attitudes and democratic alienation among sup- porters of immigration-related conspiracy theories in eu- rope.Social Science Quarterly, 2025. doi: 10.1111/ssqu. 13481

work page doi:10.1111/ssqu 2025

[17] [18]

Interpersonal disgust, ideological orientations, and dehumanization as predictors of intergroup attitudes.Psychological Science, 18(8):691–698, 2007

Gordon Hodson and Kimberly Costello. Interpersonal disgust, ideological orientations, and dehumanization as predictors of intergroup attitudes.Psychological Science, 18(8):691–698, 2007. doi: 10.1111/j.1467-9280.2007. 01962.x

work page doi:10.1111/j.1467-9280.2007 2007

[18] [19]

Hopp, Jacob T

Frederic R. Hopp, Jacob T. Fisher, Devin Cornell, Richard Huskey, and René Weber. The extended moral foundations dictionary (eMFD): Development and appli- cations of a crowd-sourced approach to extracting moral intuitions from text.Behavior Research Methods, 53: 232–246, 2021. doi: 10.3758/s13428-020-01433-0

work page doi:10.3758/s13428-020-01433-0 2021

[19] [20]

Multi-label hate speech and abusive language detection in indonesian twitter

Muhammad Okky Ibrohim and Indra Budi. Multi-label hate speech and abusive language detection in indonesian twitter. InProceedings of the Third Workshop on Abusive Language Online (ALW3), pages 46–57, 2019. doi: 10. 18653/v1/W19-3506. 9

2019

[20] [21]

Walther, and Emilio Ferrara

Julie Jiang, Luca Luceri, Joseph B. Walther, and Emilio Ferrara. Social approval and network ho- mophily as motivators of online toxicity. arXiv preprint arXiv:2310.07779, 2024

arXiv 2024

[21] [22]

Moral values underpinning COVID-19 online communication patterns

Julie Jiang, Luca Luceri, and Emilio Ferrara. Moral values underpinning COVID-19 online communication patterns. InCompanion Proceedings of the ACM Web Conference 2025 (WWW ’25 Companion), 2025. doi: 10.1145/3701716.3717538

work page doi:10.1145/3701716.3717538 2025

[22] [23]

N. F. Johnson, R. Leahy, N. J. Restrepo, N. Velasquez, M. Zheng, P. Manrique, P. Devkota, and S. Wuchty. Hid- den resilience and adaptive dynamics of the global online hate ecology.Nature, 573(7773):261–265, 2019. doi: 10.1038/s41586-019-1494-7

work page doi:10.1038/s41586-019-1494-7 2019

[23] [24]

Jost, and Sharareh Noor- baloochi

Matthew Kugler, John T. Jost, and Sharareh Noor- baloochi. Another look at moral foundations theory: Do authoritarianism and social dominance orientation explain liberal-conservative differences in “moral” intu- itions?Social Justice Research, 27(4):413–431, 2014. doi: 10.1007/s11211-014-0223-5

work page doi:10.1007/s11211-014-0223-5 2014

[24] [25]

EPJ Data Science URL:https://link.springer.com/article/10.1140/epjds/s13688-025-00575-5, doi:10.1140/ epjds/s13688-025-00575-5

Kristina Lerman, Minh Duc Chu, Charles Bickham, Luca Luceri, and Emilio Ferrara. Safe spaces or toxic places? content moderation and social dynamics of online eating disorder communities.EPJ Data Science, 14(1), 2025. doi: 10.1140/epjds/s13688-025-00575-5

work page doi:10.1140/epjds/s13688-025-00575-5 2025

[25] [26]

Facilitating fine-grained detection of chinese toxic language: Hierarchical taxon- omy, resources, and benchmarks

Junyu Lu, Bo Xu, Xiaokun Zhang, Changrong Min, Liang Yang, and Hongfei Lin. Facilitating fine-grained detection of chinese toxic language: Hierarchical taxon- omy, resources, and benchmarks. InProceedings of ACL 2023 (Long Papers), pages 16235–16250, 2023. doi: 10.18653/v1/2023.acl-long.898

work page doi:10.18653/v1/2023.acl-long.898 2023

[26] [27]

Mackie, Thierry Devos, and Eliot R

Diane M. Mackie, Thierry Devos, and Eliot R. Smith. Intergroup emotions: Explaining offensive action ten- dencies in an intergroup context.Journal of Personal- ity and Social Psychology, 79(4):602–616, 2000. doi: 10.1037/0022-3514.79.4.602

work page doi:10.1037/0022-3514.79.4.602 2000

[27] [28]

Thou shalt not hate: Countering online hate speech

Binny Mathew, Punyajoy Saha, Hardik Tharad, Sub- ham Rajgaria, Prajwal Singhania, Suman Kalyan Maity, Pawan Goyal, and Animesh Mukherjee. Thou shalt not hate: Countering online hate speech. InProceedings of ICWSM 2019, volume 13, pages 369–380, 2019. doi: 10.1609/icwsm.v13i01.3237

work page doi:10.1609/icwsm.v13i01.3237 2019

[28] [29]

Modeling framing in immigration discourse on social media

Julia Mendelsohn, Ceren Budak, and David Jurgens. Modeling framing in immigration discourse on social media. InProceedings of NAACL 2021, pages 2219– 2263, 2021. doi: 10.18653/v1/2021.naacl-main.179

work page doi:10.18653/v1/2021.naacl-main.179 2021

[29] [30]

SGH ate C heck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of S ingapore

Ri Chi Ng, Nirmalendu Prakash, Ming Shan Hee, Kenny Tsu Wei Choo, and Roy Ka-Wei Lee. SGHateCheck: Functional tests for detecting hate speech in low-resource languages of singapore. InProceedings of the 8th Work- shop on Online Abuse and Harms (WOAH 2024), pages 312–331, 2024. doi: 10.18653/v1/2024.woah-1.24

work page doi:10.18653/v1/2024.woah-1.24 2024

[30] [31]

Moral exclusion and injustice: An intro- duction.Journal of Social Issues, 46(1):1–20, 1990

Susan Opotow. Moral exclusion and injustice: An intro- duction.Journal of Social Issues, 46(1):1–20, 1990. doi: 10.1111/j.1540-4560.1990.tb00268.x

work page doi:10.1111/j.1540-4560.1990.tb00268.x 1990

[31] [32]

Pennebaker, Ryan L

James W. Pennebaker, Ryan L. Boyd, Kayla Jordan, and Kate Blackburn. The development and psychometric properties of LIWC2015. Technical report, University of Texas at Austin, Austin, TX, 2015

2015

[32] [33]

2406783121

Steve Rathje, Jay J. Van Bavel, and Sander van der Lin- den. Out-group animosity drives engagement on social media.Proceedings of the National Academy of Sci- ences, 118(26):e2024292118, 2021. doi: 10.1073/pnas. 2024292118

work page doi:10.1073/pnas 2021

[33] [34]

H ate C heck: Functional Tests for Hate Speech Detection Models

Paul Röttger, Bertie Vidgen, Dong Nguyen, Zeerak Waseem, Helen Margetts, and Janet Pierrehumbert. Hat- eCheck: Functional tests for hate speech detection mod- els. InProceedings of ACL-IJCNLP 2021 (Long Papers), pages 41–58, 2021. doi: 10.18653/v1/2021.acl-long.4

work page doi:10.18653/v1/2021.acl-long.4 2021

[34] [35]

Multilingual HateCheck: Func- tional tests for multilingual hate speech detection mod- els

Paul Röttger, Haitham Seelawi, Debora Nozza, Zeerak Talat, and Bertie Vidgen. Multilingual HateCheck: Func- tional tests for multilingual hate speech detection mod- els. InProceedings of WOAH 2022, pages 154–169,

2022

[35] [36]

doi: 10.18653/v1/2022.woah-1.15

work page doi:10.18653/v1/2022.woah-1.15 2022

[36] [37]

Maarten Sap, Swabha Swayamdipta, Laura Vianna, Xuhui Zhou, Yejin Choi, and Noah A. Smith. Annota- tors with attitudes: How annotator beliefs and identities bias toxic language detection. InProceedings of NAACL 2022, pages 5884–5906, 2022. doi: 10.18653/v1/2022. naacl-main.431

work page doi:10.18653/v1/2022 2022

[37] [38]

Moralized lan- guage predicts hate speech on social media.PNAS Nexus, 1(5):pgac281, 2022

Kirill Solovev and Nicolas Pröllochs. Moralized lan- guage predicts hate speech on social media.PNAS Nexus, 1(5):pgac281, 2022. doi: 10.1093/pnasnexus/pgac281

work page doi:10.1093/pnasnexus/pgac281 2022

[38] [39]

Stephan and Cookie White Stephan

Walter G. Stephan and Cookie White Stephan. An integrated threat theory of prejudice. In Stuart Os- kamp, editor,Reducing Prejudice and Discrimination, pages 23–45. Lawrence Erlbaum, 2000. doi: 10.4324/ 9781410605634-7

2000

[39] [40]

Stephan and Cookie White Stephan

Walter G. Stephan and Cookie White Stephan. Inter- group threats. In Chris G. Sibley and Fiona Kate Barlow, editors,The Cambridge Handbook of the Psychology of Prejudice, pages 131–148. Cambridge University Press,

[40] [41]

doi: 10.1017/9781316161579.007

work page doi:10.1017/9781316161579.007

[41] [42]

Henri Tajfel and John C. Turner. An integrative theory of intergroup conflict. InThe Social Psychology of Inter- group Relations, pages 33–47. Brooks/Cole, 1979. doi: 10.4324/9780203505984-16

work page doi:10.4324/9780203505984-16 1979

[42] [43]

Detecting east asian prejudice on social media

Bertie Vidgen, Austin Botelho, David Broniatowski, Ella Guest, Matthew Hall, Helen Margetts, Rebekah Tromble, Zeerak Waseem, and Scott Hale. Detecting east asian prejudice on social media. InProceedings of the 4th Workshop on Online Abuse and Harms (WOAH), pages 162–172, 2020. doi: 10.18653/v1/2020.alw-1.19. 10

work page doi:10.18653/v1/2020.alw-1.19 2020