pith. machine review for the scientific record.

arxiv: 2604.16654 · v2 · submitted 2026-04-17 · 💻 cs.CL

Recognition: unknown

IYKYK (But AI Doesn't): Automated Content Moderation Does Not Capture Communities' Heterogeneous Attitudes Towards Reclaimed Language

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 08:16 UTC · model grok-4.3

classification 💻 cs.CL
keywords reclaimed slurs · hate speech detection · content moderation · inter-annotator agreement · marginalized communities · Perspective API · online discourse · contextual interpretation

The pith

Automated hate speech detectors align poorly with how members of LGBTQIA+, Black, and women communities judge reclaimed slur usage.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates attitudes toward reclaimed slurs in online spaces among LGBTQIA+, Black, and women communities. It finds that in-group annotators disagree substantially about whether specific texts should be flagged as hate speech, even for texts that use slurs such as the f-word, n-word, or b-word in potentially reclaimed contexts. Automated tools such as Perspective API align poorly with these human judgments, a mismatch that can lead to the suppression of meaningful community expressions of solidarity and identity. The work emphasizes the contextual and subjective nature of interpreting such language online.

Core claim

Through an annotated corpus of online slur usages collected from community members, the authors demonstrate low inter-annotator agreement across all groups and questions, coupled with weak correspondence to Perspective API scores. Features such as whether the slur was used derogatorily or targeted at the speaker influence annotator decisions, but overall, the lack of clear identity and intent signals leads to varied interpretations even among in-group members.
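The agreement claim rests on chance-corrected metrics; the paper's reference list points to the fast-krippendorff implementation of Krippendorff's alpha. As a rough illustration of what "low agreement" means, here is a minimal pure-Python sketch of nominal-data alpha via the standard coincidence-matrix formulation. The toy ratings are invented for illustration and are not drawn from the paper's corpus, and this is not the authors' code.

```python
from collections import Counter

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of rating lists, one per annotated text; missing
    ratings are simply omitted. Units with fewer than two ratings are
    not pairable and are skipped.
    """
    coincidence = Counter()  # coincidence matrix o[(c, k)]
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue
        counts = Counter(ratings)
        for c, n_uc in counts.items():
            for k, n_uk in counts.items():
                # each ordered pair of distinct ratings in the unit
                # contributes fractionally, weighted by 1 / (m - 1)
                coincidence[(c, k)] += (n_uc * n_uk - (n_uc if c == k else 0)) / (m - 1)
    marginals = Counter()
    for (c, _k), v in coincidence.items():
        marginals[c] += v
    n = sum(marginals.values())
    if n <= 1:
        return float("nan")
    # observed disagreement: off-diagonal coincidences
    d_o = sum(v for (c, k), v in coincidence.items() if c != k)
    # expected disagreement under chance pairing of values
    d_e = sum(marginals[c] * marginals[k]
              for c in marginals for k in marginals if c != k) / (n - 1)
    return 1.0 - d_o / d_e if d_e else 1.0

# Toy example: two annotators, four texts, 50% raw agreement.
units = [["hate", "hate"], ["ok", "ok"], ["hate", "ok"], ["ok", "hate"]]
print(krippendorff_alpha_nominal(units))  # 0.125: barely above chance
```

Values near 0 mean agreement is close to chance; Krippendorff's conventional cutoff treats alpha below roughly 0.667 as unreliable, which is the sense in which reported agreement can be "low" even when raw percent agreement looks moderate.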

What carries the argument

The annotated corpus of slur-containing texts, paired with annotator judgments on hate speech flagging and contextual features, compared against automated assessments from Perspective API.
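For readers unfamiliar with the comparison target, Perspective API is queried through its public `comments:analyze` endpoint, which returns per-attribute summary scores in [0, 1]. A minimal sketch of the request shape and response parsing follows; the response here is fabricated for parsing purposes only, no network call is made, and the paper itself does not specify its exact query configuration (the simulated rebuttal indicates the toxicity score was used).

```python
import json

# Public endpoint for Perspective API's analyze call (requires an API key).
PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"

def build_request(text, attributes=("TOXICITY",)):
    """Build the JSON body for a Perspective API comments:analyze call."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {attr: {} for attr in attributes},
        "languages": ["en"],
    }

def summary_score(response, attribute="TOXICITY"):
    """Extract the 0-1 summary score for one attribute from a response."""
    return response["attributeScores"][attribute]["summaryScore"]["value"]

body = build_request("example text containing a reclaimed slur")
print(json.dumps(body, indent=2))

# Shape of a (fabricated) response, shown only to demonstrate parsing:
fake_response = {
    "attributeScores": {"TOXICITY": {"summaryScore": {"value": 0.87}}}
}
print(summary_score(fake_response))  # 0.87
```

A single scalar per text is exactly why reclaimed usage is hard for such tools: the score carries no representation of speaker identity or intent, the contextual signals the annotators themselves relied on.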

If this is right

  • Annotators report texts as hate speech more often when slur usage is derogatory or targeted at oneself.
  • Low inter-annotator agreement persists across communities, showing disagreement even within groups.
  • Automated assessments fail to distinguish reclaimed uses, potentially suppressing marginalized voices.
  • Personal history and lived experience contribute to differences in interpretation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Moderation systems may need to incorporate community-specific training data or user-reported context to better handle reclaimed language.
  • Future tools could allow users to signal reclaimed intent, reducing over-flagging of in-group speech.
  • Policy for platforms should account for the heterogeneity rather than assuming uniform community standards.

Load-bearing premise

That the attitudes of the recruited social media users accurately represent the stable views of their broader communities despite variations in personal experience.

What would settle it

A replication with a larger sample of annotators from the same communities: if those annotators showed high agreement on the same set of texts, the observed disagreement would look like a sampling artifact rather than genuine community heterogeneity.

Figures

Figures reproduced from arXiv: 2604.16654 by Arjun Subramonian, Christina Chance, James He, Kai-Wei Chang, Rebecca Pattichis, Saadia Gabriel, Shruti Narayanan.

Figure 1. XGBoost feature importance measured by weight, showing the raw frequency each feature was used to split nodes across all […]
Figure 2. Heatmap depicting the raw annotation counts for a text’s secondary salient context, disaggregated by its annotated primary […]
Figure 3. Alignment between human consensus and Perspective API varies by slur. Each tweet is represented as a pair of points sharing […]
Figure 4. Distributional comparisons between the probability of a human annotator labeling the text as hate speech (conditioned on the […]
Original abstract

Reclaimed slur usage is a common and meaningful practice online for many marginalized communities. It serves as a source of solidarity, identity, and shared experience. However, contemporary automated and AI-based moderation tools for online content largely fail to distinguish between reclaimed and hateful uses of slurs, resulting in the suppression of marginalized voices. In this work, we use quantitative and qualitative methods to examine the attitudes of social media users in LGBTQIA+, Black, and women communities around reclaimed slurs targeting our focus groups including the f-word, n-word, and b-word. With social media users from these communities, we collect and analyze an annotated online slur usage corpus. The corpus includes annotators' perceptions of whether an online text containing a slur should be flagged as hate speech, as well as contextual features of the slur usage. Across all communities and annotation questions, we observe low inter-annotator agreement, indicating substantial disagreement among in-group annotators. This is compounded by the fact that, absent clear contextual signals of identity and intent, even in-group members may disagree on how to interpret reclaimed slur usage online. Semi-structured interviews with annotators suggest that differences in lived experience and personal history contribute to this variation as well. We find poor alignment between annotator judgments and automated hate speech assessments produced by Perspective API. We further observe that certain features of a text such as whether the slur usage was derogatory and if the slur was targeted at oneself are more associated with whether annotators report the text as hate speech. Together, these findings highlight the inherent subjectivity and contextual nature of how marginalized communities interpret slurs online.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that automated hate speech detection tools like Perspective API fail to capture heterogeneous attitudes toward reclaimed slur usage (f-word, n-word, b-word) in LGBTQIA+, Black, and women communities. It supports this via a newly collected annotated corpus from in-group social media users, reporting low inter-annotator agreement across communities and questions, poor alignment between annotator judgments and API scores, associations between features like derogatory tone or self-targeting and hate speech perceptions, and interview insights linking variation to differences in lived experience.

Significance. If the empirical findings hold after addressing methodological gaps, the work would be significant for NLP and content moderation research. It supplies direct evidence that community attitudes toward reclaimed language are context-dependent and non-monolithic, with implications for why current automated systems over-flag or under-flag such content. The mixed-methods approach (annotations plus interviews) and focus on specific slurs and communities add concrete data to ongoing debates about inclusive moderation design.

major comments (3)
  1. [Annotation Collection / Methods] The paper recruits social media users via self-identification with the target communities but provides no sample sizes, demographic stratification (age, region, usage frequency), verification of in-group status, or comparison to population benchmarks. This is load-bearing for the central claim, as the reported low IAA and API misalignment could be artifacts of unrepresentative sampling rather than inherent community heterogeneity (see skeptic note on recruitment).
  2. [Results] The abstract and main text assert 'low inter-annotator agreement' and 'substantial disagreement' without reporting quantitative metrics (e.g., Krippendorff's alpha, Fleiss' kappa), per-question or per-community values, or statistical tests. This leaves the robustness of the heterogeneity conclusion unclear and prevents readers from assessing whether agreement is truly low or merely moderate.
  3. [API Comparison / Results] The claim of 'poor alignment' between annotator judgments and Perspective API is central but lacks specifics on the alignment metric (correlation, thresholded accuracy, etc.), the exact API output used, or any baseline comparisons. Without these, it is hard to judge the magnitude or generalizability of the misalignment finding.
minor comments (2)
  1. [Abstract] Lacks any numerical details on sample size, agreement scores, or effect sizes, which would allow immediate evaluation of the strength of the reported findings.
  2. [Discussion] The feature associations (derogatory use, self-targeting) are noted but could be strengthened by reporting odds ratios or regression coefficients to quantify their predictive power.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback, which has helped us identify areas for improvement in clarity and reporting. We address each major comment below, indicating where revisions will be made to the manuscript.

read point-by-point responses
  1. Referee: [Annotation Collection / Methods] The paper recruits social media users via self-identification with the target communities but provides no sample sizes, demographic stratification (age, region, usage frequency), verification of in-group status, or comparison to population benchmarks. This is load-bearing for the central claim, as the reported low IAA and API misalignment could be artifacts of unrepresentative sampling rather than inherent community heterogeneity.

    Authors: We agree that providing more details on recruitment will strengthen transparency. In the revised manuscript, we will add sample sizes for annotators per community, available demographic information (age ranges and regions), and explicit description of the self-identification process. We will also add a limitations discussion on the lack of population benchmarks. However, the observed heterogeneity is supported by within-sample variation and interview data on lived experience, which indicate it is not solely an artifact of sampling. revision: partial

  2. Referee: [Results] The abstract and main text assert 'low inter-annotator agreement' and 'substantial disagreement' without reporting quantitative metrics (e.g., Krippendorff's alpha, Fleiss' kappa), per-question or per-community values, or statistical tests.

    Authors: We apologize for not highlighting the metrics in the abstract and summary. The results section reports Krippendorff's alpha per community and question (indicating poor agreement). We will revise the abstract and results to explicitly include these values, per-question breakdowns, and relevant statistical tests to allow readers to assess the robustness of the heterogeneity finding. revision: yes

  3. Referee: [API Comparison / Results] The claim of 'poor alignment' between annotator judgments and Perspective API is central but lacks specifics on the alignment metric (correlation, thresholded accuracy, etc.), the exact API output used, or any baseline comparisons.

    Authors: We agree more precise reporting is needed. In the revision, we will specify the alignment metrics (e.g., correlation with annotator judgments and accuracy at standard thresholds), confirm use of the toxicity score, and add baseline comparisons (e.g., against majority vote or random) to better contextualize the misalignment magnitude. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical study relies on newly collected annotations and external API comparison

full rationale

The paper presents an observational analysis of annotator judgments on slur usage collected from recruited social media users, reports inter-annotator agreement statistics, and directly compares those judgments to Perspective API outputs. No equations, fitted parameters, or predictions are defined in terms of the target results. No self-citation chain is invoked to justify the core claims about heterogeneity or misalignment; the findings rest on the primary data collection and standard agreement metrics. The sampling assumption (representativeness of recruited users) is a potential external-validity concern but does not create internal circularity in the reported derivation or results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that in-group annotators can reliably surface heterogeneous community attitudes and that the selected contextual features adequately capture the relevant interpretive signals.

axioms (1)
  • domain assumption: Annotators drawn from the target communities provide valid and informative perspectives on reclaimed slur usage.
    The study design and conclusions depend on treating the collected annotations as representative of community-level attitudes.

pith-pipeline@v0.9.0 · 5620 in / 1293 out tokens · 38316 ms · 2026-05-10T08:16:24.826752+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

96 extracted references · 69 canonical work pages

  1. [1]

    Hala Al Kuwatly, Maximilian Wich, and Georg Groh. 2020. Identifying and Measuring Annotator Bias Based on Annotators’ Demographic Characteristics. InProceedings of the Fourth Workshop on Online Abuse and Harms, Seyi Akiwowo, Bertie Vidgen, Vinodkumar Prabhakaran, and Zeerak Waseem (Eds.). Association for Computational Linguistics, Online, 184–190. doi:10....

  2. [2]

David Allegretti. 2023. We Asked People Who Used to Make Homophobic Comments What Their Turning Point Was. Vice. https://www.vice.com/en/article/we-asked-people-who-used-to-make-homophobic-comments-what-their-turning-point-was/

  3. [3]

    Luvell Anderson and Ernie Lepore. 2013. Slurring Words.Noûs47, 1 (2013), 25–48. doi:10.1111/j.1468-0068.2010.00820.x

  4. [4]

Luvell Anderson and Ernie Lepore. 2013. What Did You Call Me? Slurs as Prohibited Words. Analytic Philosophy 54, 3 (2013), 350–363. doi:10.1111/phib.12023

  5. [5]

    Carolina Are, Catherine Talbot, and Pam Briggs. 2025. Social media affordances of LGBTQIA+ expression and community formation.Convergence 31, 4 (2025), 1401–1422. arXiv:https://doi.org/10.1177/13548565241296628 doi:10.1177/13548565241296628

  6. [6]

Jabari Asim. 2007. Nigger vs. Nigga. Houghton Mifflin Harcourt, Boston, MA, USA, 212–257

  7. [7]

    Giuseppe Attanasio, Debora Nozza, Dirk Hovy, and Elena Baralis. 2022. Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists. InFindings of the Association for Computational Linguistics: ACL 2022, Smaranda Muresan, Preslav Nakov, and Aline Villavicencio (Eds.). Association for Computational Linguistics, Dublin, Ireland, 1105–11...

  8. [8]

    Joris Baan, Wilker Aziz, Barbara Plank, and Raquel Fernandez. 2022. Stop Measuring Calibration When Humans Disagree. InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang (Eds.). Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 1892–1915. doi:10....

  9. [9]

    Hritik Bansal, Ashima Suvarna, Gantavya Bhatt, Nanyun Peng, Kai-Wei Chang, and Aditya Grover. 2025. Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization. arXiv:2404.00530 [cs.CL] https://arxiv.org/abs/2404.00530

  10. [10]

    Valerio Basile, Michael Fell, Tommaso Fornaciari, Dirk Hovy, Silviu Paun, Barbara Plank, Massimo Poesio, and Alexandra Uma. 2021. We Need to Consider Disagreement in Evaluation. InProceedings of the 1st Workshop on Benchmarking: Past, Present and Future, Kenneth Church, Mark Liberman, and Valia Kordoni (Eds.). Association for Computational Linguistics, On...

  11. [11]

    Claudia Bianchi. 2014. Slurs and appropriation: An echoic account.Journal of Pragmatics66 (2014), 35–44. doi:10.1016/j.pragma.2014.02.009

  12. [12]

    Laura Biester, Vanita Sharma, Ashkan Kazemi, Naihao Deng, Steven Wilson, and Rada Mihalcea. 2022. Analyzing the Effects of Annotator Gender across NLP Tasks. InProceedings of the 1st Workshop on Perspectivist Approaches to NLP LREC2022, Gavin Abercrombie, Valerio Basile, Sara Tonelli, Verena Rieser, and Alexandra Uma (Eds.). European Language Resources As...

  13. [13]

    BITCHSTIX. 2026. BITCHSTIX: A Balm for Every Body. https://bitchstix.com/

  14. [14]

    Black in AI. 2017. Black in AI. https://www.blackinai.org/

  15. [15]

Laura Bradley. 2023. “As long as we stick together everything is going to be OK”: An Examination of Aotearoa’s Far-Right Subculture on 4chan. Master’s thesis. Victoria University of Wellington. doi:10.26686/wgtn.24871797

  16. [16]

John P. Brammer. 2019. A Decade of LGBTQ Pop Culture Visibility but Stalled Political Progress. NBC News. https://www.nbcnews.com/think/opinion/decade-lgbtq-pop-culture-visibility-stalled-political-progress-ncna1108786

  17. [17]

Kendra Calhoun and Alexia Fawcett. 2023. “They Edited Out her Nip Nops”: Linguistic Innovation as Textual Censorship Avoidance on TikTok. Language@Internet 21 (Dec. 2023), 1–30. doi:10.14434/li.v21.37371

  18. [18]

Santiago Castro. 2017. Fast Krippendorff: Fast computation of Krippendorff’s alpha agreement measure. https://github.com/pln-fing-udelar/fast-krippendorff

  19. [19]

    Bianca Cepollaro. 2021. The moral status of the reclamation of slurs.Organon F28, 3 (2021), 672–688

  20. [20]

    Bianca Cepollaro and Dan López de Sa. 2022. Who Reclaims Slurs?Pacific Philosophical Quarterly103, 3 (2022), 606–619. doi:10.1111/papq.12403

  21. [21]

    Amanda Cercas Curry, Gavin Abercrombie, and Zeerak Talat. 2024. Subjective Isms? On the Danger of Conflating Hate and Offence in Abusive Language Detection. InProceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024), Yi-Ling Chung, Zeerak Talat, Debora Nozza, Flor Miriam Plaza-del Arco, Paul Röttger, Aida Mostafazadeh Davani, and Agostina Cal...

  22. [22]

    Christina Chance and Kai-Wei Chang. 2025. Re-imagining Virtual Communities: Ethical Guidelines for Studying Black Twitter.Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society8, 1 (Oct. 2025), 541–553. doi:10.1609/aies.v8i1.36569

  23. [23]

    Thomas Davidson, Debasmita Bhattacharya, and Ingmar Weber. 2019. Racial Bias in Hate Speech and Abusive Language Detection Datasets. In Proceedings of the Third Workshop on Abusive Language Online, Sarah T. Roberts, Joel Tetreault, Vinodkumar Prabhakaran, and Zeerak Waseem (Eds.). Association for Computational Linguistics, Florence, Italy, 25–35. doi:10.1...

  24. [24]

    Naihao Deng, Xinliang Zhang, Siyang Liu, Winston Wu, Lu Wang, and Rada Mihalcea. 2023. You Are What You Annotate: Towards Better Models through Annotator Representations. InFindings of the Association for Computational Linguistics: EMNLP 2023, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, Singapore, 12475–124...

  25. [25]

Remi Denton, Mark Díaz, Ian Kivlichan, Vinodkumar Prabhakaran, and Rachel Rosen. 2021. Whose Ground Truth? Accounting for Individual and Collective Identities Underlying Dataset Annotation. arXiv:2112.04554 [cs.LG] https://arxiv.org/abs/2112.04554

  26. [26]

    Lucas Dixon, John Li, Jeffrey Sorensen, Nithum Thain, and Lucy Vasserman. 2018. Measuring and Mitigating Unintended Bias in Text Classification. InProceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society(New Orleans, LA, USA)(AIES ’18). Association for Computing Machinery, New York, NY, USA, 67–73. doi:10.1145/3278721.3278729

  27. [27]

    Rebecca Dorn, Lee Kezar, Fred Morstatter, and Kristina Lerman. 2024. Harmful Speech Detection by Language Models Exhibits Gender-Queer Dialect Bias. InProceedings of the 4th ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization(San Luis Potosi, Mexico) (EAAMO ’24). Association for Computing Machinery, New York, NY, USA, Article 6...

  28. [28]

    Lia Draetta, Chiara Ferrando, Marco Cuccarini, Liam James, and Viviana Patti. 2024. ReCLAIM Project: Exploring Italian Slurs Reappropriation with Large Language Models. InProceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), Felice Dell’Orletta, Alessandro Lenci, Simonetta Montemagni, and Rachele Sprugnoli (Eds.). CEUR Wo...

  29. [29]

    Shangbin Feng, Taylor Sorensen, Yuhan Liu, Jillian Fisher, Chan Young Park, Yejin Choi, and Yulia Tsvetkov. 2024. Modular Pluralism: Pluralistic Alignment via Multi-LLM Collaboration. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (Eds.). Association for Computati...

  30. [30]

    Eve Fleisig, Rediet Abebe, and Dan Klein. 2023. When the Majority is Wrong: Modeling Annotator Disagreement for Subjective Tasks. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, Singapore, 6715–6726. doi:10.18653/v1/2023.e...

  31. [31]

    Eve Fleisig, Su Lin Blodgett, Dan Klein, and Zeerak Talat. 2024. The Perspectivist Paradigm Shift: Assumptions and Challenges of Capturing Human Labels. InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Kevin Duh, Helena Gomez, and Steve...

  32. [32]

    Sarah Florini. 2014. Tweets, Tweeps, and Signifyin’: Communication and Cultural Performance on “Black Twitter”.Television & New Media15, 3 (2014), 223–237. arXiv:https://doi.org/10.1177/1527476413480247 doi:10.1177/1527476413480247

  33. [33]

    Chiara Francesconi, Cristina Bosco, Fabio Poletto, and Manuela Sanguinetti. 2019. Error Analysis in a Hate Speech Detection Task: The Case of HaSpeeDe-TW at EVALITA 2018. InProceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019), Raffaella Bernardi, Roberto Navigli, and Giovanni Semeraro (Eds.). CEUR Workshop Proceedings, B...

  34. [34]

    Adam D Galinsky, Kurt Hugenberg, Carla Groom, and Galen V Bodenhausen. 2003. The reappropriation of stigmatizing labels: Implications for soical identity. InIdentity Issues in Groups. Emerald Group Publishing Limited, Leeds, UK. doi:10.1016/S1534-0856(02)05009-0

  35. [35]

Shane Galvin. 2026. NJ cops who collected millions in salary while suspended for using the N-word can finally be fired: judge rules. New York Post. https://nypost.com/2026/01/03/us-news/nj-cops-who-collected-millions-in-salary-while-suspended-for-using-the-n-word-can-be-fired-judge-rules/

  36. [36]

Matthew Gault. 2021. Intel’s Dystopian Anti-Harassment AI Lets Users Opt In for ‘Some’ Racism. Vice. https://www.vice.com/en/article/intels-dystopian-anti-harassment-ai-lets-users-opt-in-for-some-racism Accessed: 2026-01-09

  37. [37]

    Nitesh Goyal, Ian Kivlichan, Rachel Rosen, and Lucy Vasserman. 2022. Is Your Toxicity My Toxicity? Exploring the Impact of Rater Identity on Toxicity Annotation. arXiv:2205.00501 [cs.HC] https://arxiv.org/abs/2205.00501

  38. [38]

    Nitesh Goyal, Ian D. Kivlichan, Rachel Rosen, and Lucy Vasserman. 2022. Is Your Toxicity My Toxicity? Exploring the Impact of Rater Identity on Toxicity Annotation.Proc. ACM Hum.-Comput. Interact.6, CSCW2, Article 363 (nov 2022), 28 pages. doi:10.1145/3555088

  39. [39]

    Melody Y. Guan, Manas Joglekar, Eric Wallace, Saachi Jain, Boaz Barak, Alec Helyar, Rachel Dias, Andrea Vallone, Hongyu Ren, Jason Wei, Hyung Won Chung, Sam Toyer, Johannes Heidecke, Alex Beutel, and Amelia Glaese. 2025. Deliberative Alignment: Reasoning Enables Safer Language Models. arXiv:2412.16339 [cs.CL] https://arxiv.org/abs/2412.16339

  40. [40]

    Necdet Gürkan, Mohammed Almarzouq, and Pon Rahul Murugaraj. 2023. Personalized Content Moderation and Emergent Outcomes. InProceedings of the International Conference on Information Systems (ICIS). Association for Information Systems, Hyderabad, India, 8 pages. https://aisel.aisnet. org/icis2023/paperthon/paperthon/3

  41. [41]

    Hala Guta and Magdalena Karolak. 2015. Veiling and blogging: Social media as sites of identity negotiation and expression among Saudi women. Journal of International Women’s Studies16, 2 (2015), 115–127

  42. [42]

    Oliver L. Haimson, Daniel Delmonaco, Peipei Nie, and Andrea Wegner. 2021. Disproportionate Removals and Differing Content Moderation Experiences for Conservative, Transgender, and Black Social Media Users: Marginalization and Moderation Gray Areas.Proc. ACM Hum.-Comput. Interact.5, CSCW2, Article 466 (oct 2021), 35 pages. doi:10.1145/3479610

  43. [43]

    Laura Hanu and Unitary team. 2020. Detoxify. Github. https://github.com/unitaryai/detoxify

  44. [44]

    Camille Harris, Amber Gayle Johnson, Sadie Palmer, Diyi Yang, and Amy Bruckman. 2023. "Honestly, I Think TikTok has a Vendetta Against Black Creators": Understanding Black Content Creator Experiences on TikTok.Proc. ACM Hum.-Comput. Interact.7, CSCW2, Article 320 (oct 2023), 31 pages. doi:10.1145/3610169

  45. [45]

    Thomas Hartvigsen, Saadia Gabriel, Hamid Palangi, Maarten Sap, Dipankar Ray, and Ece Kamar. 2022. ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection. InProceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Smaranda Muresan, Preslav Nakov, and Aline ...

  46. [46]

Caroline Hatchett. 2024. I Ate My Way Through America’s Bitch-Themed Restaurants and I Have Feelings. https://www.cosmopolitan.com/lifestyle/a60257625/i-ate-my-way-through-americas-bitch-themed-restaurants-and-i-have-feelings/. Cosmopolitan, published May 22, 2024.

  47. [47]

    Monique Hennink and Bonnie N. Kaiser. 2022. Sample sizes for saturation in qualitative research: A systematic review of empirical tests.Social Science & Medicine292 (2022), 114523. doi:10.1016/j.socscimed.2021.114523

  48. [48]

    Christopher Hom. 2008. The Semantics of Racial Epithets.The Journal of Philosophy105, 8 (2008), 416–440. http://www.jstor.org/stable/20620116

  49. [49]

    Jennifer Hornsby. 2001. Meaning and Uselessness: How to Think about Derogatory Words.Midwest Studies In Philosophy25, 1 (2001), 128–141. arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1111/1475-4975.00042 doi:10.1111/1475-4975.00042

  50. [50]

    Queer in AI. 2022. Queer Experiences In AI Survey 2022-23. https://www.jotform.com/form/222480720967056

  51. [51]

    Robin Jeshion. 2020. Pride and Prejudiced.Grazer Philosophische Studien97, 1 (2020), 106–137. doi:10.1163/18756735-09701007

  52. [52]

    Shagun Jhaver, Sucheta Ghoshal, Amy Bruckman, and Eric Gilbert. 2018. Online Harassment and Content Moderation: The Case of Blocklists.ACM Trans. Comput.-Hum. Interact.25, 2, Article 12 (March 2018), 33 pages. doi:10.1145/3185593

  53. [53]

    Shagun Jhaver, Alice Qian Zhang, Quan Ze Chen, Nikhila Natarajan, Ruotong Wang, and Amy X. Zhang. 2023. Personalizing Content Moderation on Social Media: User Perspectives on Moderation Choices, Interface Design, and Labor.Proc. ACM Hum.-Comput. Interact.7, CSCW2, Article 289 (Oct. 2023), 33 pages. doi:10.1145/3610080

  54. [54]

    Shagun Jhaver and Amy X. Zhang. 2025. Do Users Want Platform Moderation or Individual Control? Examining the Role of Third-Person Effects and Free Speech Support in Shaping Moderation Preferences.New Media & Society27, 5 (2025), 2930–2950. arXiv:https://doi.org/10.1177/14614448231217993 doi:10.1177/14614448231217993

  55. [55]

    Shivani Kapania, Alex S Taylor, and Ding Wang. 2023. A hunt for the Snark: Annotator Diversity in Data Practices. InProceedings of the 2023 CHI Conference on Human Factors in Computing Systems(Hamburg, Germany)(CHI ’23). Association for Computing Machinery, New York, NY, USA, Article 133, 15 pages. doi:10.1145/3544548.3580645

  56. [56]

Mary M. Kosse. 2025. Semiotics of Online "Alt-Right" Discourse. Ph.D. Dissertation. University of Colorado at Boulder. https://scholar.colorado.edu/concern/graduate_thesis_or_dissertations/r781wh84t

  57. [57]

    Jana Kurrek, Haji Mohammad Saleem, and Derek Ruths. 2020. Towards a Comprehensive Taxonomy and Large-Scale Annotated Corpus for Online Slur Usage. InProceedings of the Fourth Workshop on Online Abuse and Harms, Seyi Akiwowo, Bertie Vidgen, Vinodkumar Prabhakaran, and Zeerak Waseem (Eds.). Association for Computational Linguistics, Online, 138–149. doi:10....

  58. [58]

    Francesca Lameiro, Lavinia Dunagan, Dallas Card, Eric Gilbert, and Oliver Haimson. 2025. TIDEs: A Transgender and Nonbinary Community-Labeled Dataset and Model for Transphobia Identification in Digital Environments. InProceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’25). Association for Computing Machinery, New ...

  59. [59]

    Savannah Larimore, Ian Kennedy, Breon Haskett, and Alina Arseniev-Koehler. 2021. Reconsidering Annotator Disagreement about Racist Language: Noise or Signal?. InProceedings of the Ninth International Workshop on Natural Language Processing for Social Media, Lun-Wei Ku and Cheng-Te Li (Eds.). Association for Computational Linguistics, Online, 81–90. doi:10...

  60. [60]

    Nayeon Lee, Chani Jung, Junho Myung, Jiho Jin, Jose Camacho-Collados, Juho Kim, and Alice Oh. 2024. Exploring Cross-Cultural Differences in English Hate Speech Annotations: From Dataset Construction to Analysis. InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies ...

  61. [61]

    Valerie Lookingbill and Kimanh Le. 2024. “There’s Always a Way to Get Around the Guidelines”: Nonsuicidal Self-Injury and Content Moderation on TikTok.Social Media + Society10, 2 (2024), 20563051241254371. doi:10.1177/20563051241254371

  62. [62]

    Chu Fei Luo, Samuel Dahan, and Xiaodan Zhu. 2025. Towards Low-Resource Alignment to Diverse Perspectives with Sparse Feedback. InFindings of the Association for Computational Linguistics: EMNLP 2025, Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng (Eds.). Association for Computational Linguistics, Suzhou, China, 20330–20339....

  63. [63]

    Alice E. Marwick and danah boyd. 2011. I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience. New Media & Society 13, 1 (2011), 114–133. doi:10.1177/1461444810365313

  64. [64]

    Giacomo Marzi, Marco Balzano, and Davide Marchiori. 2024. K-Alpha Calculator–Krippendorff’s Alpha Calculator: A user-friendly tool for computing Krippendorff’s Alpha inter-rater reliability coefficient. MethodsX 12 (2024), 102545. doi:10.1016/j.mex.2023.102545

  65. [65]

    Merriam-Webster. 2026. derogatory. https://www.merriam-webster.com/dictionary/derogatory. Accessed: March 24, 2026.

  66. [66]

    Emanuele Moscato, Tiancheng Hu, Matthias Orlikowski, Paul Röttger, and Debora Nozza. 2025. Personalization up to a Point: Why Personalized Content Moderation Needs Boundaries, and How We Can Enforce Them. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose,...

  67. [67]

    Luigi Pavone. 2024. Slur reclamation, irony, and resilience. Lodz Papers in Pragmatics 20, 2 (2024), 349–363. doi:10.1515/lpp-2024-0029

  68. [68]

    Jiaxin Pei and David Jurgens. 2023. When Do Annotator Demographics Matter? Measuring the Influence of Annotator Demographics with the POPQUORN Dataset. In Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII), Jakob Prange and Annemarie Friedrich (Eds.). Association for Computational Linguistics, Toronto, Canada, 252–265. doi:10.18653/v1/2023.law-1.25

  69. [69]

    Barbara Plank. 2022. The “Problem” of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang (Eds.). Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 10671–10682. doi:10.1865...

  70. [70]

    Mihaela Popa-Wyatt. 2020. Reclamation: Taking Back Control of Words. Grazer Philosophische Studien 97, 1 (2020), 159–176. doi:10.1163/18756735-09701009

  71. [71]

    Grace Proebsting, Oghenefejiro Isaacs Anigboro, Charlie M. Crawford, Danaé Metaxa, and Sorelle A. Friedler. 2025. Identity-related Speech Suppression in Generative AI Content Moderation. In Proceedings of the 5th ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO ’25). Association for Computing Machinery, New York, NY, U...

  72. [72]

    Organizers of QueerInAI, Anaelia Ovalle, Arjun Subramonian, Ashwin Singh, Claas Voelcker, Danica J. Sutherland, Davide Locatelli, Eva Breznik, Filip Klubicka, Hang Yuan, Hetvi J, Huan Zhang, Jaidev Shriram, Kruno Lehman, Luca Soldaini, Maarten Sap, Marc Peter Deisenroth, Maria Leonor Pacheco, Maria Ryskina, Martin Mundt, Milind Agarwal, Nyx McLean, Pan Xu...

  73. [73]

    Joni Salminen, Hind Almerekhi, Milica Milenković, Soon-gyo Jung, Jisun An, Haewoon Kwak, and Bernard Jansen. 2018. Anatomy of Online Hate: Developing a Taxonomy and Machine Learning Models for Identifying and Classifying Hate in Online News Media. Proceedings of the International AAAI Conference on Web and Social Media 12, 1 (Jun. 2018), 330–339. doi:10.160...

  74. [74]

    Joni Salminen, Fabio Veronesi, Hind Almerekhi, Soon-gyo Jung, and Bernard J. Jansen. 2018. Online Hate Interpretation Varies by Country, But More by Individual: A Statistical Analysis Using Crowdsourced Ratings. In 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS). Institute of Electrical and Electronics Engin...

  75. [75]

    Sebastin Santy, Jenny Liang, Ronan Le Bras, Katharina Reinecke, and Maarten Sap. 2023. NLPositionality: Characterizing Design Biases of Datasets and Models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki (Eds.). Association for Computati...

  76. [76]

    Maarten Sap, Dallas Card, Saadia Gabriel, Yejin Choi, and Noah A. Smith. 2019. The Risk of Racial Bias in Hate Speech Detection. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Anna Korhonen, David Traum, and Lluís Màrquez (Eds.). Association for Computational Linguistics, Florence, Italy, 1668–1678. doi:10.18653...

  77. [77]

    Maarten Sap, Swabha Swayamdipta, Laura Vianna, Xuhui Zhou, Yejin Choi, and Noah A. Smith. 2022. Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Marine Carpuat, Mari...

  78. [78]

    Julia Schuster. 2013. Invisible feminists? Social media and young women’s political participation. Political Science 65, 1 (2013), 8–24. doi:10.1177/0032318713486474

  79. [79]

    Farhana Shahid and Aditya Vashistha. 2023. Decolonizing Content Moderation: Does Uniform Global Community Standard Resemble Utopian Equality or Western Power Hegemony?. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI ’23). Association for Computing Machinery, New York, NY, USA, Article 391, 18 pages. d...

  80. [80]

    Taylor Sorensen, Jared Moore, Jillian Fisher, Mitchell L Gordon, Niloofar Mireshghallah, Christopher Michael Rytting, Andre Ye, Liwei Jiang, Ximing Lu, Nouha Dziri, Tim Althoff, and Yejin Choi. 2024. Position: A Roadmap to Pluralistic Alignment. In Proceedings of the 41st International Conference on Machine Learning (Proceedings of Machine Learning Researc...
