The Power of Social Norms: How Initial Responses to Toxicity Shape Conversations on Twitter

Ana Aleksandric; Anne Groggel; Mohit Singhal; Shirin Nilizadeh

arXiv: 2211.10764 · v2 · submitted 2022-11-19 · cs.SI · cs.CY

Pith reviewed 2026-05-24 11:07 UTC · model grok-4.3

keywords: social norms · toxicity · Twitter · group dynamics · online harassment · antisocial behavior · conversation analysis

The pith

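In Twitter conversations, having more participants before a toxic reply arrives is associated with fewer users responding non-toxically, and a toxic first reply is associated with fewer non-toxic replies and increasingly toxic conversations.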

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper explores how the number of participants before a toxic tweet and the tone of the first reply to it relate to whether conversations stay civil or grow abusive. It argues that larger groups diffuse each user's sense of responsibility for countering toxicity, while an early toxic reply sets a norm that encourages more abuse. Evidence comes from statistical associations in a random sample of more than 187,000 tweets across roughly 9,000 conversations. If these patterns hold, social norms function as strong cues that either discourage or permit uncivil behavior online. The work draws on research on the contagion of antisocial behavior to explain why some replies stay non-toxic while others do not.

Core claim

An increased number of users participating in the conversation before receiving a toxic tweet is negatively associated with the number of users who responded to the toxic reply in a non-toxic way. Furthermore, posting a toxic reply immediately after a toxic comment is negatively associated with users posting non-toxic replies and Twitter conversations becoming increasingly toxic.

What carries the argument

Two group-dynamics variables: the number of users who participated before the first toxic reply, and the tone of the first direct reply to that toxic tweet. The paper treats both as explanatory factors that affect whether others feel uninhibited to post abusive replies.
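As a rough illustration of how these two variables might be derived from one conversation, here is a minimal Python sketch. It is not the authors' pipeline: the `Tweet` fields, the 0.5 cut-off, and the choice to count the root author among prior participants are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed cut-off for calling a tweet toxic; the summary does not confirm it.
TOXICITY_THRESHOLD = 0.5


@dataclass
class Tweet:
    tweet_id: str
    author: str
    parent_id: Optional[str]  # None marks the root tweet
    toxicity: float           # hypothetical classifier score in [0, 1]


def is_toxic(tweet: Tweet) -> bool:
    return tweet.toxicity >= TOXICITY_THRESHOLD


def conversation_features(tweets: List[Tweet]) -> Optional[dict]:
    """Derive both explanatory variables for one conversation.

    `tweets` must be in chronological order with the root tweet first.
    Returns None when no toxic reply is aimed directly at the root tweet.
    """
    root = tweets[0]
    toxic_idx = next(
        (i for i, t in enumerate(tweets)
         if t.parent_id == root.tweet_id and is_toxic(t)),
        None,
    )
    if toxic_idx is None:
        return None
    toxic = tweets[toxic_idx]

    # Distinct authors (root author included) who posted before the toxic reply.
    users_before = {t.author for t in tweets[:toxic_idx]}

    # Direct replies to the toxic tweet, still in chronological order.
    replies = [t for t in tweets[toxic_idx + 1:] if t.parent_id == toxic.tweet_id]
    first_reply_toxic = is_toxic(replies[0]) if replies else None
    non_toxic_responders = {t.author for t in replies if not is_toxic(t)}

    return {
        "users_before": len(users_before),
        "first_reply_toxic": first_reply_toxic,
        "non_toxic_responders": len(non_toxic_responders),
    }


# Synthetic conversation: carol posts the first toxic reply to the root.
example = [
    Tweet("r", "alice", None, 0.10),
    Tweet("a", "bob", "r", 0.20),
    Tweet("b", "carol", "r", 0.90),
    Tweet("c", "dave", "b", 0.80),
    Tweet("d", "erin", "b", 0.10),
    Tweet("e", "bob", "b", 0.05),
]
features = conversation_features(example)
```

Counting distinct authors rather than tweets keeps one prolific user from inflating either count; whether the paper makes the same choice is not stated in this summary.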

If this is right

Group size before a toxic tweet can diffuse individual responsibility to respond non-toxically.
The toxicity of the first direct reply establishes group norms that shape subsequent replies.
Social norms act as powerful cues that can maintain or sanction toxicity in online conversations.
Responses to uncivil comments reveal mechanisms by which antisocial behavior spreads.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Platform designs that highlight early non-toxic replies could interrupt the formation of toxic norms.
Similar patterns might appear in other threaded discussion systems if user visibility and reply order are comparable.
Experiments that randomly vary the display of prior participants could test whether the association is causal.
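The randomized design in the last point could be analyzed with something as simple as an exact permutation test on a difference in means. The sketch below is one hypothetical way to do that; the group names and the reply counts are synthetic and are not taken from the paper.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def exact_permutation_p(treated: Sequence[float], control: Sequence[float]) -> float:
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(treated) + list(control)
    k = len(treated)
    observed = abs(mean(treated) - mean(control))
    extreme = 0
    total = 0
    # Enumerate every way of relabelling k of the pooled units as "treated".
    for idx in combinations(range(len(pooled)), k):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Synthetic non-toxic reply counts per conversation (illustration only).
shown = [1, 2, 1, 2]    # prior participants displayed to the reader
hidden = [4, 5, 4, 5]   # prior participants hidden from the reader

effect = mean(hidden) - mean(shown)
p_value = exact_permutation_p(shown, hidden)
```

Exhaustive enumeration is only practical for small groups; with realistic sample sizes a random subset of relabellings would be the usual substitute.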

Load-bearing premise

The observed statistical associations between prior user count, first-reply toxicity, and later non-toxic replies reflect the causal operation of social norms rather than confounding variables such as conversation topic, user self-selection, or algorithmic visibility.

What would settle it

A controlled analysis of the same conversations that finds no negative association between prior user count and non-toxic responses, or between first-reply toxicity and later toxicity, after matching on topic and user history.
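To make the topic-confounding concern concrete, the following sketch compares a pooled correlation with correlations computed inside each topic. Stratifying by topic is used here as a simple stand-in for matching, and the rows are synthetic values chosen to show how a pooled negative association can vanish within topics; none of the numbers come from the paper.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, List, Sequence, Tuple

# (topic, users_before, non_toxic_responders): synthetic rows, not paper data.
Row = Tuple[str, int, int]

ROWS: List[Row] = [
    ("sports", 2, 5), ("sports", 3, 6), ("sports", 4, 5),
    ("politics", 10, 1), ("politics", 11, 2), ("politics", 12, 1),
]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def pooled_and_stratified(rows: List[Row]) -> Tuple[float, Dict[str, float]]:
    """Return the pooled correlation and one correlation per topic."""
    pooled = pearson([r[1] for r in rows], [r[2] for r in rows])
    by_topic: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for topic, users, responders in rows:
        by_topic[topic].append((users, responders))
    within = {
        topic: pearson([u for u, _ in pairs], [v for _, v in pairs])
        for topic, pairs in by_topic.items()
    }
    return pooled, within


pooled_r, within_r = pooled_and_stratified(ROWS)
```

In this toy data the pooled correlation is strongly negative while each within-topic correlation is zero, which is the pattern that would count against a norm-based reading.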

Figures

Figures reproduced from arXiv: 2211.10764 by Ana Aleksandric, Anne Groggel, Mohit Singhal, Shirin Nilizadeh.

**Figure 1.** An example of a conversation tree.

Adjacent paper text: "… around 0.5. Therefore, around 6.5% of the tweets in our dataset were considered as toxic, where about 52% of toxic tweets were directed to the root authors. We only considered conversations with at least one toxic reply directed to the root author because we aim to study the effect of group dynamics in how these conversations unfold, and the behavior of users when toxicity…"

Original abstract

Online harassment and abusive language continue to be a growing concern on social media platforms. In this study, we explore the power of group dynamics to shape the toxicity of Twitter conversations. First, we examine how the presence of others in a conversation can potentially diffuse Twitter users' responsibility to address a toxic reply. Second, we examine whether the toxicity of the first direct reply to a toxic tweet in conversations establishes group norms for subsequent replies. By doing so, we outline users participating in the conversation before the first toxic reply and the tone of initial responses to a toxic reply as explanatory factors that affect whether others feel uninhibited to post their own abusive or derogatory replies. We test this premise by analyzing a random sample of more than 187K tweets belonging to ~ 9K conversations. This analysis of group dynamics is motivated by a larger body of scholarship on contagion of antisocial behavior and the power of establishing social norms that maintain rather than sanction toxicity. We find evidence that an increased number of users participating in the conversation before receiving a toxic tweet is negatively associated with the number of users who responded to the toxic reply in a non-toxic way. Furthermore, posting a toxic reply immediately after a toxic comment is negatively associated with users posting non-toxic replies and Twitter conversations becoming increasingly toxic. We argue that understanding how social media users respond to uncivil comments or abusive language reveals social norms as powerful social cues that can shape human behavior online.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Observational Twitter associations on toxicity are clear enough but the norm-based causal claims rest on untested assumptions about confounders.

The main takeaway is that this paper documents two associations in a sample of roughly 9,000 Twitter conversations: more users active before a toxic tweet links to fewer non-toxic replies afterward, and a toxic first reply links to fewer non-toxic replies overall plus rising toxicity in the thread. The authors read these as signs that social norms and diffused responsibility shape later behavior. That framing is the central claim to evaluate.

The work applies existing ideas on antisocial contagion and norm maintenance to Twitter toxicity metrics, using a random sample of 187k tweets. The scale is reasonable for observational social media research, and the choice to zoom in on the first reply as a potential norm signal gives the analysis a focused operationalization. The results line up with the directional predictions from the cited literature, which is at least consistent.

The soft spot is the move from association to explanation. The abstract presents the user count and first-reply tone as explanatory factors without describing regression controls, topic fixed effects, user-level covariates, or robustness checks. Conversations on different topics can easily differ in both size and toxicity baseline, and users self-select into threads, so those patterns could drive the numbers without any norm mechanism at work. The stress-test note on identification is on target given what is shown.

This paper is for computational social scientists who track online harassment and platform dynamics. Someone designing interventions around reply order or visibility might pull the first-reply angle for follow-up experiments. It is coherent on its own terms and grounded in real data, so it deserves a serious referee to examine the methods section and ask for the missing checks rather than a desk reject.

Referee Report

2 major / 1 minor

Summary. The paper claims that in a sample of ~187K tweets from ~9K Twitter conversations, a larger number of users participating before a toxic tweet is negatively associated with the number of non-toxic responses to the toxic reply, and that a toxic first reply is negatively associated with subsequent non-toxic replies and with conversations becoming more toxic overall. These associations are interpreted as evidence that social norms (via diffusion of responsibility and norm establishment) shape whether users post abusive replies.

Significance. A large observational dataset on conversation-level dynamics offers potential to extend work on antisocial behavior contagion and online norm formation. If the associations survive controls for topic, user selection, and visibility, the results could inform platform interventions targeting initial replies; the scale of the data is a clear asset for descriptive work.

major comments (2)

[Abstract] The reported negative associations are presented as evidence that prior user count and first-reply toxicity 'affect whether others feel uninhibited,' yet no regression specification, control variables, fixed effects, or robustness checks are described, so it is impossible to evaluate whether the associations isolate the proposed norm mechanism or reflect topic/selection confounders.
[Abstract] The central interpretive step (associations reflect diffusion of responsibility and norm establishment) is load-bearing for the title and abstract claims, but the manuscript supplies no identification strategy or falsification tests against alternative explanations such as conversation topic heterogeneity or algorithmic ranking.
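For concreteness, one kind of specification that would speak to the first major comment is a count regression with topic fixed effects. The form below is a hypothetical sketch, not the authors' model, and every symbol in it is an assumption introduced here: $Y_c$ is the number of users who reply non-toxically in conversation $c$, $\mathbf{x}_c$ is a vector of controls such as conversation size, and $\alpha_{\mathrm{topic}(c)}$ is a fixed effect for the conversation's topic.

```latex
\log \mathbb{E}\left[ Y_c \right]
  = \beta_0
  + \beta_1 \, \mathrm{UsersBefore}_c
  + \beta_2 \, \mathrm{FirstReplyToxic}_c
  + \boldsymbol{\gamma}^{\top} \mathbf{x}_c
  + \alpha_{\mathrm{topic}(c)}
```

Under this reading, the two reported associations would correspond to $\beta_1 < 0$ and $\beta_2 < 0$ after conditioning on the controls and the topic effects.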

minor comments (1)

[Abstract] Clarify in the abstract whether the analysis is purely correlational or includes any attempt at causal identification.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the detailed feedback on our manuscript. Below we respond to each major comment, clarifying our methodological approach and the observational nature of the study. We will make revisions to enhance transparency and temper causal language where appropriate.

Point-by-point responses

Referee: [Abstract] The reported negative associations are presented as evidence that prior user count and first-reply toxicity 'affect whether others feel uninhibited,' yet no regression specification, control variables, fixed effects, or robustness checks are described, so it is impossible to evaluate whether the associations isolate the proposed norm mechanism or reflect topic/selection confounders.

Authors: The abstract provides a high-level summary of the findings. The full manuscript details the regression specifications in the Methods section, including controls for conversation size and topic. We will revise the abstract to include a brief mention of the regression models used and add robustness checks to the supplementary materials to address potential confounders.

Revision: yes

Referee: [Abstract] The central interpretive step (associations reflect diffusion of responsibility and norm establishment) is load-bearing for the title and abstract claims, but the manuscript supplies no identification strategy or falsification tests against alternative explanations such as conversation topic heterogeneity or algorithmic ranking.

Authors: Our analysis is observational and we do not claim to have an identification strategy for causal effects. The results are associations that we interpret in light of social norms theory. We will update the abstract and title to use more cautious language focusing on associations. We will also include additional discussion of alternative explanations such as topic heterogeneity. Our data does not permit direct tests of algorithmic ranking effects.

Revision: partial

Standing simulated objections (not resolved)

Direct tests against algorithmic ranking remain unresolved because the dataset contains no visibility or ranking data.

Circularity Check

0 steps flagged

No significant circularity; purely empirical reporting of data associations

full rationale

The paper performs observational analysis on a sample of Twitter conversations and reports negative statistical associations between prior user count, first-reply toxicity, and subsequent non-toxic replies. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. All claims rest on external data patterns rather than internal construction or self-definition. This is the expected non-finding for an empirical study without mathematical modeling.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities beyond standard social-science assumptions of representative sampling and observable associations reflecting underlying social processes.

pith-pipeline@v0.9.0 · 5796 in / 1184 out tokens · 30257 ms · 2026-05-24T11:07:03.049157+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Attention: What Prevents Young Adults from Speaking Up Against Cyberbullying in an LLM-Powered Social Media Simulation
cs.HC 2026-05 unverdicted novelty 7.0

Practicing bystander intervention in an LLM multi-agent simulation helps young adults speak up publicly against cyberbullying only after three specific attention shifts from inattention and self-focus to audience awar...

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · cited by 1 Pith paper · 2 internal anchors

[1] Ashley A Anderson, Sara K Yeo, Dominique Brossard, Dietram A Scheufele, and Michael A Xenos. 2016. Toxic talk: How online incivility can undermine perceptions of media. International Journal of Public Opinion Research 30, 1 (2016), 156–168.
[2] Solomon E Asch. 1956. Studies of independence and conformity: I. A minority of one against a unanimous majority. Psychological monographs: General and applied 70, 9 (1956), 1.
[3] Sara Bastiaensens, Sara Pabian, Heidi Vandebosch, Karolien Poels, Katrien Van Cleemput, Ann DeSmet, and Ilse De Bourdeaudhuij. 2016. From normative influence to social pressure: How relevant others affect whether bystanders join in cyberbullying. Social Development 25, 1 (2016), 193–211.
[4] Amy Binns. 2012. DON’T FEED THE TROLLS! Managing troublemakers in magazines’ online communities. Journalism practice 6, 4 (2012), 547–562.
[5] Carrie A Blair, Lori Foster Thompson, and Karl L Wuensch. 2005. Electronic helping behavior: The virtual presence of others makes a difference. Basic and Applied Social Psychology 27, 2 (2005), 171–178.
[6] Daniëlle NM Bleize, Martin Tanis, Doeschka J Anschütz, and Moniek Buijzen. 2021. A social identity perspective on conformity to cyber aggression among early adolescents on WhatsApp. Social Development 30, 4 (2021), 941–956.
[8] Alexander Brown. 2018. What is so special about online (as compared to offline) hate speech? Ethnicities 18, 3 (2018), 297–326.
[9] Justin Cheng, Michael Bernstein, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. 2017. Anyone can become a troll: Causes of trolling behavior in online discussions. In Proceedings of the 2017 ACM conference on computer supported cooperative work and social computing. 1217–1230.
[10] John M Darley and Bibb Latané. 1968. Bystander intervention in emergencies: diffusion of responsibility. Journal of personality and social psychology 8, 4p1 (1968), 377.
[11] Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber. 2017. Automated hate speech detection and the problem of offensive language. In Eleventh international aaai conference on web and social media.
[12] Ann DeSmet, Sara Bastiaensens, Katrien Van Cleemput, Karolien Poels, Heidi Vandebosch, Greet Cardon, and Ilse De Bourdeaudhuij. 2016. Deciding whether to look after them, to like it, or leave it: A multidimensional analysis of predictors of positive and negative bystander behavior in cyberbullying among adolescents. Computers in Human Behavior 57 (2016), 398–415.
[13] Dominic DiFranzo, Samuel Hardman Taylor, Franccesca Kazerooni, Olivia D Wherry, and Natalya N Bazarova. 2018. Upstanding by design: Bystander intervention in cyberbullying. In Proceedings of the 2018 CHI conference on human factors in computing systems. 1–12.
[14] Karthik Dinakar, Birago Jones, Catherine Havasi, Henry Lieberman, and Rosalind Picard. 2012. Common sense reasoning for detection, prevention, and mitigation of cyberbullying. ACM Transactions on Interactive Intelligent Systems (TiiS) 2, 3 (2012), 18.
[15] Nemanja Djuric, Jing Zhou, Robin Morris, Mihajlo Grbovic, Vladan Radosavljevic, and Narayan Bhamidipati. 2015. Hate speech detection with comment embeddings. In Proceedings of the 24th international conference on world wide web. ACM, 29–30.
[16] Fernando Domínguez-Hernández, Lars Bonell, and Alejandro Martínez-González. 2018. A systematic literature review of factors that moderate bystanders’ actions in cyberbullying. Cyberpsychology: Journal of Psychosocial Research on Cyberspace 12, 4 (2018).
[18] Amanda L Duffy and Drew Nesdale. 2009. Peer groups, social identity, and children’s bullying behavior. Social development 18, 1 (2009), 121–139.
[19] Maeve Duggan. 2014. Online harassment. Pew Research Center.
[20] William H Dutton. 1996. Network rules of order: Regulating speech in public electronic fora. Media, Culture & Society 18, 2 (1996), 269–290.
[21] Mai ElSherief, Vivek Kulkarni, Dana Nguyen, William Yang Wang, and Elizabeth Belding. 2018. Hate lingo: A target-based linguistic analysis of hate speech in social media. In Twelfth International AAAI Conference on Web and Social Media.
[22] Pnina Fichman and Elizabeth Peters. 2019. The impacts of territorial communication norms and composition on online trolling. International Journal of Communication 13 (2019), 20.
[23] Peter Fischer, Joachim I Krueger, Tobias Greitemeyer, Claudia Vogrincic, Andreas Kastenmüller, Dieter Frey, Moritz Heene, Magdalena Wicher, and Martina Kainbacher. 2011. The bystander-effect: a meta-analytic review on bystander intervention in dangerous and non-dangerous emergencies. Psychological bulletin 137, 4 (2011), 517.
[24] Stephanie D Freis and Regan AR Gurung. 2013. A Facebook analysis of helping behavior in online bullying. Psychology of popular media culture 2, 1 (2013), 11.
[25] Njagi Dennis Gitari, Zhang Zuping, Hanyurwimfura Damien, and Jun Long. 2015. A lexicon-based approach for hate speech detection. International Journal of Multimedia and Ubiquitous Engineering 10, 4 (2015), 215–230.
[26] Google Perspective API. 2021. https://www.perspectiveapi.com/
[27] Tommi Gröndahl, Luca Pajola, Mika Juuti, Mauro Conti, and N Asokan. 2018. All You Need is "Love": Evading Hate Speech Detection. In Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security. 2–12.
[28] Billy Henson, Bonnie S Fisher, and Bradford W Reyns. 2020. There is virtually no excuse: The frequency and predictors of college students’ bystander intervention behaviors directed at online victimization. Violence Against Women 26, 5 (2020), 505–527.
[29] Hyunseo Hwang, Porismita Borah, Kang Namkoong, and A Veenstra. 2008. Does civility matter in the blogosphere? Examining the interaction effects of incivility and disagreement on citizen attitudes. In 58th Annual Conference of the International Communication Association, Montreal, QC, Canada.
[30] Sara Kiesler, Jane Siegel, and Timothy W McGuire. 1984. Social psychological aspects of computer-mediated communication. American psychologist 39, 10 (1984), 1123.
[31] Animesh Koratana and Kevin Hu. [n. d.]. Toxic Speech Detection. ([n. d.]).
[32] Robin M Kowalski, Amber N Schroeder, and Carrie A Smith. 2013. Bystanders and their willingness to intervene in cyberbullying situations. From cyber bullying to cyber safety: Issues and approaches in educational contexts (2013), 77–100.
[33] Nihal Kumarswamy et al. 2022. “Strict Moderation?” The Impact of Increased Moderation on Parler Content and User Behavior. Ph. D. Dissertation.
[34] Bibb Latané and John M Darley. 1969. Bystander "apathy". American Scientist 57, 2 (1969), 244–268.
[35] So-Hyun Lee and Hee-Woong Kim. 2015. Why people post benevolent and malicious comments online. Commun. ACM 58, 11 (2015), 74–79.
[36] Paul Benjamin Lowry, Jun Zhang, Chuang Wang, and Mikko Siponen. 2016. Why do adults engage in cyberbullying on social media? An integration of online disinhibition and deindividuation effects with the social structure and social learning model. Information Systems Research 27, 4 (2016), 962–986.
[37] Patrick M Markey. 2000. Bystander intervention in computer-mediated communication. Computers in Human Behavior 16, 2 (2000), 183–188.
[38] Yashar Mehdad and Joel Tetreault. 2016. Do characters abuse more than words?. In Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue. 299–303.
[39] Chikashi Nobata, Joel Tetreault, Achint Thomas, Yashar Mehdad, and Yi Chang. Abusive language detection in online user content. In Proceedings of the 25th international conference on world wide web. International World Wide Web Conferences Steering Committee, 145–153.
[41] Magdalena Obermaier, Nayla Fawzi, and Thomas Koch. 2016. Bystanding or standing by? How the number of bystanders affects the intention to intervene in cyberbullying. New media & society 18, 8 (2016), 1491–1507.
[42] Zizi Papacharissi. 2002. The virtual sphere: The internet as a public sphere. New media & society 4, 1 (2002), 9–27.
[43] Jenny L Paterson, Rupert Brown, and Mark A Walters. 2019. The short and longer term impacts of hate crimes experienced directly, indirectly, and through the media. Personality and Social Psychology Bulletin 45, 7 (2019), 994–1010.
[44] Georgios K Pitsilis, Heri Ramampiaro, and Helge Langseth. 2018. Detecting offensive language in tweets using deep learning. arXiv preprint arXiv:1801.04433 (2018).
[45] Katja Rost, Lea Stahel, and Bruno S Frey. 2016. Digital social norm enforcement: Online firestorms in social media. PLoS one 11, 6 (2016), e0155923.
[46] Gregory K Rutkowski, Charles L Gruder, and Daniel Romer. 1983. Group cohesiveness, social norms, and bystander intervention. Journal of Personality and Social Psychology 44, 3 (1983), 545.
[47] Nazanin Salehabadi, Anne Groggel, Mohit Singhal, Sayak Saha Roy, and Shirin Nilizadeh. 2022. User Engagement and the Toxicity of Tweets. https://doi.org/10.48550/ARXIV.2211.03856
[48] Martin Saveski, Brandon Roy, and Deb Roy. 2021. The structure of toxic conversations on Twitter. In Proceedings of the Web Conference 2021. 1086–1097.
[49] Karina Schumann, Jamil Zaki, and Carol S Dweck. 2014. Addressing the empathy deficit: beliefs about the malleability of empathy predict effortful responses when empathy is challenging. Journal of personality and social psychology 107, 3 (2014), 475.
[50] Jane Siegel, Vitaly Dubrovsky, Sara Kiesler, and Timothy W McGuire. 1986. Group processes in computer-mediated communication. Organizational behavior and human decision processes 37, 2 (1986), 157–187.
[51] Mohit Singhal, Chen Ling, Pujan Paudel, Poojitha Thota, Nihal Kumarswamy, Gianluca Stringhini, and Shirin Nilizadeh. 2022. SoK: Content Moderation in Social Media, from Guidelines to Enforcement, and Research to Practice. https://doi.org/10.48550/ARXIV.2206.14855
[52] Sara Sood, Judd Antin, and Elizabeth Churchill. 2012. Profanity use in online communities. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1481–1490.
[53] John Suler. 2004. The online disinhibition effect. Cyberpsychology & behavior 7, 3 (2004), 321–326.
[54] Henri Tajfel. 2010. Social identity and intergroup relations. Vol. 7. Cambridge University Press.
[55] Twarc. 2020. Collect Twitter Data with Twarc! https://scholarslab.github.io/learn-twarc/
[56] Twitter. 2022. Twitter API. https://developer.twitter.com/en/docs/twitter-api
[57] Marco Van Bommel, Jan-Willem Van Prooijen, Henk Elffers, and Paul AM Van Lange. 2012. Be aware to care: Public self-awareness leads to a reversal of the bystander effect. Journal of Experimental Social Psychology 48, 4 (2012), 926–930.
[58] Zeerak Waseem and Dirk Hovy. 2016. Hateful symbols or hateful people? predictive features for hate speech detection on twitter. In Proceedings of the NAACL student research workshop. 88–93.
[59] Savvas Zannettou, Mai ElSherief, Elizabeth Belding, Shirin Nilizadeh, and Gianluca Stringhini. 2020. Measuring and Characterizing Hate Speech on News Websites. In 12th ACM Web Science Conference. ACM.
[60] Justine Zhang, Ravi Kumar, Sujith Ravi, and Cristian Danescu-Niculescu-Mizil. 2016. Conversational flow in Oxford-style debates. arXiv preprint arXiv:1604.03114 (2016).

Paper running header: The Web Conference (WWW) ’23, April 30–May 4, 2023, Austin, Texas. Ana Aleksandric, Mohit Singhal, Anne Groggel, and Shirin Nilizadeh.

[1] [1]

Ashley A Anderson, Sara K Yeo, Dominique Brossard, Dietram A Scheufele, and Michael A Xenos. 2016. Toxic talk: How online incivility can undermine perceptions of media. International Journal of Public Opinion Research 30, 1 (2016), 156–168

work page 2016

[2] [2]

Solomon E Asch. 1956. Studies of independence and conformity: I. A minority of one against a unanimous majority. Psychological monographs: General and applied 70, 9 (1956), 1

work page 1956

[3] [3]

Sara Bastiaensens, Sara Pabian, Heidi Vandebosch, Karolien Poels, Katrien Van Cleemput, Ann DeSmet, and Ilse De Bourdeaudhuij. 2016. From normative influence to social pressure: How relevant others affect whether bystanders join in cyberbullying. Social Development 25, 1 (2016), 193–211

work page 2016

[4] [4]

Amy Binns. 2012. DON’T FEED THE TROLLS! Managing troublemakers in magazines’ online communities. Journalism practice 6, 4 (2012), 547–562

work page 2012

[5] [5]

Carrie A Blair, Lori Foster Thompson, and Karl L Wuensch. 2005. Electronic helping behavior: The virtual presence of others makes a difference. Basic and Applied Social Psychology 27, 2 (2005), 171–178

work page 2005

[6] [6]

Daniëlle NM Bleize, Martin Tanis, Doeschka J Anschütz, and Moniek Buijzen

work page

[7] [7]

Social Development 30, 4 (2021), 941–956

A social identity perspective on conformity to cyber aggression among early adolescents on WhatsApp. Social Development 30, 4 (2021), 941–956

work page 2021

[8] [8]

Alexander Brown. 2018. What is so special about online (as compared to offline) hate speech? Ethnicities 18, 3 (2018), 297–326

work page 2018

[9] [9]

Justin Cheng, Michael Bernstein, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. 2017. Anyone can become a troll: Causes of trolling behavior in online discussions. In Proceedings of the 2017 ACM conference on computer supported cooperative work and social computing . 1217–1230

work page 2017

[10] [10]

John M Darley and Bibb Latané. 1968. Bystander intervention in emergencies: diffusion of responsibility. Journal of personality and social psychology 8, 4p1 (1968), 377

work page 1968

[11] [11]

Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber. 2017. Automated hate speech detection and the problem of offensive language. In Eleventh international aaai conference on web and social media

work page 2017

[12] [12]

Ann DeSmet, Sara Bastiaensens, Katrien Van Cleemput, Karolien Poels, Heidi Vandebosch, Greet Cardon, and Ilse De Bourdeaudhuij. 2016. Deciding whether to look after them, to like it, or leave it: A multidimensional analysis of predictors of positive and negative bystander behavior in cyberbullying among adolescents. Computers in Human Behavior 57 (2016), 398–415

work page 2016

[13] [13]

Dominic DiFranzo, Samuel Hardman Taylor, Franccesca Kazerooni, Olivia D Wherry, and Natalya N Bazarova. 2018. Upstanding by design: Bystander inter- vention in cyberbullying. In Proceedings of the 2018 CHI conference on human factors in computing systems . 1–12

work page 2018

[14] [14]

Karthik Dinakar, Birago Jones, Catherine Havasi, Henry Lieberman, and Rosalind Picard. 2012. Common sense reasoning for detection, prevention, and mitigation of cyberbullying. ACM Transactions on Interactive Intelligent Systems (TiiS) 2, 3 (2012), 18

work page 2012

[15] [15]

Nemanja Djuric, Jing Zhou, Robin Morris, Mihajlo Grbovic, Vladan Radosavl- jevic, and Narayan Bhamidipati. 2015. Hate speech detection with comment embeddings. In Proceedings of the 24th international conference on world wide web . ACM, 29–30

work page 2015

[16] [16]

Fernando Domínguez-Hernández, Lars Bonell, and Alejandro Martínez-González

work page

[17] [17]

Cyberpsychology: Journal of Psychosocial Research on Cyberspace 12, 4 (2018)

A systematic literature review of factors that moderate bystanders’ actions in cyberbullying. Cyberpsychology: Journal of Psychosocial Research on Cyberspace 12, 4 (2018)

work page 2018

[18] [18]

Amanda L Duffy and Drew Nesdale. 2009. Peer groups, social identity, and children’s bullying behavior.Social development 18, 1 (2009), 121–139

work page 2009

[19] [19]

Maeve Duggan. 2014. Online harassment. Pew Research Center

work page 2014

[20] [20]

William H Dutton. 1996. Network rules of order: Regulating speech in public electronic fora. Media, Culture & Society 18, 2 (1996), 269–290

work page 1996

[21] [21]

Mai ElSherief, Vivek Kulkarni, Dana Nguyen, William Yang Wang, and Elizabeth Belding. 2018. Hate lingo: A target-based linguistic analysis of hate speech in social media. In Twelfth International AAAI Conference on Web and Social Media

work page 2018

[22] [22]

Pnina Fichman and Elizabeth Peters. 2019. The impacts of territorial commu- nication norms and composition on online trolling. International Journal of Communication 13 (2019), 20

work page 2019

[23] [23]

Peter Fischer, Joachim I Krueger, Tobias Greitemeyer, Claudia Vogrincic, An- dreas Kastenmüller, Dieter Frey, Moritz Heene, Magdalena Wicher, and Martina Kainbacher. 2011. The bystander-effect: a meta-analytic review on bystander intervention in dangerous and non-dangerous emergencies.Psychological bulletin 137, 4 (2011), 517

work page 2011

[24] [24]

Stephanie D Freis and Regan AR Gurung. 2013. A Facebook analysis of helping behavior in online bullying. Psychology of popular media culture 2, 1 (2013), 11

work page 2013

[25] Njagi Dennis Gitari, Zhang Zuping, Hanyurwimfura Damien, and Jun Long. 2015. A lexicon-based approach for hate speech detection. International Journal of Multimedia and Ubiquitous Engineering 10, 4 (2015), 215–230

[26] Google Perspective API. 2021. https://www.perspectiveapi.com/

[27] Tommi Gröndahl, Luca Pajola, Mika Juuti, Mauro Conti, and N Asokan. 2018. All You Need is "Love": Evading Hate Speech Detection. In Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security. 2–12

[28] Billy Henson, Bonnie S Fisher, and Bradford W Reyns. 2020. There is virtually no excuse: The frequency and predictors of college students’ bystander intervention behaviors directed at online victimization. Violence Against Women 26, 5 (2020), 505–527

[29] Hyunseo Hwang, Porismita Borah, Kang Namkoong, and A Veenstra. 2008. Does civility matter in the blogosphere? Examining the interaction effects of incivility and disagreement on citizen attitudes. In 58th Annual Conference of the International Communication Association, Montreal, QC, Canada

[30] Sara Kiesler, Jane Siegel, and Timothy W McGuire. 1984. Social psychological aspects of computer-mediated communication. American psychologist 39, 10 (1984), 1123

[31] Animesh Koratana and Kevin Hu. [n. d.]. Toxic Speech Detection. ([n. d.])

[32] Robin M Kowalski, Amber N Schroeder, and Carrie A Smith. 2013. Bystanders and their willingness to intervene in cyberbullying situations. From cyber bullying to cyber safety: Issues and approaches in educational contexts (2013), 77–100

[33] Nihal Kumarswamy et al. 2022. “Strict Moderation?” The Impact of Increased Moderation on Parler Content and User Behavior. Ph. D. Dissertation

[34] Bibb Latané and John M Darley. 1969. Bystander "apathy". American Scientist 57, 2 (1969), 244–268

[35] So-Hyun Lee and Hee-Woong Kim. 2015. Why people post benevolent and malicious comments online. Commun. ACM 58, 11 (2015), 74–79

[36] Paul Benjamin Lowry, Jun Zhang, Chuang Wang, and Mikko Siponen. 2016. Why do adults engage in cyberbullying on social media? An integration of online disinhibition and deindividuation effects with the social structure and social learning model. Information Systems Research 27, 4 (2016), 962–986

[37] Patrick M Markey. 2000. Bystander intervention in computer-mediated communication. Computers in Human Behavior 16, 2 (2000), 183–188

[38] Yashar Mehdad and Joel Tetreault. 2016. Do characters abuse more than words? In Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue. 299–303

[39] Chikashi Nobata, Joel Tetreault, Achint Thomas, Yashar Mehdad, and Yi Chang. 2016. Abusive language detection in online user content. In Proceedings of the 25th international conference on world wide web. International World Wide Web Conferences Steering Committee, 145–153

[41] Magdalena Obermaier, Nayla Fawzi, and Thomas Koch. 2016. Bystanding or standing by? How the number of bystanders affects the intention to intervene in cyberbullying. New media & society 18, 8 (2016), 1491–1507

[42] Zizi Papacharissi. 2002. The virtual sphere: The internet as a public sphere. New media & society 4, 1 (2002), 9–27

[43] Jenny L Paterson, Rupert Brown, and Mark A Walters. 2019. The short and longer term impacts of hate crimes experienced directly, indirectly, and through the media. Personality and Social Psychology Bulletin 45, 7 (2019), 994–1010

[44] Georgios K Pitsilis, Heri Ramampiaro, and Helge Langseth. 2018. Detecting offensive language in tweets using deep learning. arXiv preprint arXiv:1801.04433 (2018)

[45] Katja Rost, Lea Stahel, and Bruno S Frey. 2016. Digital social norm enforcement: Online firestorms in social media. PLoS one 11, 6 (2016), e0155923

[46] Gregory K Rutkowski, Charles L Gruder, and Daniel Romer. 1983. Group cohesiveness, social norms, and bystander intervention. Journal of Personality and Social Psychology 44, 3 (1983), 545

The Web Conference (WWW) ’23, April 30–May 4, 2023, Austin, Texas. Ana Aleksandric, Mohit Singhal, Anne Groggel, and Shirin Nilizadeh

[47] Nazanin Salehabadi, Anne Groggel, Mohit Singhal, Sayak Saha Roy, and Shirin Nilizadeh. 2022. User Engagement and the Toxicity of Tweets. https://doi.org/10.48550/ARXIV.2211.03856

[48] Martin Saveski, Brandon Roy, and Deb Roy. 2021. The structure of toxic conversations on Twitter. In Proceedings of the Web Conference 2021. 1086–1097

[49] Karina Schumann, Jamil Zaki, and Carol S Dweck. 2014. Addressing the empathy deficit: beliefs about the malleability of empathy predict effortful responses when empathy is challenging. Journal of personality and social psychology 107, 3 (2014), 475

[50] Jane Siegel, Vitaly Dubrovsky, Sara Kiesler, and Timothy W McGuire. 1986. Group processes in computer-mediated communication. Organizational behavior and human decision processes 37, 2 (1986), 157–187

[51] Mohit Singhal, Chen Ling, Pujan Paudel, Poojitha Thota, Nihal Kumarswamy, Gianluca Stringhini, and Shirin Nilizadeh. 2022. SoK: Content Moderation in Social Media, from Guidelines to Enforcement, and Research to Practice. https://doi.org/10.48550/ARXIV.2206.14855

[52] Sara Sood, Judd Antin, and Elizabeth Churchill. 2012. Profanity use in online communities. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1481–1490

[53] John Suler. 2004. The online disinhibition effect. Cyberpsychology & behavior 7, 3 (2004), 321–326

[54] Henri Tajfel. 2010. Social identity and intergroup relations. Vol. 7. Cambridge University Press

[55] Twarc. 2020. Collect Twitter Data with Twarc! https://scholarslab.github.io/learn-twarc/

[56] Twitter. 2022. Twitter API. https://developer.twitter.com/en/docs/twitter-api

[57] Marco Van Bommel, Jan-Willem Van Prooijen, Henk Elffers, and Paul AM Van Lange. 2012. Be aware to care: Public self-awareness leads to a reversal of the bystander effect. Journal of Experimental Social Psychology 48, 4 (2012), 926–930

[58] Zeerak Waseem and Dirk Hovy. 2016. Hateful symbols or hateful people? predictive features for hate speech detection on twitter. In Proceedings of the NAACL student research workshop. 88–93

[59] Savvas Zannettou, Mai ElSherief, Elizabeth Belding, Shirin Nilizadeh, and Gianluca Stringhini. 2020. Measuring and Characterizing Hate Speech on News Websites. In 12th ACM Web Science Conference. ACM

[60] Justine Zhang, Ravi Kumar, Sujith Ravi, and Cristian Danescu-Niculescu-Mizil. 2016. Conversational flow in Oxford-style debates. arXiv preprint arXiv:1604.03114 (2016)