The Power of Social Norms: How Initial Responses to Toxicity Shape Conversations on Twitter
Pith reviewed 2026-05-24 11:07 UTC · model grok-4.3
The pith
In Twitter conversations, more users participating before a toxic tweet is linked to fewer non-toxic responses, and a toxic first reply is linked to more toxic later replies.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
An increased number of users participating in the conversation before receiving a toxic tweet is negatively associated with the number of users who responded to the toxic reply in a non-toxic way. Furthermore, posting a toxic reply immediately after a toxic comment is negatively associated with users posting non-toxic replies and is associated with Twitter conversations becoming increasingly toxic.
What carries the argument
Two group-dynamics variables: the number of users participating in the conversation before the first toxic reply, and the tone of the first direct reply to that toxic tweet. The paper treats both as explanatory factors affecting whether others feel uninhibited to post abusive replies.
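As a rough illustration of how these two factors could be measured, the sketch below derives them from a single conversation. It is not the paper's pipeline: the `Tweet` record, the `conversation_features` helper, the 0.5 toxicity cutoff, and the choice to count every distinct user who posted before the first toxic reply are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: an illustrative cutoff, not the threshold used in the paper.
TOXICITY_THRESHOLD = 0.5


@dataclass
class Tweet:
    tweet_id: str
    parent_id: Optional[str]  # None for the root tweet of the conversation
    user: str
    toxicity: float  # score in [0, 1] from some toxicity classifier


def conversation_features(tweets):
    """Compute per-conversation features from tweets in chronological order.

    Returns None when the conversation contains no toxic reply.
    """
    # Index of the first reply (not the root) scored as toxic.
    first_toxic_idx = next(
        (i for i, t in enumerate(tweets)
         if t.parent_id is not None and t.toxicity >= TOXICITY_THRESHOLD),
        None,
    )
    if first_toxic_idx is None:
        return None

    toxic = tweets[first_toxic_idx]
    # Distinct users who posted before the first toxic reply.
    prior_users = {t.user for t in tweets[:first_toxic_idx]}
    # Later tweets that reply directly to the toxic tweet.
    direct_replies = [
        t for t in tweets[first_toxic_idx + 1:] if t.parent_id == toxic.tweet_id
    ]
    first_reply_toxic = (
        direct_replies[0].toxicity >= TOXICITY_THRESHOLD if direct_replies else None
    )
    non_toxic_responders = {
        t.user for t in direct_replies if t.toxicity < TOXICITY_THRESHOLD
    }
    return {
        "prior_user_count": len(prior_users),
        "first_reply_toxic": first_reply_toxic,
        "non_toxic_responders": len(non_toxic_responders),
    }


# Made-up thread for illustration only.
thread = [
    Tweet("1", None, "alice", 0.05),
    Tweet("2", "1", "bob", 0.10),
    Tweet("3", "1", "carol", 0.90),  # first toxic reply
    Tweet("4", "3", "dave", 0.80),   # first direct reply, toxic
    Tweet("5", "3", "erin", 0.10),   # later direct reply, non-toxic
]
features = conversation_features(thread)
```

In this toy thread two users post before the toxic reply, the first direct reply is toxic, and one user responds non-toxically. A real analysis would also have to decide how to treat the toxic author's own earlier posts and replies deeper in the thread.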
If this is right
- Group size before a toxic tweet can diffuse individual responsibility to respond non-toxically.
- The toxicity of the first direct reply establishes group norms that shape subsequent replies.
- Social norms act as powerful cues that can maintain or sanction toxicity in online conversations.
- Responses to uncivil comments reveal mechanisms by which antisocial behavior spreads.
Where Pith is reading between the lines
- Platform designs that highlight early non-toxic replies could interrupt the formation of toxic norms.
- Similar patterns might appear in other threaded discussion systems if user visibility and reply order are comparable.
- Experiments that randomly vary the display of prior participants could test whether the association is causal.
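If such a randomized display experiment were run, its outcome could be checked with a simple permutation test. The sketch below is hypothetical: `permutation_p_value` is a helper written for this example, and the two outcome lists are invented counts of non-toxic replies, not data from the paper.

```python
import random


def permutation_p_value(treated, control, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns (observed difference, estimated p-value).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_treated = len(treated)
    observed = sum(treated) / n_treated - sum(control) / len(control)

    pooled = list(treated) + list(control)
    n_control = len(pooled) - n_treated
    extreme = 0
    for _ in range(n_permutations):
        # Re-assign group labels at random, as the null hypothesis allows.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_treated]) / n_treated - sum(pooled[n_treated:]) / n_control
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (extreme + 1) / (n_permutations + 1)


# Invented outcomes: non-toxic replies per conversation in each arm.
participants_shown = [1, 0, 2, 1, 0, 1]
participants_hidden = [3, 2, 4, 3, 2, 3]
observed, p_value = permutation_p_value(participants_shown, participants_hidden)
```

Because assignment is random, a small p-value here would support a causal reading in a way the observational associations cannot.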
Load-bearing premise
The observed statistical associations between prior user count, first-reply toxicity, and later non-toxic replies reflect the causal operation of social norms rather than confounding variables such as conversation topic, user self-selection, or algorithmic visibility.
What would settle it
A controlled analysis of the same conversations that finds no negative association between prior user count and non-toxic responses, or between first-reply toxicity and later toxicity, after matching on topic and user history.
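One simple way to approximate "matching on topic" is to compare conversations only within the same topic stratum. The sketch below assumes each conversation already carries a topic label; `stratified_difference`, the labels, and the rows are all hypothetical and stand in for whatever matching procedure a real re-analysis would use.

```python
from collections import defaultdict


def stratified_difference(rows):
    """Size-weighted average, across topics, of the within-topic difference
    mean(outcome | toxic first reply) - mean(outcome | non-toxic first reply).

    rows: iterable of (topic, first_reply_toxic, non_toxic_reply_count).
    Topics lacking one of the two groups are skipped. Returns None when no
    topic has both groups.
    """
    by_topic = defaultdict(lambda: {True: [], False: []})
    for topic, first_reply_toxic, outcome in rows:
        by_topic[topic][bool(first_reply_toxic)].append(outcome)

    weighted_sum = 0.0
    total_weight = 0
    for groups in by_topic.values():
        toxic_first, civil_first = groups[True], groups[False]
        if not toxic_first or not civil_first:
            continue  # no within-topic comparison is possible
        diff = sum(toxic_first) / len(toxic_first) - sum(civil_first) / len(civil_first)
        weight = len(toxic_first) + len(civil_first)
        weighted_sum += weight * diff
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None


# Invented rows for illustration only.
rows = [
    ("sports", True, 1), ("sports", True, 3),
    ("sports", False, 4), ("sports", False, 6),
    ("politics", True, 0), ("politics", False, 2),
    ("music", True, 5),  # skipped: no non-toxic-first conversation to compare
]
estimate = stratified_difference(rows)
```

An estimate near zero on real data, after stratifying on topic and user history, is the kind of result that would count against the claim; a clearly negative estimate would leave it standing.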
Original abstract
Online harassment and abusive language continue to be a growing concern on social media platforms. In this study, we explore the power of group dynamics to shape the toxicity of Twitter conversations. First, we examine how the presence of others in a conversation can potentially diffuse Twitter users' responsibility to address a toxic reply. Second, we examine whether the toxicity of the first direct reply to a toxic tweet in conversations establishes group norms for subsequent replies. By doing so, we outline users participating in the conversation before the first toxic reply and the tone of initial responses to a toxic reply as explanatory factors that affect whether others feel uninhibited to post their own abusive or derogatory replies. We test this premise by analyzing a random sample of more than 187K tweets belonging to ~ 9K conversations. This analysis of group dynamics is motivated by a larger body of scholarship on contagion of antisocial behavior and the power of establishing social norms that maintain rather than sanction toxicity. We find evidence that an increased number of users participating in the conversation before receiving a toxic tweet is negatively associated with the number of users who responded to the toxic reply in a non-toxic way. Furthermore, posting a toxic reply immediately after a toxic comment is negatively associated with users posting non-toxic replies and Twitter conversations becoming increasingly toxic. We argue that understanding how social media users respond to uncivil comments or abusive language reveals social norms as powerful social cues that can shape human behavior online.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that in a sample of more than 187K tweets from ~9K Twitter conversations, a larger number of users participating before a toxic tweet is negatively associated with the number of non-toxic responses to the toxic reply, and that a toxic first reply is negatively associated with subsequent non-toxic replies and positively associated with conversations becoming more toxic overall. These associations are interpreted as evidence that social norms (via diffusion of responsibility and norm establishment) shape whether users post abusive replies.
Significance. A large observational dataset on conversation-level dynamics offers potential to extend work on antisocial behavior contagion and online norm formation. If the associations survive controls for topic, user selection, and visibility, the results could inform platform interventions targeting initial replies; the scale of the data is a clear asset for descriptive work.
major comments (2)
- [Abstract] The reported negative associations are presented as evidence that prior user count and first-reply toxicity 'affect whether others feel uninhibited,' yet no regression specification, control variables, fixed effects, or robustness checks are described, so it is impossible to evaluate whether the associations isolate the proposed norm mechanism or reflect topic/selection confounders.
- [Abstract] The central interpretive step (associations reflect diffusion of responsibility and norm establishment) is load-bearing for the title and abstract claims, but the manuscript supplies no identification strategy or falsification tests against alternative explanations such as conversation topic heterogeneity or algorithmic ranking.
minor comments (1)
- [Abstract] Clarify in the abstract whether the analysis is purely correlational or includes any attempt at causal identification.
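For concreteness, one specification that would let a reader evaluate these points is a count model with explicit controls. The form below is a hypothetical sketch, not the authors' model: the text reviewed here does not state one, and every symbol is an assumption introduced for illustration.

```latex
\log \mathbb{E}\left[ Y_c \mid N_c, T_c, X_c \right]
  = \beta_0 + \beta_1 N_c + \beta_2 T_c + \gamma^{\top} X_c + \alpha_{k(c)}
```

Here $Y_c$ is the number of users who reply non-toxically to the first toxic reply in conversation $c$, $N_c$ is the number of users participating before that reply, $T_c$ indicates whether the first direct reply is toxic, $X_c$ collects controls such as conversation size, and $\alpha_{k(c)}$ is a fixed effect for the conversation's topic $k(c)$. Under this sketch, the reported associations would correspond to $\beta_1 < 0$ and $\beta_2 < 0$.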
Simulated Author's Rebuttal
We thank the referee for the detailed feedback on our manuscript. Below we respond to each major comment, clarifying our methodological approach and the observational nature of the study. We will make revisions to enhance transparency and temper causal language where appropriate.
- Referee: [Abstract] The reported negative associations are presented as evidence that prior user count and first-reply toxicity 'affect whether others feel uninhibited,' yet no regression specification, control variables, fixed effects, or robustness checks are described, so it is impossible to evaluate whether the associations isolate the proposed norm mechanism or reflect topic/selection confounders.
  Authors: The abstract provides a high-level summary of the findings. The full manuscript details the regression specifications in the Methods section, including controls for conversation size and topic. We will revise the abstract to include a brief mention of the regression models used and add robustness checks to the supplementary materials to address potential confounders.
  Revision: yes
- Referee: [Abstract] The central interpretive step (associations reflect diffusion of responsibility and norm establishment) is load-bearing for the title and abstract claims, but the manuscript supplies no identification strategy or falsification tests against alternative explanations such as conversation topic heterogeneity or algorithmic ranking.
  Authors: Our analysis is observational and we do not claim to have an identification strategy for causal effects. The results are associations that we interpret in light of social norms theory. We will update the abstract and title to use more cautious language focusing on associations. We will also include additional discussion of alternative explanations such as topic heterogeneity. Our data does not permit direct tests of algorithmic ranking effects.
  Revision: partial
- Not addressed: direct tests against algorithmic ranking, because the dataset lacks visibility or ranking data.
Circularity Check
No significant circularity; purely empirical reporting of data associations
The paper performs observational analysis on a sample of Twitter conversations and reports negative statistical associations between prior user count, first-reply toxicity, and subsequent non-toxic replies. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. All claims rest on external data patterns rather than internal construction or self-definition. This is the expected non-finding for an empirical study without mathematical modeling.
Forward citations
Cited by 1 Pith paper
- Attention: What Prevents Young Adults from Speaking Up Against Cyberbullying in an LLM-Powered Social Media Simulation
  Practicing bystander intervention in an LLM multi-agent simulation helps young adults speak up publicly against cyberbullying only after three specific attention shifts from inattention and self-focus to audience awar...
Source paper: Ana Aleksandric, Mohit Singhal, Anne Groggel, and Shirin Nilizadeh. The Web Conference (WWW) ’23, April 30–May 4, 2023, Austin, Texas.