arxiv: 2211.10764 · v2 · submitted 2022-11-19 · 💻 cs.SI · cs.CY

The Power of Social Norms: How Initial Responses to Toxicity Shape Conversations on Twitter

Pith reviewed 2026-05-24 11:07 UTC · model grok-4.3

classification 💻 cs.SI cs.CY
keywords social norms · toxicity · Twitter · group dynamics · online harassment · antisocial behavior · conversation analysis

The pith

A larger number of users in a Twitter conversation before a toxic tweet is linked to fewer non-toxic responses, and a toxic first reply is associated with more toxic later replies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper explores how the number of participants before a toxic tweet and the tone of the first reply relate to whether conversations stay civil or grow abusive. It argues that larger groups diffuse responsibility to counter toxicity, while an early toxic reply sets a norm that encourages more abuse. Evidence comes from statistical associations in a random sample of more than 187,000 tweets across roughly 9,000 conversations. If these patterns hold, social norms function as strong cues that either penalize or permit uncivil behavior online. The work draws on scholarship about the contagion of antisocial behavior to explain why some toxic tweets draw non-toxic replies while others do not.

Core claim

An increased number of users participating in the conversation before receiving a toxic tweet is negatively associated with the number of users who responded to the toxic reply in a non-toxic way. Furthermore, posting a toxic reply immediately after a toxic comment is negatively associated with users posting non-toxic replies and Twitter conversations becoming increasingly toxic.

What carries the argument

Two group-dynamics measures carry the argument: the number of users participating in the conversation before the first toxic reply, and the tone of the initial reply to that toxic tweet. Both serve as explanatory factors for whether others feel uninhibited to post abusive replies.
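As a rough illustration of how these measures could be derived from one conversation, the sketch below assumes a time-ordered list of tweets with a precomputed toxicity score and a 0.5 cutoff. The names `Tweet`, `conversation_features`, and `TOXICITY_THRESHOLD` are hypothetical, as are the choices to exclude the root author from the prior-user count and to count only direct replies; none of this is taken from the paper's own pipeline.

```python
from dataclasses import dataclass
from typing import Optional

TOXICITY_THRESHOLD = 0.5  # assumed cutoff for labeling a tweet toxic


@dataclass
class Tweet:
    author: str
    toxicity: float          # hypothetical score in [0, 1]
    reply_to: Optional[int]  # index of the parent tweet; None for the root


def conversation_features(tweets):
    """Return the explanatory measures for one time-ordered conversation.

    tweets[0] is the root. Returns None when no toxic reply is directed
    at the root tweet.
    """
    # First toxic reply made directly to the root tweet.
    toxic_idx = next(
        (i for i, t in enumerate(tweets)
         if i > 0 and t.reply_to == 0 and t.toxicity >= TOXICITY_THRESHOLD),
        None,
    )
    if toxic_idx is None:
        return None

    # Distinct users (other than the root author) who posted earlier.
    root_author = tweets[0].author
    prior_users = {t.author for t in tweets[1:toxic_idx]} - {root_author}

    # Direct replies to the toxic tweet, in time order.
    direct = [t for t in tweets[toxic_idx + 1:] if t.reply_to == toxic_idx]
    first_reply_toxic = (
        direct[0].toxicity >= TOXICITY_THRESHOLD if direct else None
    )
    nontoxic = {t.author for t in direct if t.toxicity < TOXICITY_THRESHOLD}

    return {
        "prior_users": len(prior_users),
        "first_reply_toxic": first_reply_toxic,
        "nontoxic_responders": len(nontoxic),
    }


example = [
    Tweet("root", 0.1, None),
    Tweet("a", 0.2, 0),
    Tweet("b", 0.1, 1),
    Tweet("c", 0.9, 0),   # toxic reply directed at the root
    Tweet("d", 0.8, 3),   # first direct reply, also toxic
    Tweet("e", 0.1, 3),
    Tweet("a", 0.3, 3),
]
features = conversation_features(example)
print(features)
```

In this toy conversation two users precede the toxic reply, the first direct reply is toxic, and two users respond non-toxically. A real analysis would aggregate these per-conversation rows before fitting any model.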

If this is right

  • Group size before a toxic tweet can diffuse individual responsibility to respond non-toxically.
  • The toxicity of the first direct reply establishes group norms that shape subsequent replies.
  • Social norms act as powerful cues that can maintain or sanction toxicity in online conversations.
  • Responses to uncivil comments reveal mechanisms by which antisocial behavior spreads.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Platform designs that highlight early non-toxic replies could interrupt the formation of toxic norms.
  • Similar patterns might appear in other threaded discussion systems if user visibility and reply order are comparable.
  • Experiments that randomly vary the display of prior participants could test whether the association is causal.

Load-bearing premise

The observed statistical associations between prior user count, first-reply toxicity, and later non-toxic replies reflect the causal operation of social norms rather than confounding variables such as conversation topic, user self-selection, or algorithmic visibility.

What would settle it

A controlled analysis of the same conversations that finds no negative association between prior user count and non-toxic responses, or between first-reply toxicity and later toxicity, after matching on topic and user history.
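One simple form such a controlled comparison could take is stratification: compare conversations with toxic and non-toxic first replies only within the same topic, then average the within-topic gaps. The sketch below is a hypothetical illustration on made-up rows; `stratified_difference` and the topic labels are assumptions, and it covers only topic, not user history.

```python
from collections import defaultdict
from statistics import mean


def stratified_difference(rows):
    """Size-weighted mean of within-topic differences in non-toxic responders.

    rows: iterable of (topic, first_reply_toxic, nontoxic_responders).
    Topics lacking either a toxic or a non-toxic first-reply group are
    skipped, since no within-topic comparison is possible there.
    """
    by_topic = defaultdict(lambda: {True: [], False: []})
    for topic, first_reply_toxic, nontoxic in rows:
        by_topic[topic][first_reply_toxic].append(nontoxic)

    weighted_sum = 0.0
    total = 0
    for groups in by_topic.values():
        if not groups[True] or not groups[False]:
            continue
        n = len(groups[True]) + len(groups[False])
        weighted_sum += n * (mean(groups[True]) - mean(groups[False]))
        total += n
    return weighted_sum / total if total else None


# Made-up data for illustration only.
rows = [
    ("sports", True, 1), ("sports", True, 0),
    ("sports", False, 3), ("sports", False, 2),
    ("politics", True, 2), ("politics", False, 4),
    ("music", False, 5),  # no toxic-first-reply counterpart, so skipped
]
difference = stratified_difference(rows)
print(difference)
```

A value near zero after stratification would point toward topic as the driver of the raw association; a persistently negative value would leave the norm interpretation standing, though still without establishing causation.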

Figures

Figures reproduced from arXiv: 2211.10764 by Ana Aleksandric, Anne Groggel, Mohit Singhal, Shirin Nilizadeh.

Figure 1
Figure 1: An example of a conversation tree.
Excerpt from the surrounding text: "… around 0.5. Therefore, around 6.5% of the tweets in our dataset were considered as toxic, where about 52% of toxic tweets were directed to the root authors. We only considered conversations with at least one toxic reply directed to the root author because we aim to study the effect of group dynamics in how these conversations unfold, and the behavior of users when toxicity…"
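To make the conversation-tree idea and the two reported shares concrete, here is a small sketch on invented data. It assumes each tweet is a `(tweet_id, parent_id, toxicity_score)` tuple, that 0.5 is the toxicity cutoff, and that "directed to the root author" means a direct reply to the root tweet; `build_tree` and `toxicity_summary` are hypothetical helper names.

```python
from collections import defaultdict

TOXICITY_THRESHOLD = 0.5  # assumed cutoff


def build_tree(tweets):
    """Map each tweet id to the ids of its direct replies."""
    children = defaultdict(list)
    for tweet_id, parent_id, _score in tweets:
        if parent_id is not None:
            children[parent_id].append(tweet_id)
    return dict(children)


def toxicity_summary(tweets):
    """Share of toxic tweets, and share of toxic tweets replying to a root."""
    root_ids = {tid for tid, parent, _ in tweets if parent is None}
    toxic = [(tid, parent) for tid, parent, score in tweets
             if score >= TOXICITY_THRESHOLD]
    to_root = [tid for tid, parent in toxic if parent in root_ids]
    return {
        "toxic_share": len(toxic) / len(tweets),
        "toxic_to_root_share": len(to_root) / len(toxic) if toxic else 0.0,
    }


# Invented conversation: tweet 1 is the root.
tweets = [
    (1, None, 0.05),
    (2, 1, 0.10),
    (3, 1, 0.92),  # toxic, directed at the root
    (4, 3, 0.70),  # toxic, directed at tweet 3
    (5, 2, 0.20),
    (6, 3, 0.15),
    (7, 1, 0.30),
    (8, 4, 0.10),
]
tree = build_tree(tweets)
summary = toxicity_summary(tweets)
print(tree)
print(summary)
```

On the paper's data the corresponding figures are the roughly 6.5% and 52% quoted in the excerpt; the toy numbers here are unrelated to them.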
Original abstract

Online harassment and abusive language continue to be a growing concern on social media platforms. In this study, we explore the power of group dynamics to shape the toxicity of Twitter conversations. First, we examine how the presence of others in a conversation can potentially diffuse Twitter users' responsibility to address a toxic reply. Second, we examine whether the toxicity of the first direct reply to a toxic tweet in conversations establishes group norms for subsequent replies. By doing so, we outline users participating in the conversation before the first toxic reply and the tone of initial responses to a toxic reply as explanatory factors that affect whether others feel uninhibited to post their own abusive or derogatory replies. We test this premise by analyzing a random sample of more than 187K tweets belonging to ~ 9K conversations. This analysis of group dynamics is motivated by a larger body of scholarship on contagion of antisocial behavior and the power of establishing social norms that maintain rather than sanction toxicity. We find evidence that an increased number of users participating in the conversation before receiving a toxic tweet is negatively associated with the number of users who responded to the toxic reply in a non-toxic way. Furthermore, posting a toxic reply immediately after a toxic comment is negatively associated with users posting non-toxic replies and Twitter conversations becoming increasingly toxic. We argue that understanding how social media users respond to uncivil comments or abusive language reveals social norms as powerful social cues that can shape human behavior online.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that in a sample of ~187K tweets from ~9K Twitter conversations, a larger number of users participating before a toxic tweet is negatively associated with the number of non-toxic responses to the toxic reply, and that a toxic first reply is negatively associated with subsequent non-toxic replies and with conversations becoming more toxic overall. These associations are interpreted as evidence that social norms (via diffusion of responsibility and norm establishment) shape whether users post abusive replies.

Significance. A large observational dataset on conversation-level dynamics offers potential to extend work on antisocial behavior contagion and online norm formation. If the associations survive controls for topic, user selection, and visibility, the results could inform platform interventions targeting initial replies; the scale of the data is a clear asset for descriptive work.

major comments (2)
  1. [Abstract] The reported negative associations are presented as evidence that prior user count and first-reply toxicity 'affect whether others feel uninhibited,' yet no regression specification, control variables, fixed effects, or robustness checks are described, so it is impossible to evaluate whether the associations isolate the proposed norm mechanism or reflect topic/selection confounders.
  2. [Abstract] The central interpretive step (associations reflect diffusion of responsibility and norm establishment) is load-bearing for the title and abstract claims, but the manuscript supplies no identification strategy or falsification tests against alternative explanations such as conversation topic heterogeneity or algorithmic ranking.
minor comments (1)
  1. [Abstract] Clarify in the abstract whether the analysis is purely correlational or includes any attempt at causal identification.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the detailed feedback on our manuscript. Below we respond to each major comment, clarifying our methodological approach and the observational nature of the study. We will make revisions to enhance transparency and temper causal language where appropriate.

Point-by-point responses
  1. Referee: [Abstract] The reported negative associations are presented as evidence that prior user count and first-reply toxicity 'affect whether others feel uninhibited,' yet no regression specification, control variables, fixed effects, or robustness checks are described, so it is impossible to evaluate whether the associations isolate the proposed norm mechanism or reflect topic/selection confounders.

    Authors: The abstract provides a high-level summary of the findings. The full manuscript details the regression specifications in the Methods section, including controls for conversation size and topic. We will revise the abstract to include a brief mention of the regression models used and add robustness checks to the supplementary materials to address potential confounders.

    Revision: yes

  2. Referee: [Abstract] The central interpretive step (associations reflect diffusion of responsibility and norm establishment) is load-bearing for the title and abstract claims, but the manuscript supplies no identification strategy or falsification tests against alternative explanations such as conversation topic heterogeneity or algorithmic ranking.

    Authors: Our analysis is observational and we do not claim to have an identification strategy for causal effects. The results are associations that we interpret in light of social norms theory. We will update the abstract and title to use more cautious language focusing on associations. We will also include additional discussion of alternative explanations such as topic heterogeneity. Our data does not permit direct tests of algorithmic ranking effects.

    Revision: partial

standing simulated objections not resolved
  • Direct tests against algorithmic ranking, due to lack of visibility or ranking data in the dataset.

Circularity Check

0 steps flagged

No significant circularity; purely empirical reporting of data associations

full rationale

The paper performs observational analysis on a sample of Twitter conversations and reports negative statistical associations between prior user count, first-reply toxicity, and subsequent non-toxic replies. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. All claims rest on external data patterns rather than internal construction or self-definition. This is the expected non-finding for an empirical study without mathematical modeling.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities beyond standard social-science assumptions of representative sampling and observable associations reflecting underlying social processes.

pith-pipeline@v0.9.0 · 5796 in / 1184 out tokens · 30257 ms · 2026-05-24T11:07:03.049157+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Attention: What Prevents Young Adults from Speaking Up Against Cyberbullying in an LLM-Powered Social Media Simulation

    cs.HC · 2026-05 · unverdicted · novelty 7.0

    Practicing bystander intervention in an LLM multi-agent simulation helps young adults speak up publicly against cyberbullying only after three specific attention shifts from inattention and self-focus to audience awar...

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1]

    Ashley A Anderson, Sara K Yeo, Dominique Brossard, Dietram A Scheufele, and Michael A Xenos. 2016. Toxic talk: How online incivility can undermine perceptions of media. International Journal of Public Opinion Research 30, 1 (2016), 156–168

  2. [2]

    Solomon E Asch. 1956. Studies of independence and conformity: I. A minority of one against a unanimous majority. Psychological monographs: General and applied 70, 9 (1956), 1

  3. [3]

    Sara Bastiaensens, Sara Pabian, Heidi Vandebosch, Karolien Poels, Katrien Van Cleemput, Ann DeSmet, and Ilse De Bourdeaudhuij. 2016. From normative influence to social pressure: How relevant others affect whether bystanders join in cyberbullying. Social Development 25, 1 (2016), 193–211

  4. [4]

    Amy Binns. 2012. DON’T FEED THE TROLLS! Managing troublemakers in magazines’ online communities. Journalism practice 6, 4 (2012), 547–562

  5. [5]

    Carrie A Blair, Lori Foster Thompson, and Karl L Wuensch. 2005. Electronic helping behavior: The virtual presence of others makes a difference. Basic and Applied Social Psychology 27, 2 (2005), 171–178

  6. [6]

    Daniëlle NM Bleize, Martin Tanis, Doeschka J Anschütz, and Moniek Buijzen

  7. [7]

    A social identity perspective on conformity to cyber aggression among early adolescents on WhatsApp. Social Development 30, 4 (2021), 941–956

  8. [8]

    Alexander Brown. 2018. What is so special about online (as compared to offline) hate speech? Ethnicities 18, 3 (2018), 297–326

  9. [9]

    Justin Cheng, Michael Bernstein, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. 2017. Anyone can become a troll: Causes of trolling behavior in online discussions. In Proceedings of the 2017 ACM conference on computer supported cooperative work and social computing. 1217–1230

  10. [10]

    John M Darley and Bibb Latané. 1968. Bystander intervention in emergencies: diffusion of responsibility. Journal of personality and social psychology 8, 4p1 (1968), 377

  11. [11]

    Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber. 2017. Automated hate speech detection and the problem of offensive language. In Eleventh international aaai conference on web and social media

  12. [12]

    Ann DeSmet, Sara Bastiaensens, Katrien Van Cleemput, Karolien Poels, Heidi Vandebosch, Greet Cardon, and Ilse De Bourdeaudhuij. 2016. Deciding whether to look after them, to like it, or leave it: A multidimensional analysis of predictors of positive and negative bystander behavior in cyberbullying among adolescents. Computers in Human Behavior 57 (2016), 398–415

  13. [13]

    Dominic DiFranzo, Samuel Hardman Taylor, Franccesca Kazerooni, Olivia D Wherry, and Natalya N Bazarova. 2018. Upstanding by design: Bystander intervention in cyberbullying. In Proceedings of the 2018 CHI conference on human factors in computing systems. 1–12

  14. [14]

    Karthik Dinakar, Birago Jones, Catherine Havasi, Henry Lieberman, and Rosalind Picard. 2012. Common sense reasoning for detection, prevention, and mitigation of cyberbullying. ACM Transactions on Interactive Intelligent Systems (TiiS) 2, 3 (2012), 18

  15. [15]

    Nemanja Djuric, Jing Zhou, Robin Morris, Mihajlo Grbovic, Vladan Radosavljevic, and Narayan Bhamidipati. 2015. Hate speech detection with comment embeddings. In Proceedings of the 24th international conference on world wide web. ACM, 29–30

  16. [16]

    Fernando Domínguez-Hernández, Lars Bonell, and Alejandro Martínez-González

  17. [17]

    A systematic literature review of factors that moderate bystanders’ actions in cyberbullying. Cyberpsychology: Journal of Psychosocial Research on Cyberspace 12, 4 (2018)

  18. [18]

    Amanda L Duffy and Drew Nesdale. 2009. Peer groups, social identity, and children’s bullying behavior. Social development 18, 1 (2009), 121–139

  19. [19]

    Maeve Duggan. 2014. Online harassment. Pew Research Center

  20. [20]

    William H Dutton. 1996. Network rules of order: Regulating speech in public electronic fora. Media, Culture & Society 18, 2 (1996), 269–290

  21. [21]

    Mai ElSherief, Vivek Kulkarni, Dana Nguyen, William Yang Wang, and Elizabeth Belding. 2018. Hate lingo: A target-based linguistic analysis of hate speech in social media. In Twelfth International AAAI Conference on Web and Social Media

  22. [22]

    Pnina Fichman and Elizabeth Peters. 2019. The impacts of territorial communication norms and composition on online trolling. International Journal of Communication 13 (2019), 20

  23. [23]

    Peter Fischer, Joachim I Krueger, Tobias Greitemeyer, Claudia Vogrincic, Andreas Kastenmüller, Dieter Frey, Moritz Heene, Magdalena Wicher, and Martina Kainbacher. 2011. The bystander-effect: a meta-analytic review on bystander intervention in dangerous and non-dangerous emergencies. Psychological bulletin 137, 4 (2011), 517

  24. [24]

    Stephanie D Freis and Regan AR Gurung. 2013. A Facebook analysis of helping behavior in online bullying. Psychology of popular media culture 2, 1 (2013), 11

  25. [25]

    Njagi Dennis Gitari, Zhang Zuping, Hanyurwimfura Damien, and Jun Long. 2015. A lexicon-based approach for hate speech detection. International Journal of Multimedia and Ubiquitous Engineering 10, 4 (2015), 215–230

  26. [26]

    Google Perspective API. 2021. https://www.perspectiveapi.com/

  27. [27]

    Tommi Gröndahl, Luca Pajola, Mika Juuti, Mauro Conti, and N Asokan. 2018. All You Need is "Love" Evading Hate Speech Detection. In Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security. 2–12

  28. [28]

    Billy Henson, Bonnie S Fisher, and Bradford W Reyns. 2020. There is virtually no excuse: The frequency and predictors of college students’ bystander intervention behaviors directed at online victimization. Violence Against Women 26, 5 (2020), 505–527

  29. [29]

    Hyunseo Hwang, Porismita Borah, Kang Namkoong, and A Veenstra. 2008. Does civility matter in the blogosphere? Examining the interaction effects of incivility and disagreement on citizen attitudes. In 58th Annual Conference of the International Communication Association, Montreal, QC, Canada

  30. [30]

    Sara Kiesler, Jane Siegel, and Timothy W McGuire. 1984. Social psychological aspects of computer-mediated communication. American psychologist 39, 10 (1984), 1123

  31. [31]

    Animesh Koratana and Kevin Hu. [n. d.]. Toxic Speech Detection. ([n. d.])

  32. [32]

    Robin M Kowalski, Amber N Schroeder, and Carrie A Smith. 2013. Bystanders and their willingness to intervene in cyberbullying situations. From cyber bullying to cyber safety: Issues and approaches in educational contexts (2013), 77–100

  33. [33]

    Nihal Kumarswamy et al. 2022. “Strict Moderation?” The Impact of Increased Moderation on Parler Content and User Behavior. Ph.D. Dissertation

  34. [34]

    Bibb Latané and John M Darley. 1969. Bystander "apathy". American Scientist 57, 2 (1969), 244–268

  35. [35]

    So-Hyun Lee and Hee-Woong Kim. 2015. Why people post benevolent and malicious comments online. Commun. ACM 58, 11 (2015), 74–79

  36. [36]

    Paul Benjamin Lowry, Jun Zhang, Chuang Wang, and Mikko Siponen. 2016. Why do adults engage in cyberbullying on social media? An integration of online disinhibition and deindividuation effects with the social structure and social learning model. Information Systems Research 27, 4 (2016), 962–986

  37. [37]

    Patrick M Markey. 2000. Bystander intervention in computer-mediated communication. Computers in Human Behavior 16, 2 (2000), 183–188

  38. [38]

    Yashar Mehdad and Joel Tetreault. 2016. Do characters abuse more than words?. In Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue. 299–303

  39. [39]

    Chikashi Nobata, Joel Tetreault, Achint Thomas, Yashar Mehdad, and Yi Chang

  40. [40]

    Abusive language detection in online user content. In Proceedings of the 25th international conference on world wide web. International World Wide Web Conferences Steering Committee, 145–153

  41. [41]

    Magdalena Obermaier, Nayla Fawzi, and Thomas Koch. 2016. Bystanding or standing by? How the number of bystanders affects the intention to intervene in cyberbullying. New media & society 18, 8 (2016), 1491–1507

  42. [42]

    Zizi Papacharissi. 2002. The virtual sphere: The internet as a public sphere. New media & society 4, 1 (2002), 9–27

  43. [43]

    Jenny L Paterson, Rupert Brown, and Mark A Walters. 2019. The short and longer term impacts of hate crimes experienced directly, indirectly, and through the media. Personality and Social Psychology Bulletin 45, 7 (2019), 994–1010

  44. [44]

    Georgios K Pitsilis, Heri Ramampiaro, and Helge Langseth. 2018. Detecting offensive language in tweets using deep learning. arXiv preprint arXiv:1801.04433 (2018)

  45. [45]

    Katja Rost, Lea Stahel, and Bruno S Frey. 2016. Digital social norm enforcement: Online firestorms in social media. PLoS one 11, 6 (2016), e0155923

  46. [46]

    Gregory K Rutkowski, Charles L Gruder, and Daniel Romer. 1983. Group cohesiveness, social norms, and bystander intervention. Journal of Personality and Social Psychology 44, 3 (1983), 545.

    Paper running header: The Web Conference (WWW) ’23, April 30–May 4, 2023, Austin, Texas. Ana Aleksandric, Mohit Singhal, Anne Groggel, and Shirin Nilizadeh

  47. [47]

    Nazanin Salehabadi, Anne Groggel, Mohit Singhal, Sayak Saha Roy, and Shirin Nilizadeh. 2022. User Engagement and the Toxicity of Tweets. https://doi.org/10.48550/ARXIV.2211.03856

  48. [48]

    Martin Saveski, Brandon Roy, and Deb Roy. 2021. The structure of toxic conversations on Twitter. In Proceedings of the Web Conference 2021. 1086–1097

  49. [49]

    Karina Schumann, Jamil Zaki, and Carol S Dweck. 2014. Addressing the empathy deficit: beliefs about the malleability of empathy predict effortful responses when empathy is challenging. Journal of personality and social psychology 107, 3 (2014), 475

  50. [50]

    Jane Siegel, Vitaly Dubrovsky, Sara Kiesler, and Timothy W McGuire. 1986. Group processes in computer-mediated communication. Organizational behavior and human decision processes 37, 2 (1986), 157–187

  51. [51]

    Mohit Singhal, Chen Ling, Pujan Paudel, Poojitha Thota, Nihal Kumarswamy, Gianluca Stringhini, and Shirin Nilizadeh. 2022. SoK: Content Moderation in Social Media, from Guidelines to Enforcement, and Research to Practice. https://doi.org/10.48550/ARXIV.2206.14855

  52. [52]

    Sara Sood, Judd Antin, and Elizabeth Churchill. 2012. Profanity use in online communities. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1481–1490

  53. [53]

    John Suler. 2004. The online disinhibition effect. Cyberpsychology & behavior 7, 3 (2004), 321–326

  54. [54]

    Henri Tajfel. 2010. Social identity and intergroup relations. Vol. 7. Cambridge University Press

  55. [55]

    Twarc. 2020. Collect Twitter Data with Twarc! https://scholarslab.github.io/learn-twarc/

  56. [56]

    Twitter. 2022. Twitter API. https://developer.twitter.com/en/docs/twitter-api

  57. [57]

    Marco Van Bommel, Jan-Willem Van Prooijen, Henk Elffers, and Paul AM Van Lange. 2012. Be aware to care: Public self-awareness leads to a reversal of the bystander effect. Journal of Experimental Social Psychology 48, 4 (2012), 926–930

  58. [58]

    Zeerak Waseem and Dirk Hovy. 2016. Hateful symbols or hateful people? predictive features for hate speech detection on twitter. In Proceedings of the NAACL student research workshop. 88–93

  59. [59]

    Savvas Zannettou, Mai ElSherief, Elizabeth Belding, Shirin Nilizadeh, and Gianluca Stringhini. 2020. Measuring and Characterizing Hate Speech on News Websites. In 12TH ACM WEB SCIENCE CONFERENCE. ACM

  60. [60]

    Justine Zhang, Ravi Kumar, Sujith Ravi, and Cristian Danescu-Niculescu-Mizil

  61. [61]

    Conversational flow in Oxford-style debates. arXiv preprint arXiv:1604.03114 (2016)