Who, Why, and How: Disentangling the Effects of Moderation Source, Context, and Language on Post-Removal Behavior
Pith reviewed 2026-05-19 21:45 UTC · model grok-4.3
The pith
Bot moderation on Reddit produces higher compliance and lower self-censorship than human or modteam moderation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In a dataset of 11,795,036 moderation events across 9,285,410 users, bot-moderated removals yield higher compliance and lower self-censorship than removals by humans or modteams. Modteam actions produce the largest withdrawal effects. Linguistic features such as elaborated explanations and direct address improve outcomes only for routine violations; for serious violations these same features increase withdrawal, while prosocial and emotionally emphatic framing becomes most effective.
What carries the argument
Violation severity as a moderator of cue-based processing, tested inside an extension of the Human-AI Interaction Theory of Interactive Media Effects through probabilistic behavioral classification and regression on linguistic features extracted via PCA.
If this is right
- Routine violations can be routed to bots to raise compliance rates without raising self-censorship.
- Modteam interventions should be reserved for cases where institutional signaling is the goal rather than retention.
- Removal messages for high-severity violations should favor prosocial framing and emotional emphasis over detailed explanations.
- Moderation systems can become context-adaptive by letting violation severity select the linguistic strategy.
Where Pith is reading between the lines
- The compliance advantage of bots may extend to other platforms if their community structures resemble Reddit's subreddit model.
- Hybrid designs that start with bot messages and escalate serious cases to humans could capture both efficiency and perceived legitimacy.
- Long-term user retention on platforms might rise if self-censorship is lowered through calibrated moderation language.
Load-bearing premise
The large observational dataset lets researchers attribute differences in user compliance and withdrawal directly to moderator source and message language without major confounding from subreddit norms or moderator assignment choices.
What would settle it
A randomized experiment that assigns identical violations to bot, human, or team moderation while varying message language and then measures the fraction of users who post again versus those who reduce activity.
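The assignment step of such an experiment can be sketched as follows. This is a minimal illustration, not a design taken from the paper: the two framing labels, the function names `assign_conditions` and `return_rate`, and the round-robin blocking are all assumptions made for the example.

```python
import random
from itertools import product

SOURCES = ["bot", "human", "modteam"]
FRAMINGS = ["explanatory", "prosocial"]  # hypothetical message variants


def assign_conditions(violation_ids, seed=0):
    """Block-randomize violations across every source x framing cell."""
    cells = list(product(SOURCES, FRAMINGS))
    rng = random.Random(seed)
    ids = list(violation_ids)
    rng.shuffle(ids)
    # Deal the shuffled ids round-robin so cell sizes differ by at most one.
    return {vid: cells[i % len(cells)] for i, vid in enumerate(ids)}


def return_rate(outcomes):
    """Fraction of users who posted again; outcomes is a list of booleans."""
    return sum(outcomes) / len(outcomes) if outcomes else float("nan")


assignment = assign_conditions(range(12), seed=42)
```

Because assignment no longer depends on subreddit norms or moderator choice, comparing `return_rate` across cells would estimate the effect of source and framing directly, which the observational comparison cannot do.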
Original abstract
Content moderation is a central mechanism through which platforms attempt to balance user engagement with community governance. Yet existing research has largely treated moderation as a uniform intervention, overlooking how moderator source, violation context, and linguistic style jointly shape user behavior. Drawing on the Human-AI Interaction Theory of Interactive Media Effects (HAII-TIME), this study examines how these three dimensions produce divergent post-moderation behavioral trajectories in a large-scale observational dataset of 11,795,036 moderation events across 9,285,410 users and 61,261 subreddits on Reddit (2021-2025). Using probabilistic behavioral classification, ANOVA, and OLS regression with PCA-derived linguistic features, we find that bot moderation consistently produces higher compliance and lower self-censorship than human or modteam moderation, challenging the assumption that human agency cues are inherently advantageous. Modteam moderation produces the strongest self-censorship effects, suggesting that institutional depersonalization is a meaningful driver of behavioral withdrawal. Violation severity emerges as a critical contingency: linguistic strategies effective in routine contexts (elaborated explanation, community-scale appeals, direct personal address) can backfire for serious violations, whereas prosocially framed and emotionally emphatic messages become most effective when stakes are highest. Of 480 linguistic interactions tested, 33 survive FDR correction. These findings extend HAII-TIME by introducing violation salience as a moderator of cue-based processing, and offer empirical grounding for context-adaptive moderation design.
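To make the "33 of 480 survive FDR correction" step concrete, here is a minimal sketch of one common FDR procedure, the Benjamini-Hochberg step-up rule. The abstract does not say which FDR method or threshold the authors used, so both the choice of Benjamini-Hochberg and `alpha=0.05` are assumptions, and the p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans: True where the hypothesis is rejected at FDR level alpha."""
    m = len(p_values)
    # Indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k (the step-up rule).
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Illustrative p-values only; not taken from the paper.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(p_vals, alpha=0.05)
print(sum(flags), "of", len(p_vals), "tests survive")
```

With these ten values only the two smallest pass, even though five are below an uncorrected 0.05, which is the kind of shrinkage the 480-to-33 figure reflects.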
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper analyzes a large observational dataset of 11,795,036 moderation events across 9,285,410 users and 61,261 subreddits on Reddit (2021-2025) to examine how moderator source (bot, human, modteam), violation context, and linguistic style jointly influence post-moderation user behavior. Drawing on HAII-TIME, it employs probabilistic behavioral classification, ANOVA, and OLS regression with PCA-derived linguistic features, reporting that bot moderation is associated with higher compliance and lower self-censorship than human or modteam moderation, that modteam moderation drives the strongest self-censorship, and that violation severity moderates the effectiveness of linguistic strategies (with 33 of 480 interactions surviving FDR correction). The work claims to extend HAII-TIME by introducing violation salience as a moderator of cue-based processing.
Significance. If the central associations hold after addressing potential confounding, the findings would be significant for computational social science and platform governance research by providing large-scale evidence on differential effects of automated versus human moderation and by identifying violation severity as a key contingency for linguistic interventions. The dataset scale, use of FDR correction across 480 tests, and extension of an existing theoretical framework are clear strengths that would support practical implications for context-adaptive moderation design.
major comments (3)
- [Abstract] The claim that 'bot moderation consistently produces higher compliance and lower self-censorship' attributes outcomes causally to moderator source, yet the observational design compares outcomes across non-randomly assigned sources without demonstrated controls (e.g., subreddit fixed effects, violation-type stratification, or propensity weighting) for selection into moderator type or subreddit norms; the reported OLS and ANOVA results on PCA features therefore cannot isolate the source cue itself from the contexts in which each source appears.
- [Methods/Results] (OLS and ANOVA sections): The manuscript does not detail whether the regression models include subreddit fixed effects, user-level clustering, or robustness checks such as propensity score weighting to address the non-random assignment of moderation sources; without these, the source main effects and the 33 FDR-significant interactions remain vulnerable to confounding and cannot cleanly support the headline behavioral attribution.
- [Abstract and Discussion] The extension of HAII-TIME by 'introducing violation salience as a moderator' is presented as a theoretical contribution, but the observational data leave open whether the reported severity-by-language interactions reflect cue processing or unmeasured differences in how severe violations are routed to different moderator sources and linguistic framings.
minor comments (2)
- [Abstract] The abstract would benefit from a brief parenthetical definition or citation for 'probabilistic behavioral classification' to clarify how compliance and self-censorship are operationalized from the 11.8M events.
- [Results] Figure or table captions for the linguistic interaction results should explicitly state the exact number of tests (480) and the FDR threshold applied so readers can assess the 33 significant findings without returning to the text.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We address each major point below, clarifying our approach and indicating revisions where the manuscript can be strengthened without overstating the observational evidence.
-
Referee: [Abstract] The claim that 'bot moderation consistently produces higher compliance and lower self-censorship' attributes outcomes causally to moderator source, yet the observational design compares outcomes across non-randomly assigned sources without demonstrated controls (e.g., subreddit fixed effects, violation-type stratification, or propensity weighting) for selection into moderator type or subreddit norms; the reported OLS and ANOVA results on PCA features therefore cannot isolate the source cue itself from the contexts in which each source appears.
Authors: We agree that the phrasing 'produces' risks implying causation beyond what the observational data support. The reported OLS models control for violation severity, subreddit size, and other observed covariates, with violation-type stratification implicit in the interaction terms, but subreddit fixed effects and propensity weighting were not applied in the primary specifications. We will revise the abstract to use associative language ('is associated with') and add a dedicated robustness subsection describing these controls and limitations. revision: yes
-
Referee: [Methods/Results] (OLS and ANOVA sections): The manuscript does not detail whether the regression models include subreddit fixed effects, user-level clustering, or robustness checks such as propensity score weighting to address the non-random assignment of moderation sources; without these, the source main effects and the 33 FDR-significant interactions remain vulnerable to confounding and cannot cleanly support the headline behavioral attribution.
Authors: The primary models include user-level random effects to address clustering and control for violation type and subreddit characteristics. Subreddit fixed effects were omitted from the main results to retain statistical power across 61,261 subreddits. We will expand the Methods section with complete model equations, explicit mention of the clustering approach, and new robustness analyses that incorporate subreddit fixed effects and propensity-score weighting on observable features such as subreddit activity and violation category. revision: yes
-
Referee: [Abstract and Discussion] The extension of HAII-TIME by 'introducing violation salience as a moderator' is presented as a theoretical contribution, but the observational data leave open whether the reported severity-by-language interactions reflect cue processing or unmeasured differences in how severe violations are routed to different moderator sources and linguistic framings.
Authors: The models explicitly interact linguistic features with violation severity while holding moderator source constant within strata, which provides evidence consistent with salience moderating cue effectiveness. We cannot fully exclude differential routing with observational data alone. We will revise the Discussion to acknowledge this limitation more explicitly, frame the HAII-TIME extension as an empirical pattern supporting the proposed moderator rather than a conclusive test, and suggest future experimental designs to isolate routing mechanisms. revision: partial
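The confounding concern behind the fixed-effects request can be illustrated with a small sketch. It compares a pooled one-regressor OLS slope with a within-subreddit (fixed-effects) slope on synthetic data in which bots are concentrated in a high-compliance subreddit. The data, the variable names, and the 0.05 "true" bot effect are all invented for illustration and say nothing about the paper's actual estimates.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope of a one-regressor OLS fit with intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def within_slope(groups, x, y):
    """Fixed-effects slope: demean x and y inside each group, then run OLS."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for g, a, b in zip(groups, x, y):
        sums[g][0] += a
        sums[g][1] += b
        sums[g][2] += 1
    xd = [a - sums[g][0] / sums[g][2] for g, a in zip(groups, x)]
    yd = [b - sums[g][1] / sums[g][2] for g, b in zip(groups, y)]
    return ols_slope(xd, yd)


# Synthetic example: subreddit "A" has high baseline compliance and mostly bot
# removals; subreddit "B" has low baseline compliance and mostly human removals.
subreddit = ["A", "A", "A", "A", "B", "B", "B", "B"]
is_bot = [1, 1, 1, 0, 1, 0, 0, 0]
compliance = [0.85, 0.85, 0.85, 0.80, 0.35, 0.30, 0.30, 0.30]

pooled = ols_slope(is_bot, compliance)                 # mixes in subreddit baselines
within = within_slope(subreddit, is_bot, compliance)   # compares within subreddit
print(round(pooled, 3), round(within, 3))
```

Here the pooled slope is about 0.30 while the within-subreddit slope recovers the 0.05 built into the data, which shows how source effects estimated without subreddit fixed effects can absorb differences in community baselines.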
Circularity Check
No significant circularity; empirical analysis is self-contained
The paper reports results from an observational dataset of 11.8M moderation events analyzed via probabilistic classification, ANOVA, and OLS regression on PCA-derived features. All load-bearing claims (bot moderation producing higher compliance, violation severity as moderator, 33 FDR-significant interactions) are statistical outputs from the data rather than quantities defined by the paper's own fitted parameters or reduced to self-citations by construction. The reference to HAII-TIME is used to frame the study and is extended by new empirical findings; it does not serve as a load-bearing premise whose validity depends on the present results. No self-definitional loops, fitted inputs called predictions, or ansatzes smuggled via citation appear in the derivation chain. The analysis is therefore independent of its own outputs and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
free parameters (2)
- PCA-derived linguistic feature dimensions
- OLS regression coefficients for interaction terms
axioms (2)
- domain assumption: Probabilistic behavioral classification correctly identifies compliance versus self-censorship from post-moderation activity logs
- domain assumption: OLS regression assumptions (linearity, no omitted variable bias, homoscedasticity) hold for the behavioral outcome models
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
Tag: unclear (relation between the paper passage and the cited Recognition theorem).
Using probabilistic behavioral classification, one-way ANOVA, and OLS regression with principal component analysis (PCA)-derived linguistic features...
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction
Tag: unclear (relation between the paper passage and the cited Recognition theorem).
Bot moderation consistently produces higher compliance and lower self-censorship than human or modteam moderation
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.