pith. machine review for the scientific record.

arxiv: 2605.04491 · v2 · submitted 2026-05-06 · 💻 cs.CY · cs.CR

An Evaluation of Chat Safety Moderations in Roblox

Priya Kaushik, Rakibul Hasan, Sazzadur Rahaman, Sonja Brown

Pith reviewed 2026-05-11 00:43 UTC · model grok-4.3

classification 💻 cs.CY cs.CR
keywords chat moderation · Roblox · online safety · LLM evaluation · grooming · bullying · content moderation · gaming platforms

The pith

Roblox's chat moderation allows numerous unsafe messages about grooming, bullying, and self-harm to pass through undetected.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper gathers roughly two million chat messages from four public Roblox games and examines how well the platform's automated filters catch policy violations. It first manually labels nearly one hundred thousand messages as safe or unsafe, then tests four large language models against that ground truth before applying the strongest model to the full collection. The results show many messages involving grooming, sexualizing minors, harassment, violence, self-harm, and sharing private details still reach users. Flagged players continue sending harmful content by using a variety of wording changes and other evasion methods. Because Roblox serves hundreds of millions of users daily, including many children, these gaps matter for real-time safety in live game chats.

Core claim

Our analysis of approximately 2 million chat messages from public Roblox servers reveals that numerous instances of unsafe chat messages related to grooming, sexualizing minors, bullying and harassment, violence, self-harm, and sharing sensitive information escaped the current moderation system. Users whose messages were previously flagged continue to send harmful messages by employing a wide range of techniques to evade the moderation system.

What carries the argument

A two-step evaluation: 99.8K messages are manually labeled to benchmark four large language models, then the best-performing model is applied across the full corpus, with iterative open and axial coding used to categorize the flagged unsafe messages.
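The two-step design can be sketched in miniature: score each candidate model against the hand-labeled ground truth, keep the winner, and run only that model over the full corpus. The model names, toy labels, and trivial `classify` stub below are invented placeholders, not the paper's models or data.

```python
# Sketch of the paper's two-step pipeline: benchmark candidates on manual
# labels, then scale up only the best performer. All data here is invented.

def f1_unsafe(gold, pred):
    """Precision/recall/F1 for the 'unsafe' class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p == "unsafe")
    fp = sum(1 for g, p in zip(gold, pred) if g == "safe" and p == "unsafe")
    fn = sum(1 for g, p in zip(gold, pred) if g == "unsafe" and p == "safe")
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# Step 1: score each candidate model on the hand-labeled ground truth.
gold = ["unsafe", "safe", "safe", "unsafe", "safe"]
candidates = {
    "model_a": ["unsafe", "safe", "unsafe", "unsafe", "safe"],
    "model_b": ["safe", "safe", "safe", "unsafe", "safe"],
}
best = max(candidates, key=lambda m: f1_unsafe(gold, candidates[m]))

# Step 2: apply only the winning model to the (here: tiny) corpus.
def classify(model, message):
    # Stand-in for an LLM call; always answers 'safe' in this toy sketch.
    return "safe"

corpus = ["hello", "wanna trade?"]
flags = [m for m in corpus if classify(best, m) == "unsafe"]
```

The paper's actual selection criterion is not stated in this summary; F1 on the unsafe class is used here only as a plausible stand-in.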

If this is right

  • Current automated filters do not catch all policy-violating chats in real time.
  • Users who have been flagged persist in sending harmful messages through evasion tactics.
  • Categories such as grooming and self-harm show particularly high rates of escape.
  • Hybrid systems that address wording changes and context shifts are needed beyond simple keyword or model checks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar large-scale audits on other platforms could reveal whether Roblox's gaps are typical for live gaming chats.
  • Adding detection for conversation patterns over multiple turns might reduce the success of single-message evasion.
  • Public release of categorized unsafe examples, stripped of identifiers, could help improve open models for moderation.
  • The volume of escaped content suggests that moderation teams should prioritize the most frequent evasion patterns identified.

Load-bearing premise

The best-performing large language model accurately classifies unsafe messages at scale with low error rates, and the four selected games plus public servers provide a representative sample of Roblox's overall chat environment.

What would settle it

A manual review of several thousand messages the model labeled safe that uncovers a high rate of previously missed unsafe content, or evidence that unsafe activity in the four sampled games differs sharply from other popular Roblox titles.
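The first proposed test — hand-reviewing a sample of messages the model labeled safe and extrapolating the missed-unsafe rate — amounts to simple proportion estimation. The counts below are invented for illustration, and the normal-approximation interval is one standard choice rather than anything the paper prescribes.

```python
import math

def missed_unsafe_estimate(n_reviewed, n_found_unsafe, z=1.96):
    """Point estimate and normal-approximation CI for the rate of unsafe
    messages hiding in the model-labeled-'safe' pool (toy numbers)."""
    p = n_found_unsafe / n_reviewed
    se = math.sqrt(p * (1 - p) / n_reviewed)
    return p, max(0.0, p - z * se), p + z * se

# e.g. 5,000 reviewed 'safe' messages, 40 turning out unsafe on inspection:
p, lo, hi = missed_unsafe_estimate(5000, 40)

# Scale the rate to a ~2M-message corpus for a rough missed-message count.
approx_missed = int(p * 2_000_000)
```

A high rate here would mean the LLM scan understates escapes; a rate near zero would support treating the flagged set as close to exhaustive.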

Figures

Figures reproduced from arXiv: 2605.04491 by Priya Kaushik, Rakibul Hasan, Sazzadur Rahaman, Sonja Brown.

Figure 1. Example of Roblox's graphical user interface, with chat bubbles and a chat window with dynamic background (anonymized).
Figure 2. Moderated/hashed message count per user. The caption's continuation assigns users to frequency groups by the proportion of their hashed messages that met the confidence criteria: a frequency ratio (hashed messages divided by total messages) above 0.90 places a user in the high-frequency group; ratios between 0.50 and 0.90 are assigned to …
Figure 3. Distribution of hash span lengths less than 50 characters.
Figure 4. Standardized prompt used for LLM-based classification of conversations.
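The user-grouping rule quoted alongside Figure 2 (frequency ratio = hashed messages ÷ total messages, with values above 0.90 marking the high-frequency group) can be sketched directly. The source text is truncated after the 0.50–0.90 band, so the names for the lower groups below are invented placeholders, not the paper's terms.

```python
# Frequency-ratio grouping as described in the Figure 2 caption. Only the
# 'high' band and the 0.50-0.90 band are given in the source; the labels
# 'mid' and 'low' are placeholders for the truncated remainder.

def frequency_group(hashed, total):
    ratio = hashed / total
    if ratio > 0.90:
        return "high"
    if 0.50 <= ratio <= 0.90:
        return "mid"   # placeholder name; the source text is cut off here
    return "low"       # placeholder name
```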
Original abstract

Roblox is among the most popular online gaming platforms, used by hundreds of millions of users every day. A substantial portion of these users are underage, who are at a greater risk, where abusive users may utilize Roblox's real-time chat interface to make the initial contact with potential victims. Roblox employs automated chat moderation mechanisms to detect potentially abusive messages; however, to date, their effectiveness has not been independently investigated. Toward this goal, we collected approximately 2 million chat messages from four games across multiple age groups and analyzed them to evaluate the moderation system. These messages were collected from public game servers following ethical and legal norms as well as Roblox's terms of service. We use this corpus to qualitatively study which types of unsafe chats escape the moderation system and how policy-violating users evade the moderation system. Given the dataset's scale, it is prohibitively expensive to conduct qualitative content analysis manually. Therefore, we adopt a two-step approach. First, we manually labeled safe and unsafe messages (n=99.8K) and used them as a ground truth to evaluate four locally hosted state-of-the-art large language models (LLMs). Next, the best-performing LLM was applied to the entire corpus to identify potentially unsafe messages, which we manually categorized using iterative open and axial coding methods until thematic saturation was reached. Overall, our findings reveal a troublesome reality: numerous instances of unsafe chat messages related to grooming, sexualizing minors, bullying, & harassment, violence, self-harm, and sharing sensitive information, etc., escaped the current moderation. Our analysis of users whose messages were previously flagged revealed that they continue to send harmful messages by employing a wide range of techniques to evade the moderation system.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper claims to evaluate Roblox's chat moderation effectiveness by collecting ~2 million messages from public servers in four games, manually labeling 99.8K messages as safe/unsafe to benchmark four locally hosted LLMs, selecting and deploying the best-performing LLM on the full corpus to flag unsafe content, and then applying iterative open/axial qualitative coding (until thematic saturation) to categorize the flagged messages. The central findings are that numerous unsafe messages involving grooming, sexualizing minors, bullying/harassment, violence, self-harm, and sharing sensitive information escape moderation, and that previously flagged users evade detection via a range of techniques.

Significance. If the LLM classification step is reliable, the work provides a rare independent, large-scale observational assessment of moderation on a platform with hundreds of millions of daily users, many underage. The mixed-methods design (manual ground-truth labeling + LLM scaling + qualitative thematic analysis) is a pragmatic way to study rare but high-stakes events at scale. The ethical data collection and focus on concrete evasion strategies are positive features. The significance is limited, however, by the absence of detailed classifier performance data, which directly affects whether the reported escapes reflect genuine moderation failures.

major comments (2)
  1. [Methods (LLM evaluation)] The 99.8K manually labeled messages are used to evaluate four LLMs and select the best one for labeling the remaining corpus, yet no performance metrics (accuracy, per-class precision/recall, F1, or confusion matrix) are reported for the held-out test set. Unsafe categories are rare; without these numbers the flagged set used for qualitative coding may contain high false-positive rates, directly weakening the central claim that numerous grooming, self-harm, and similar messages escape moderation.
  2. [Results (large-scale scan)] No false-positive analysis, error bars, or uncertainty estimates are provided for the LLM labels applied to the ~2M-message corpus. Given the expected class imbalance, even modest per-class error rates could substantially inflate the count of reported unsafe messages and the qualitative themes derived from them.
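The imbalance concern in major comment 2 is easy to make concrete: when unsafe messages are rare, even a small false-positive rate yields more wrong flags than right ones. All rates below are invented for illustration and are not taken from the paper.

```python
# Toy illustration of false-positive inflation under class imbalance.
# Prevalence, recall (tpr), and false-positive rate (fpr) are invented.

def flagged_breakdown(n, prevalence, tpr, fpr):
    unsafe = n * prevalence
    safe = n - unsafe
    tp = unsafe * tpr          # real unsafe messages caught
    fp = safe * fpr            # safe messages wrongly flagged
    precision = tp / (tp + fp)
    return tp, fp, precision

# 2M messages, 0.5% truly unsafe, 90% recall, 1% false-positive rate:
tp, fp, precision = flagged_breakdown(2_000_000, 0.005, 0.90, 0.01)
# fp (~19,900) is more than double tp (~9,000): precision ~ 0.31.
```

This is why the referee asks for per-class metrics: without them, the flagged set feeding the qualitative coding could be dominated by classifier noise.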
minor comments (3)
  1. [Abstract] The abstract refers to the 'best-performing LLM' without naming the model or summarizing its quantitative performance on the labeled set, which would let readers gauge reliability before the large-scale results.
  2. [Qualitative analysis] The description of iterative open and axial coding does not report the number of coders, any inter-rater reliability measure, or the precise criteria used to determine thematic saturation.
  3. [Data collection] The rationale for selecting the four specific games and public servers (versus other games or private servers) is not elaborated, limiting assessment of how representative the observed moderation failures are of Roblox overall.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback and for recognizing the potential significance of our study on Roblox's chat moderation. We address the major comments below and have updated the manuscript to incorporate additional quantitative details on the LLM performance to enhance the robustness of our findings.

Point-by-point responses
  1. Referee: Methods (LLM evaluation): The 99.8K manually labeled messages are used to evaluate four LLMs and select the best one for labeling the remaining corpus, yet no performance metrics (accuracy, per-class precision/recall, F1, or confusion matrix) are reported for the held-out test set. Unsafe categories are rare; without these numbers the flagged set used for qualitative coding may contain high false-positive rates, directly weakening the central claim that numerous grooming, self-harm, and similar messages escape moderation.

    Authors: We agree with the referee that detailed performance metrics are crucial for validating the LLM's reliability, especially with rare unsafe categories. Although we evaluated the four LLMs on a held-out portion of the 99.8K labeled messages to select the best model, we did not report the specific metrics in the initial manuscript to prioritize the qualitative analysis of evasion techniques. In the revised version, we will include accuracy, per-class precision, recall, F1 scores, and a confusion matrix for the test set across all models. This addition will enable readers to assess the false-positive rates and confirm the appropriateness of our model selection. revision: yes

  2. Referee: Results (large-scale scan): No false-positive analysis, error bars, or uncertainty estimates are provided for the LLM labels applied to the ~2M-message corpus. Given the expected class imbalance, even modest per-class error rates could substantially inflate the count of reported unsafe messages and the qualitative themes derived from them.

    Authors: We acknowledge that providing uncertainty estimates would strengthen the large-scale results. We will add to the revised manuscript a discussion of the implications of the test set performance for the full corpus, including estimated false-positive rates and their potential impact on the thematic findings. Additionally, we will clarify our sampling strategy for the qualitative coding, which focused on messages flagged by the LLM to explore potential unsafe content while noting the limitations due to class imbalance. Full per-message uncertainty quantification is beyond the current scope but will be noted as a limitation. revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical observational study

Full rationale

The paper describes an empirical workflow with no equations, fitted parameters, derivations, or predictions that reduce to inputs by construction. It collects ~2M messages, manually labels a 99.8K subset as ground truth, evaluates four LLMs on that subset, applies the best model to the full corpus, and performs qualitative coding on the flagged messages. All central claims (unsafe messages escaping moderation, evasion techniques) rest on this direct data collection and classification process rather than any self-referential reduction, self-citation load-bearing premise, or renaming of known results. Potential concerns about LLM accuracy on rare categories or sample representativeness are validity/generalizability issues, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are introduced; the work relies on standard qualitative coding practices and off-the-shelf LLMs applied to newly collected data.

pith-pipeline@v0.9.0 · 5615 in / 1074 out tokens · 36259 ms · 2026-05-11T00:43:07.036743+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

120 extracted references · 47 canonical work pages · 1 internal anchor
