pith. machine review for the scientific record.

arxiv: 2605.04491 · v2 · submitted 2026-05-06 · 💻 cs.CY · cs.CR

An Evaluation of Chat Safety Moderations in Roblox

Priya Kaushik, Rakibul Hasan, Sazzadur Rahaman, Sonja Brown

Pith reviewed 2026-05-11 00:43 UTC · model grok-4.3

classification 💻 cs.CY cs.CR
keywords chat moderation · Roblox · online safety · LLM evaluation · grooming · bullying · content moderation · gaming platforms

The pith

Roblox's chat moderation allows numerous unsafe messages about grooming, bullying, and self-harm to pass through undetected.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper gathers roughly two million chat messages from four public Roblox games and examines how well the platform's automated filters catch policy violations. It first manually labels nearly one hundred thousand messages as safe or unsafe, then tests four large language models against that ground truth before applying the strongest model to the full collection. The results show many messages involving grooming, sexualizing minors, harassment, violence, self-harm, and sharing private details still reach users. Flagged players continue sending harmful content by using a variety of wording changes and other evasion methods. Because Roblox serves hundreds of millions of users daily, including many children, these gaps matter for real-time safety in live game chats.

Core claim

Our analysis of approximately 2 million chat messages from public Roblox servers reveals that numerous instances of unsafe chat messages related to grooming, sexualizing minors, bullying and harassment, violence, self-harm, and sharing sensitive information escaped the current moderation system. Users whose messages were previously flagged continue to send harmful messages by employing a wide range of techniques to evade the moderation system.

What carries the argument

A two-step evaluation: 99.8K messages are manually labeled to benchmark four large language models, then the best-performing model is applied across the full corpus, with iterative open and axial coding used to categorize the flagged unsafe messages.
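The two-step design can be sketched in miniature: score each candidate model against the hand-labeled ground truth, keep the winner, and run only that model over the full corpus. The model names, toy labels, and trivial `classify` stub below are invented placeholders, not the paper's models or data.

```python
# Sketch of the paper's two-step pipeline: benchmark candidates on manual
# labels, then scale up only the best performer. All data here is invented.

def f1_unsafe(gold, pred):
    """Precision/recall/F1 for the 'unsafe' class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p == "unsafe")
    fp = sum(1 for g, p in zip(gold, pred) if g == "safe" and p == "unsafe")
    fn = sum(1 for g, p in zip(gold, pred) if g == "unsafe" and p == "safe")
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# Step 1: score each candidate model on the hand-labeled ground truth.
gold = ["unsafe", "safe", "safe", "unsafe", "safe"]
candidates = {
    "model_a": ["unsafe", "safe", "unsafe", "unsafe", "safe"],
    "model_b": ["safe", "safe", "safe", "unsafe", "safe"],
}
best = max(candidates, key=lambda m: f1_unsafe(gold, candidates[m]))

# Step 2: apply only the winning model to the (here: tiny) corpus.
def classify(model, message):
    # Stand-in for an LLM call; always answers 'safe' in this toy sketch.
    return "safe"

corpus = ["hello", "wanna trade?"]
flags = [m for m in corpus if classify(best, m) == "unsafe"]
```

The paper's actual selection criterion is not stated in this summary; F1 on the unsafe class is used here only as a plausible stand-in.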

If this is right

  • Current automated filters do not catch all policy-violating chats in real time.
  • Users who have been flagged persist in sending harmful messages through evasion tactics.
  • Categories such as grooming and self-harm show particularly high rates of escape.
  • Hybrid systems that address wording changes and context shifts are needed beyond simple keyword or model checks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar large-scale audits on other platforms could reveal whether Roblox's gaps are typical for live gaming chats.
  • Adding detection for conversation patterns over multiple turns might reduce the success of single-message evasion.
  • Public release of categorized unsafe examples, stripped of identifiers, could help improve open models for moderation.
  • The volume of escaped content suggests that moderation teams should prioritize the most frequent evasion patterns identified.

Load-bearing premise

The best-performing large language model accurately classifies unsafe messages at scale with low error rates, and the four selected games plus public servers provide a representative sample of Roblox's overall chat environment.

What would settle it

A manual review of several thousand messages the model labeled safe that uncovers a high rate of previously missed unsafe content, or evidence that unsafe activity in the four sampled games differs sharply from other popular Roblox titles.
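The first proposed test — hand-reviewing a sample of messages the model labeled safe and extrapolating the missed-unsafe rate — amounts to simple proportion estimation. The counts below are invented for illustration, and the normal-approximation interval is one standard choice rather than anything the paper prescribes.

```python
import math

def missed_unsafe_estimate(n_reviewed, n_found_unsafe, z=1.96):
    """Point estimate and normal-approximation CI for the rate of unsafe
    messages hiding in the model-labeled-'safe' pool (toy numbers)."""
    p = n_found_unsafe / n_reviewed
    se = math.sqrt(p * (1 - p) / n_reviewed)
    return p, max(0.0, p - z * se), p + z * se

# e.g. 5,000 reviewed 'safe' messages, 40 turning out unsafe on inspection:
p, lo, hi = missed_unsafe_estimate(5000, 40)

# Scale the rate to a ~2M-message corpus for a rough missed-message count.
approx_missed = int(p * 2_000_000)
```

A high rate here would mean the LLM scan understates escapes; a rate near zero would support treating the flagged set as close to exhaustive.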

Figures

Figures reproduced from arXiv: 2605.04491 by Priya Kaushik, Rakibul Hasan, Sazzadur Rahaman, Sonja Brown.

Figure 1. Example of Roblox's graphical user interface, with chat bubbles and a chat window with dynamic background (anonymized).
Figure 2. Moderated/hashed message count per user. The caption's continuation assigns users to frequency groups by the proportion of their hashed messages that met the confidence criteria: a frequency ratio (hashed messages divided by total messages) above 0.90 places a user in the high-frequency group; ratios between 0.50 and 0.90 are assigned to …
Figure 3. Distribution of hash span lengths less than 50 characters.
Figure 4. Standardized prompt used for LLM-based classification of conversations.
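The user-grouping rule quoted alongside Figure 2 (frequency ratio = hashed messages ÷ total messages, with values above 0.90 marking the high-frequency group) can be sketched directly. The source text is truncated after the 0.50–0.90 band, so the names for the lower groups below are invented placeholders, not the paper's terms.

```python
# Frequency-ratio grouping as described in the Figure 2 caption. Only the
# 'high' band and the 0.50-0.90 band are given in the source; the labels
# 'mid' and 'low' are placeholders for the truncated remainder.

def frequency_group(hashed, total):
    ratio = hashed / total
    if ratio > 0.90:
        return "high"
    if 0.50 <= ratio <= 0.90:
        return "mid"   # placeholder name; the source text is cut off here
    return "low"       # placeholder name
```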
Original abstract

Roblox is among the most popular online gaming platforms, used by hundreds of millions of users every day. A substantial portion of these users are underage, who are at a greater risk, where abusive users may utilize Roblox's real-time chat interface to make the initial contact with potential victims. Roblox employs automated chat moderation mechanisms to detect potentially abusive messages; however, to date, their effectiveness has not been independently investigated. Toward this goal, we collected approximately 2 million chat messages from four games across multiple age groups and analyzed them to evaluate the moderation system. These messages were collected from public game servers following ethical and legal norms as well as Roblox's terms of service. We use this corpus to qualitatively study which types of unsafe chats escape the moderation system and how policy-violating users evade the moderation system. Given the dataset's scale, it is prohibitively expensive to conduct qualitative content analysis manually. Therefore, we adopt a two-step approach. First, we manually labeled safe and unsafe messages (n=99.8K) and used them as a ground truth to evaluate four locally hosted state-of-the-art large language models (LLMs). Next, the best-performing LLM was applied to the entire corpus to identify potentially unsafe messages, which we manually categorized using iterative open and axial coding methods until thematic saturation was reached. Overall, our findings reveal a troublesome reality: numerous instances of unsafe chat messages related to grooming, sexualizing minors, bullying, & harassment, violence, self-harm, and sharing sensitive information, etc., escaped the current moderation. Our analysis of users whose messages were previously flagged revealed that they continue to send harmful messages by employing a wide range of techniques to evade the moderation system.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper claims to evaluate Roblox's chat moderation effectiveness by collecting ~2 million messages from public servers in four games, manually labeling 99.8K messages as safe/unsafe to benchmark four locally hosted LLMs, selecting and deploying the best-performing LLM on the full corpus to flag unsafe content, and then applying iterative open/axial qualitative coding (until thematic saturation) to categorize the flagged messages. The central findings are that numerous unsafe messages involving grooming, sexualizing minors, bullying/harassment, violence, self-harm, and sharing sensitive information escape moderation, and that previously flagged users evade detection via a range of techniques.

Significance. If the LLM classification step is reliable, the work provides a rare independent, large-scale observational assessment of moderation on a platform with hundreds of millions of daily users, many underage. The mixed-methods design (manual ground-truth labeling + LLM scaling + qualitative thematic analysis) is a pragmatic way to study rare but high-stakes events at scale. The ethical data collection and focus on concrete evasion strategies are positive features. The significance is limited, however, by the absence of detailed classifier performance data, which directly affects whether the reported escapes reflect genuine moderation failures.

major comments (2)
  1. [Methods (LLM evaluation)] The 99.8K manually labeled messages are used to evaluate four LLMs and select the best one for labeling the remaining corpus, yet no performance metrics (accuracy, per-class precision/recall, F1, or confusion matrix) are reported for the held-out test set. Unsafe categories are rare; without these numbers the flagged set used for qualitative coding may contain high false-positive rates, directly weakening the central claim that numerous grooming, self-harm, and similar messages escape moderation.
  2. [Results (large-scale scan)] No false-positive analysis, error bars, or uncertainty estimates are provided for the LLM labels applied to the ~2M-message corpus. Given the expected class imbalance, even modest per-class error rates could substantially inflate the count of reported unsafe messages and the qualitative themes derived from them.
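The imbalance concern in major comment 2 is easy to make concrete: when unsafe messages are rare, even a small false-positive rate yields more wrong flags than right ones. All rates below are invented for illustration and are not taken from the paper.

```python
# Toy illustration of false-positive inflation under class imbalance.
# Prevalence, recall (tpr), and false-positive rate (fpr) are invented.

def flagged_breakdown(n, prevalence, tpr, fpr):
    unsafe = n * prevalence
    safe = n - unsafe
    tp = unsafe * tpr          # real unsafe messages caught
    fp = safe * fpr            # safe messages wrongly flagged
    precision = tp / (tp + fp)
    return tp, fp, precision

# 2M messages, 0.5% truly unsafe, 90% recall, 1% false-positive rate:
tp, fp, precision = flagged_breakdown(2_000_000, 0.005, 0.90, 0.01)
# fp (~19,900) is more than double tp (~9,000): precision ~ 0.31.
```

This is why the referee asks for per-class metrics: without them, the flagged set feeding the qualitative coding could be dominated by classifier noise.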
minor comments (3)
  1. [Abstract] The abstract refers to the 'best-performing LLM' without naming the model or summarizing its quantitative performance on the labeled set, which would let readers gauge reliability before the large-scale results.
  2. [Qualitative analysis] The description of iterative open and axial coding does not report the number of coders, any inter-rater reliability measure, or the precise criteria used to determine thematic saturation.
  3. [Data collection] The rationale for selecting the four specific games and public servers (versus other games or private servers) is not elaborated, limiting assessment of how representative the observed moderation failures are of Roblox overall.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback and for recognizing the potential significance of our study on Roblox's chat moderation. We address the major comments below and have updated the manuscript to incorporate additional quantitative details on the LLM performance to enhance the robustness of our findings.

Point-by-point responses
  1. Referee: Methods (LLM evaluation): The 99.8K manually labeled messages are used to evaluate four LLMs and select the best one for labeling the remaining corpus, yet no performance metrics (accuracy, per-class precision/recall, F1, or confusion matrix) are reported for the held-out test set. Unsafe categories are rare; without these numbers the flagged set used for qualitative coding may contain high false-positive rates, directly weakening the central claim that numerous grooming, self-harm, and similar messages escape moderation.

    Authors: We agree with the referee that detailed performance metrics are crucial for validating the LLM's reliability, especially with rare unsafe categories. Although we evaluated the four LLMs on a held-out portion of the 99.8K labeled messages to select the best model, we did not report the specific metrics in the initial manuscript to prioritize the qualitative analysis of evasion techniques. In the revised version, we will include accuracy, per-class precision, recall, F1 scores, and a confusion matrix for the test set across all models. This addition will enable readers to assess the false-positive rates and confirm the appropriateness of our model selection. revision: yes

  2. Referee: Results (large-scale scan): No false-positive analysis, error bars, or uncertainty estimates are provided for the LLM labels applied to the ~2M-message corpus. Given the expected class imbalance, even modest per-class error rates could substantially inflate the count of reported unsafe messages and the qualitative themes derived from them.

    Authors: We acknowledge that providing uncertainty estimates would strengthen the large-scale results. We will add to the revised manuscript a discussion of the implications of the test set performance for the full corpus, including estimated false-positive rates and their potential impact on the thematic findings. Additionally, we will clarify our sampling strategy for the qualitative coding, which focused on messages flagged by the LLM to explore potential unsafe content while noting the limitations due to class imbalance. Full per-message uncertainty quantification is beyond the current scope but will be noted as a limitation. revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical observational study

Full rationale

The paper describes an empirical workflow with no equations, fitted parameters, derivations, or predictions that reduce to inputs by construction. It collects ~2M messages, manually labels a 99.8K subset as ground truth, evaluates four LLMs on that subset, applies the best model to the full corpus, and performs qualitative coding on the flagged messages. All central claims (unsafe messages escaping moderation, evasion techniques) rest on this direct data collection and classification process rather than any self-referential reduction, self-citation load-bearing premise, or renaming of known results. Potential concerns about LLM accuracy on rare categories or sample representativeness are validity/generalizability issues, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are introduced; the work relies on standard qualitative coding practices and off-the-shelf LLMs applied to newly collected data.

pith-pipeline@v0.9.0 · 5615 in / 1074 out tokens · 36259 ms · 2026-05-11T00:43:07.036743+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

120 extracted references · 47 canonical work pages · 1 internal anchor
