Scalable AI-Driven Analytics for User Engagement and Stance Detection on Social Media

Dinusha Vatsalan; Hassan Asghar; Mohamed Ali Kaafar; Muhammad Ikram; Thammitage Piyumi Wathsala Seneviratne

arxiv: 2605.29199 · v1 · pith:4IF4VS3N · submitted 2026-05-28 · 💻 cs.SI


Thammitage Piyumi Wathsala Seneviratne, Muhammad Ikram, Dinusha Vatsalan, Hassan Asghar, Mohamed Ali Kaafar

Pith reviewed 2026-06-29 00:26 UTC · model grok-4.3

classification 💻 cs.SI

keywords: conspiracy content, user engagement, stance detection, YouTube, social media analytics, misinformation, amplification dynamics, scalable framework


The pith

Conspiracy videos draw up to 70 percent of total user engagement in their first week after upload.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a modular pipeline that ingests, filters, models topics, and applies sentiment and stance detection to millions of YouTube comments on conspiracy videos. It uses this system to measure how engagement concentrates early and how users respond. The central finding is that most interaction happens quickly and most expressed positions support the narratives. The work argues this shows the value of continuous, service-oriented monitoring at platform scale.

Core claim

A scalable service framework combining data ingestion, topic modelling, sentiment analysis, and stance detection processes over 7 million comments from nearly 50,000 conspiracy-related YouTube videos. The analysis shows conspiracy content attracts up to 70 percent of total user engagement within the first week and that a majority of users express favourable positions toward the narratives, with a small set of highly active users driving disproportionate engagement across channels.

What carries the argument

The modular pipeline that chains data ingestion, filtering, topic modelling, sentiment analysis, and stance detection to operate on large real-world comment sets.
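The chaining described above can be sketched as a list of stage functions applied in order. This is a minimal illustration and not the authors' implementation. The `Comment` record, the stage names, and the keyword rule in `tag_sentiment` are all hypothetical stand-ins for the real filtering, topic, sentiment and stance models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Comment:
    # Hypothetical comment record; the real schema is not described in the text.
    video_id: str
    user_id: str
    text: str
    labels: Dict[str, str] = field(default_factory=dict)


# A stage takes a batch of comments and returns a (possibly smaller) batch.
Stage = Callable[[List[Comment]], List[Comment]]


def drop_empty(comments: List[Comment]) -> List[Comment]:
    """Filtering stage: discard comments with no alphabetic characters."""
    return [c for c in comments if any(ch.isalpha() for ch in c.text)]


def tag_sentiment(comments: List[Comment]) -> List[Comment]:
    """Toy sentiment stage (hypothetical keyword rule standing in for a real model)."""
    for c in comments:
        c.labels["sentiment"] = "negative" if "fake" in c.text.lower() else "neutral"
    return comments


def run_pipeline(comments: List[Comment], stages: List[Stage]) -> List[Comment]:
    """Apply each stage in order, feeding the output of one into the next."""
    for stage in stages:
        comments = stage(comments)
    return comments
```

Every stage shares one signature, so adding a topic or stance stage only means editing the list passed to `run_pipeline`, for example `run_pipeline(batch, [drop_empty, tag_sentiment])`.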

Load-bearing premise

The stance detection and sentiment models correctly classify user positions on conspiracy comments even though no accuracy metrics or validation results are supplied.

What would settle it

Label a random sample of the 7 million comments for stance and sentiment by hand, then measure how often the pipeline's classifications match those labels.
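The proposed check needs only a few standard agreement statistics. The sketch below assumes that hand labels and pipeline labels are aligned lists of strings; label names such as `favour` are hypothetical. It reports three measures built from a confusion matrix:

- accuracy;
- Cohen's kappa, which is agreement corrected for chance;
- macro-F1.

```python
from collections import Counter
from typing import Dict, List, Tuple


def agreement_report(gold: List[str], pred: List[str]) -> Dict[str, object]:
    """Compare hand labels (gold) with pipeline labels (pred) for the same comments."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and the same length")
    n = len(gold)
    labels = sorted(set(gold) | set(pred))

    # Confusion matrix keyed by (gold_label, predicted_label).
    confusion: Dict[Tuple[str, str], int] = {(g, p): 0 for g in labels for p in labels}
    for g, p in zip(gold, pred):
        confusion[(g, p)] += 1

    # Observed agreement is plain accuracy.
    observed = sum(confusion[(lab, lab)] for lab in labels) / n

    # Chance agreement from each rater's marginal label frequencies (Cohen's kappa).
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    expected = sum(gold_counts[lab] * pred_counts[lab] for lab in labels) / (n * n)
    # Degenerate case: both raters used one identical label, so kappa is undefined; report 1.0.
    kappa = 1.0 if expected == 1 else (observed - expected) / (1 - expected)

    # Per-label F1, then the unweighted (macro) mean.
    f1s = []
    for lab in labels:
        tp = confusion[(lab, lab)]
        fp = sum(confusion[(g, lab)] for g in labels if g != lab)
        fn = sum(confusion[(lab, p)] for p in labels if p != lab)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)

    return {"accuracy": observed, "kappa": kappa, "macro_f1": sum(f1s) / len(f1s), "confusion": confusion}
```

Kappa matters here because a skewed label distribution can produce high raw accuracy even when agreement is close to chance.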

Figures

Figures reproduced from arXiv: 2605.29199 by Dinusha Vatsalan, Hassan Asghar, Mohamed Ali Kaafar, Muhammad Ikram, Thammitage Piyumi Wathsala Seneviratne.

**Figure 1.** Overview of the proposed scalable AI-driven service architecture. The system consists of five layers: (1) Data Sources Curation, (2) Data Ingestion (YouTube API [21]), (3) Processing Pipeline (filtering, topic modelling, sentiment and stance analysis), (4) Analytics Layer (engagement metrics and behavioural signals), and (5) Service Interface for real-time querying and monitoring.

**Figure 2.** Differences between reported and publicly available comment counts across datasets (log scale).

**Figure 3.** Clustered document embeddings showing coherent topic clusters. … Corpus package [24]. We manually verify that removing such phrases does not alter the semantic meaning of transcripts. Unlike traditional topic modelling approaches based on word-frequency matrices, our objective is to extract human-interpretable topics that preserve semantic relationships within the data. Therefore, we adopt a transformer-base…

**Figure 4.** Our data preprocessing pipeline for stance detection. … a user is responding in order to accurately infer their opinion. In large-scale social media environments, this challenge is further compounded by the presence of noisy, ambiguous, and low-information content. In particular, spam-like, irrelevant, or meaningless comments are prevalent and can negatively impact stance inference. To address this, we first…
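The filtering of spam-like and low-information comments described in this excerpt can be illustrated with a weak-supervision-style vote over simple heuristic rules. The paper's reference list includes Snorkel [23], but this sketch does not use Snorkel. The rules below (minimum length, URLs, character repetition, opinion keywords) are hypothetical examples, not the authors' criteria.

```python
import re

# Votes returned by each heuristic rule (labelling function).
ABSTAIN, DROP, KEEP = -1, 0, 1


def lf_too_short(text: str) -> int:
    return DROP if len(text.split()) < 3 else ABSTAIN


def lf_has_url(text: str) -> int:
    return DROP if re.search(r"https?://", text) else ABSTAIN


def lf_repeated_chars(text: str) -> int:
    # The same character five or more times in a row, e.g. "lolllll".
    return DROP if re.search(r"(.)\1{4,}", text) else ABSTAIN


def lf_opinion_words(text: str) -> int:
    return KEEP if re.search(r"\b(believe|think|truth|lie|agree|wrong)\b", text.lower()) else ABSTAIN


RULES = [lf_too_short, lf_has_url, lf_repeated_chars, lf_opinion_words]


def majority_vote(text: str, rules=RULES, default: int = KEEP) -> int:
    """Combine non-abstaining votes; ties and all-abstain cases fall back to `default`."""
    votes = [v for v in (rule(text) for rule in rules) if v != ABSTAIN]
    keep, drop = votes.count(KEEP), votes.count(DROP)
    if keep == drop:
        return default
    return KEEP if keep > drop else DROP
```

An unweighted majority vote is the simplest way to combine such rules. Snorkel-style label models instead estimate how reliable each rule is from the pattern of agreements and disagreements between rules.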

**Figure 5.** Comments distribution: CDF of the number of unique videos per user (log-scale x-axis, 10^0 to 10^3) for Other Conspiracies, QAnon Conspiracies, and Baseline Videos.
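A distribution like the one in Figure 5 can be sketched as an empirical CDF of per-user unique-video counts. The input format (a list of `(user_id, video_id)` pairs, one per comment) is an assumption. The function returns one `(value, cumulative fraction)` point per distinct count, which can then be plotted on a log-scale x-axis.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def unique_videos_per_user(comments: List[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct videos each user commented on; input is (user_id, video_id) pairs."""
    videos: Dict[str, Set[str]] = defaultdict(set)
    for user_id, video_id in comments:
        videos[user_id].add(video_id)
    return {user: len(vids) for user, vids in videos.items()}


def empirical_cdf(values: List[int]) -> List[Tuple[int, float]]:
    """Return (x, P(X <= x)) at each distinct value of x."""
    xs = sorted(values)
    n = len(xs)
    points = []
    for i, x in enumerate(xs, start=1):
        # Emit a point only at the last occurrence of each distinct value.
        if i == n or xs[i] != x:
            points.append((x, i / n))
    return points
```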

**Figure 8.** Pairwise Pearson correlation coefficient between users’ number of comments, likes, and views in each dataset. Takeaway (RQ1): Our findings suggest that conspiracy content exhibits stronger, more skewed, and more distributed engagement compared to mainstream content. A small subset of highly active users contributes disproportionately, and engagement is driven by content across multiple channels rather th…
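The coefficient in Figure 8 is the ordinary Pearson correlation. The sketch below assumes that per-user totals for comments, likes and views are already aligned by user. The text above does not say how the paper aggregates likes and views per user.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one metric is constant
    return sxy / math.sqrt(sxx * syy)


def pairwise_pearson(metrics: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Correlate every pair of per-user metrics (lists aligned by user)."""
    return {(a, b): pearson(metrics[a], metrics[b]) for a, b in combinations(sorted(metrics), 2)}
```

User-level engagement counts are typically heavy-tailed. Log-transforming the counts, or using a rank correlation such as Spearman's, is a common robustness check alongside Pearson.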

**Figure 9.** Proportion of the comments in each dataset by sentiments of most actively engaging users.
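A breakdown like Figure 9 can be sketched by selecting the k users with the most comments and tallying the sentiment labels of their comments. The value of k, the label names, and the `(user_id, sentiment_label)` input format are assumptions. The text does not say how the paper defines "most actively engaging".

```python
from collections import Counter
from typing import Dict, List, Tuple


def sentiment_share_of_top_users(comments: List[Tuple[str, str]], k: int) -> Dict[str, float]:
    """Share of each sentiment label among comments written by the k most active users.

    `comments` is a list of (user_id, sentiment_label) pairs, one per comment.
    """
    per_user = Counter(user for user, _ in comments)
    top_users = {user for user, _ in per_user.most_common(k)}
    counts = Counter(label for user, label in comments if user in top_users)
    total = sum(counts.values())
    return {label: c / total for label, c in counts.items()} if total else {}
```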

**Figure 10.**

**Figure 12.** Time series analysis of comments received over time. Normalised comment count is obtained by dividing total comments by the number of videos, enabling comparison across datasets of different sizes.
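The normalisation in Figure 12 can be sketched as follows, together with the first-week share behind the "up to 70 percent" headline. Two points are assumptions, not the paper's stated method:

- engagement is counted as comments only;
- the first-week share is pooled over all comments in a dataset.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Comment = Tuple[str, datetime]  # (video_id, comment timestamp)


def normalised_daily_counts(uploads: Dict[str, datetime], comments: List[Comment]) -> Dict[int, float]:
    """Comments per whole day since upload, divided by the number of videos in the dataset."""
    days = Counter((ts - uploads[vid]).days for vid, ts in comments)
    n_videos = len(uploads)
    return {d: days[d] / n_videos for d in sorted(days)}


def first_week_share(uploads: Dict[str, datetime], comments: List[Comment],
                     window: timedelta = timedelta(days=7)) -> float:
    """Fraction of all comments posted within `window` of their video's upload."""
    if not comments:
        return 0.0
    early = sum(1 for vid, ts in comments if ts - uploads[vid] < window)
    return early / len(comments)
```

Dividing by every video in the dataset, including videos with no comments, follows the caption's description and keeps datasets of different sizes comparable.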

Original abstract

Social media platforms have become a major vector for the large-scale dissemination of misinformation and conspiracy content, posing significant risks to public trust, health, and societal stability. While prior work has primarily focused on analysing such content from a behavioural or content-centric perspective, there is a lack of scalable, service-oriented solutions that enable continuous monitoring and analysis of user engagement at platform scale. In this paper, we present a scalable AI-driven service framework for analysing user engagement and stance on social media content. Our system integrates data ingestion, filtering, topic modelling, sentiment analysis, and stance detection into a modular pipeline that can operate on large-scale, real-world datasets. We implement and evaluate our framework on a dataset comprising over 7 million user comments collected from nearly 50,000 YouTube videos associated with conspiracy narratives. Our analysis reveals that conspiracy content attracts up to 70% of total user engagement within the first week of publication, indicating strong early amplification dynamics. Furthermore, we identify a subset of highly active users who exhibit disproportionately high engagement across multiple videos and channels. Stance analysis shows that a majority of users express favourable positions toward conspiracy narratives, highlighting the role of user communities in reinforcing such content. The proposed framework demonstrates the feasibility of deploying scalable, service-oriented analytics for real-time monitoring of user engagement and behavioural patterns. These findings demonstrate the effectiveness of our framework in capturing large-scale engagement dynamics and highlight the importance of early-stage detection and service-based monitoring for mitigating the spread of harmful content.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper builds a modular pipeline for YouTube conspiracy comment analysis at scale but leaves stance detection unvalidated, so the majority-favourable claim rests on an untested classifier.


The core takeaway is that this work assembles existing NLP components into a service pipeline and runs them on a large YouTube dataset, but the headline stance result has no reported accuracy or validation. The authors ingest comments from nearly 50,000 conspiracy videos and apply topic modelling, sentiment analysis and stance detection. They report that conspiracy content draws up to 70 percent of its engagement in the first week, and that a majority of users express favourable stances.

What the paper actually does is demonstrate a practical, modular system that can handle seven million comments. The early-amplification count looks like a direct measurement rather than a model output, and the identification of a small set of highly active users across videos is a straightforward observation from the data. The service-oriented framing is reasonable for people who need ongoing monitoring tools.

The clear weakness is the stance detection step. Neither the abstract nor the stress-test note reports accuracy figures, held-out test results, or a confusion matrix. Neither discusses how labels were obtained or how well the model handles conspiracy-specific language. If the classifier makes even moderate errors on that domain, particularly errors skewed toward one stance, the majority-favourable conclusion may not hold. The 70 percent engagement figure may stand on its own, but the behavioural claim that follows does not.
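How much classifier error matters depends on its structure, and a standard prevalence correction makes this concrete. The sketch reduces stance to a binary favourable/not-favourable label and applies the Rogan-Gladen estimator. All sensitivity and specificity values used with it are hypothetical, because the paper reports none.

```python
def corrected_prevalence(p_obs: float, sensitivity: float, specificity: float) -> float:
    """Rogan-Gladen estimate of the true 'favourable' rate from a classifier's observed rate.

    Observed rate = sensitivity * p + (1 - specificity) * (1 - p); solve for p.
    """
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("classifier must do better than chance (sensitivity + specificity > 1)")
    p = (p_obs + specificity - 1) / denom
    return min(1.0, max(0.0, p))  # clip: sampling noise can push the raw estimate outside [0, 1]
```

Take a hypothetical observed favourable rate of 60%:

- With sensitivity 0.9 and specificity 0.7, the corrected rate is 50%.
- With sensitivity 0.7 and specificity 0.9, the corrected rate is about 83%.

The same observed majority can therefore weaken or strengthen, depending on which kind of error dominates.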

This paper is mainly useful to engineers building social-media monitoring services who want an example of a working pipeline at scale. Researchers looking for reliable empirical findings on user stance will find the missing validation a problem. The work shows clear engineering effort and honest application of standard tools, but the central user-behaviour claim is under-supported.

I would send it to peer review only if the authors supply the missing model evaluations and baselines; without those the stance result is too fragile to stand as a finding.

Referee Report

3 major / 2 minor

Summary. The paper presents a modular, service-oriented AI pipeline that combines data ingestion, filtering, topic modelling, sentiment analysis, and stance detection to monitor user engagement with conspiracy-related YouTube content at scale. Evaluated on a corpus of >7 million comments from ~50k videos, the work reports that conspiracy content captures up to 70% of total engagement within the first week and that a majority of users adopt favourable stances toward such narratives, while also identifying a small set of highly active users.

Significance. A validated, production-ready framework for continuous, large-scale stance and engagement monitoring would be a useful contribution to computational social science and platform-governance research. The dataset size and the emphasis on early amplification are strengths; however, the absence of any reported model validation, baselines, or uncertainty estimates for the core AI components substantially reduces the reliability of the headline quantitative claims.

major comments (3)

[Abstract and §4] Stance Detection: The central claim that 'a majority of users express favourable positions toward conspiracy narratives' is produced by an unvalidated stance-detection module. No accuracy, F1, confusion matrix, inter-annotator agreement, or held-out test-set results are supplied for either the stance classifier or the upstream sentiment analysis. Without these metrics the majority conclusion cannot be assessed, yet it is load-bearing for the paper's main empirical contribution.
[§3 and §5] Engagement Analysis: The reported 'up to 70% of total user engagement within the first week' is presented without baseline comparisons, temporal controls, or error bars. It is therefore impossible to determine whether this figure exceeds what would be expected under a null model of random or popularity-driven engagement.
[§4] Pipeline Evaluation: The manuscript asserts that the framework 'can operate on large-scale, real-world datasets' and demonstrates 'feasibility of deploying scalable, service-oriented analytics,' yet provides no throughput, latency, or resource-utilization measurements for the end-to-end pipeline on the 7M-comment corpus.

minor comments (2)

[Abstract] The abstract states quantitative findings without any accompanying validation statistics; this should be flagged as a limitation even in the abstract.
[§3] Notation for engagement ratios and stance polarity scores is introduced without explicit definitions or references to the precise formulas used.
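One way to write the requested definitions explicitly is sketched below; it is a suggestion, not the paper's own notation. The symbols are:

- $V_D$: the videos in dataset $D$;
- $C_v$: the comments on video $v$;
- $t_v$: the upload time of video $v$;
- $\tau_c$: the timestamp of comment $c$;
- $\Delta$: one day.

The first line restates the Figure 12 normalisation. The second gives a pooled first-week share, assuming engagement is counted as comments. The text does not state whether "up to 70%" refers to a per-video share or a pooled share.

```latex
\begin{aligned}
\bar{N}_D(d) &= \frac{1}{|V_D|} \sum_{v \in V_D}
  \bigl|\{\, c \in C_v : \lfloor (\tau_c - t_v)/\Delta \rfloor = d \,\}\bigr|, \\
S_D &= \frac{\sum_{v \in V_D} \bigl|\{\, c \in C_v : \tau_c - t_v < 7\Delta \,\}\bigr|}
            {\sum_{v \in V_D} |C_v|}.
\end{aligned}
```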

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the constructive feedback. We address each major comment below and will incorporate revisions to improve the manuscript's rigor and transparency.


Referee: [Abstract and §4] The central claim that 'a majority of users express favourable positions toward conspiracy narratives' is produced by an unvalidated stance-detection module. No accuracy, F1, confusion matrix, inter-annotator agreement, or held-out test-set results are supplied for either the stance classifier or the upstream sentiment analysis. Without these metrics the majority conclusion cannot be assessed and is load-bearing for the paper's main empirical contribution.

Authors: We agree that the absence of validation metrics for the stance detection and sentiment components limits the assessability of the majority stance claim. The manuscript applies standard off-the-shelf NLP models without reporting dataset-specific performance. In revision we will add a new evaluation subsection in §4 that reports accuracy, macro-F1, a confusion matrix, and inter-annotator agreement on a manually labelled held-out test set of 500 comments. This will directly support the empirical contribution. revision: yes
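A 500-comment test set also bounds how precisely any accuracy figure can be reported. The Wilson score interval below is a standard way to quantify that. The 85% accuracy in the usage note is a hypothetical value, not a result from the paper.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95% coverage)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half
```

For a hypothetical 425 correct out of 500 (85%), `wilson_interval(425, 500)` spans roughly 0.82 to 0.88, about plus or minus 3 percentage points. That is adequate for an overall figure but wide for per-class metrics on rare stances.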
Referee: [§3 and §5] The reported 'up to 70% of total user engagement within the first week' is presented without baseline comparisons, temporal controls, or error bars. It is therefore impossible to determine whether this figure exceeds what would be expected under a null model of random or popularity-driven engagement.

Authors: The 70% figure is an observational statistic computed from the temporal distribution of comment volumes in the collected corpus. We acknowledge the lack of statistical controls. The revised manuscript will add, in §§3 and 5, a simple null-model baseline (random reassignment of engagement volumes) together with bootstrap-derived 95% confidence intervals around the weekly engagement percentages to allow readers to judge whether the observed early amplification is distinguishable from chance. revision: yes
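The bootstrap part of this response can be sketched as a percentile bootstrap that resamples whole videos. Comments on the same video are not independent, so the video is the natural resampling unit. The per-video `(first_week_comments, total_comments)` input is an assumed aggregation. The proposed null model is not sketched here.

```python
import random
from typing import List, Tuple

# Each video is (comments_in_first_week, total_comments); this aggregation is assumed.
Video = Tuple[int, int]


def pooled_share(videos: List[Video]) -> float:
    """First-week comments divided by all comments, pooled over videos."""
    total = sum(t for _, t in videos)
    return sum(e for e, _ in videos) / total if total else 0.0


def bootstrap_ci(videos: List[Video], n_boot: int = 2000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float, float]:
    """Percentile bootstrap CI for the pooled share, resampling whole videos with replacement."""
    rng = random.Random(seed)
    stats = sorted(pooled_share(rng.choices(videos, k=len(videos))) for _ in range(n_boot))
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return pooled_share(videos), lo, hi
```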
Referee: [§4] The manuscript asserts that the framework 'can operate on large-scale, real-world datasets' and demonstrates 'feasibility of deploying scalable, service-oriented analytics,' yet provides no throughput, latency, or resource-utilization measurements for the end-to-end pipeline on the 7M-comment corpus.

Authors: We concur that quantitative pipeline benchmarks are required to substantiate the scalability claims. The revised version will include, in §4, end-to-end measurements (comments processed per second, average latency per comment, peak CPU and memory usage) obtained while ingesting and analysing the full 7-million-comment corpus on a standard cloud VM configuration. These metrics will be presented alongside the existing qualitative feasibility discussion. revision: yes
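Measurements of this kind can be collected with a small harness around each pipeline stage. The sketch reports three figures:

- comments per second;
- mean per-comment latency within each batch;
- peak Python heap allocation, via the standard `tracemalloc` module.

It is a hypothetical harness, not the authors' benchmark. `tracemalloc` sees only Python-level allocations, so memory held by native model runtimes is not counted, and CPU utilisation is not measured.

```python
import statistics
import time
import tracemalloc
from typing import Callable, Dict, List


def benchmark(stage: Callable[[List[str]], object], items: List[str],
              batch_size: int = 1000) -> Dict[str, float]:
    """Run `stage` over `items` in batches; report throughput, latency and peak traced memory."""
    started_tracing = not tracemalloc.is_tracing()
    if started_tracing:
        tracemalloc.start()

    latencies = []
    processed = 0
    start = time.perf_counter()
    for i in range(0, len(items), batch_size):
        batch = items[i:i + batch_size]
        t0 = time.perf_counter()
        stage(batch)
        latencies.append((time.perf_counter() - t0) / len(batch))  # seconds per comment
        processed += len(batch)
    elapsed = time.perf_counter() - start

    _, peak = tracemalloc.get_traced_memory()  # peak bytes since tracing began
    if started_tracing:
        tracemalloc.stop()

    return {
        "comments": processed,
        "seconds": elapsed,
        "comments_per_second": processed / elapsed if elapsed > 0 else float("inf"),
        "mean_latency_s": statistics.mean(latencies) if latencies else 0.0,
        "peak_bytes": peak,
    }
```

Timing per batch rather than per comment reflects how transformer-based sentiment and stance models are usually run, because batching amortises model overhead.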

Circularity Check

0 steps flagged

No circularity: empirical pipeline outputs are direct data counts and model applications, not self-defined quantities.

Full rationale

The paper describes a modular data-processing pipeline (ingestion, filtering, topic modelling, sentiment, stance detection) applied to a collected dataset of 7M comments. Reported figures such as the 70% early engagement and majority favourable stance are presented as analysis results from this pipeline. No equations, parameter-fitting steps, or derivations appear in the abstract or described framework. No self-citations, uniqueness theorems, or ansatzes are invoked to justify core claims. The stance-detection component is unvalidated in the provided text, but this is a validation gap rather than a circular reduction of the output to its own definition. The derivation chain is therefore self-contained against external data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are identifiable from the abstract. The work relies on standard, unspecified NLP tools for topic modelling, sentiment, and stance detection.

pith-pipeline@v0.9.1-grok · 5826 in / 1158 out tokens · 33338 ms · 2026-06-29T00:26:23.762806+00:00 · methodology


Reference graph

Works this paper leans on

33 extracted references · 3 linked inside Pith

[1]

Top websites ranking 2023,

“Top websites ranking 2023,” Sep 2023, [Accessed 12-10-2023]. [Online]. Available: https://www.similarweb.com/top-websites/

2023
[2]

32 youtube statistics 2024: Key insights & trends you need to know,

N. Dunn, “32 youtube statistics 2024: Key insights & trends you need to know,” 2024, [Accessed 14-10-2024]. [Online]. Available: https://www.charleagency.com/articles/youtube-statistics/

2024
[3]

Conspiracy theories as barriers to controlling the spread of covid-19 in the u.s

D. Romer and K. H. Jamieson, “Conspiracy theories as barriers to controlling the spread of covid-19 in the u.s.,” Social Science & Medicine, vol. 263, p. 113356, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S027795362030575X

2020
[4]

Qanon: The networks of misinformation and conspiracy theories on social media,

S. Dastgeer and R. Thapaliya, “Qanon: The networks of misinformation and conspiracy theories on social media,” in The Emerald Handbook of Computer-Mediated Communication and Social Media. Emerald Publishing Limited, 2022, pp. 251–268

2022
[5]

Managing harmful conspiracy theories on YouTube - blog.youtube,

Google, “Managing harmful conspiracy theories on YouTube - blog.youtube,” [15-OCT-2020], [Accessed 14-10-2024]. [Online]. Available: https://blog.youtube/news-and-events/harmful-conspiracy-theories-youtube/

2020
[6]

Continuing our work to improve recommendations on youtube — blog.youtube,

YouTube, “Continuing our work to improve recommendations on youtube — blog.youtube,” [25-01-2019], [Accessed 15-10-2024]. [Online]. Available: https://blog.youtube/news-and-events/continuing-our-work-to-improve/

2019
[7]

Trends in the diffusion of misinformation on social media,

H. Allcott, M. Gentzkow, and C. Yu, “Trends in the diffusion of misinformation on social media,” Research & Politics, vol. 6, no. 2, p. 2053168019848554, 2019

2019
[8]

A longitudinal analysis of youtube’s promotion of conspiracy videos,

M. Faddoul, G. Chaslot, and H. Farid, “A longitudinal analysis of youtube’s promotion of conspiracy videos,” 3 2020. [Online]. Available: http://arxiv.org/abs/2003.03318

arXiv 2020
[9]

Conspiracy beliefs, misinformation, social media platforms, and protest participation,

S. Boulianne and S. Lee, “Conspiracy beliefs, misinformation, social media platforms, and protest participation,” Media and Communication, vol. 10, pp. 30–41, 2022

2022
[10]

Conspiracy brokers: Understanding the monetization of youtube conspiracy theories

C. Ballard, I. Goldstein, P. Mehta, G. Smothers, K. Take, V. Zhong, R. Greenstadt, T. Lauinger, and D. McCoy, “Conspiracy brokers: Understanding the monetization of youtube conspiracy theories.” Association for Computing Machinery, Inc, 4 2022, pp. 2707–2718

2022
[11]

Science vs conspiracy: Collective narratives in the age of misinformation,

A. Bessi, M. Coletto, G. A. Davidescu, A. Scala, G. Caldarelli, and W. Quattrociocchi, “Science vs conspiracy: Collective narratives in the age of misinformation,” PloS one, vol. 10, no. 2, p. e0118093, 2015

2015
[12]

The spreading of misinformation online,

M. Del Vicario, A. Bessi, F. Zollo, F. Petroni, A. Scala, G. Caldarelli, H. E. Stanley, and W. Quattrociocchi, “The spreading of misinformation online,” Proceedings of the national academy of Sciences, vol. 113, no. 3, pp. 554–559, 2016

2016
[13]

Users polarization on facebook and youtube,

A. Bessi, F. Zollo, M. D. Vicario, M. Puliga, A. Scala, G. Caldarelli, B. Uzzi, and W. Quattrociocchi, “Users polarization on facebook and youtube,” PLoS ONE, vol. 11, 8 2016

2016
[14]

Conspiracy vs science: a large-scale analysis of online discussion cascades,

Y. Zhang, L. Wang, J. J. Zhu, and X. Wang, “Conspiracy vs science: a large-scale analysis of online discussion cascades,” World wide web, vol. 24, pp. 585–606, 2021

2021
[15]

Conspiracy in the time of corona: automatic detection of emerging covid-19 conspiracy theories in social media and the news,

S. Shahsavari, P. Holur, T. Wang, T. R. Tangherlini, and V. Roychowdhury, “Conspiracy in the time of corona: automatic detection of emerging covid-19 conspiracy theories in social media and the news,” Journal of computational social science, vol. 3, no. 2, pp. 279–317, 2020

2020
[16]

Conspiracy theories and social media platforms,

M. Cinelli, G. Etta, M. Avalle, A. Quattrociocchi, N. Di Marco, C. Valensise, A. Galeazzi, and W. Quattrociocchi, “Conspiracy theories and social media platforms,” Current Opinion in Psychology, p. 101407, 2022

2022
[17]

Analyzing disinformation and crowd manipulation tactics on youtube,

M. N. Hussain, S. Tokdemir, N. Agarwal, and S. Al-Khateeb, “Analyzing disinformation and crowd manipulation tactics on youtube,” in 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). IEEE, 2018, pp. 1092–1095

2018
[18]

Caught in a networked collusion? homogeneity in conspiracy-related discussion networks on youtube,

D. Röchert, G. Neubaum, B. Ross, and S. Stieglitz, “Caught in a networked collusion? homogeneity in conspiracy-related discussion networks on youtube,” Information Systems, vol. 103, p. 101866, 2022

2022
[19]

Antisemitic conspiracy fantasy in the age of digital media: Three ‘conspiracy theorists’ and their youtube audiences,

D. Allington, B. L. Buarque, and D. B. Flores, “Antisemitic conspiracy fantasy in the age of digital media: Three ‘conspiracy theorists’ and their youtube audiences,” Language and Literature, vol. 30, pp. 78–102, 2 2021

2021
[20]

Where conspiracy theories flourish: A study of youtube comments and bill gates conspiracy theories,

L. Ha, T. Graham, and J. Gray, “Where conspiracy theories flourish: A study of youtube comments and bill gates conspiracy theories,” Harvard Kennedy School Misinformation Review, 10 2022

2022
[21]

Google for developers — add youtube functionality to your app,

“Google for developers — add youtube functionality to your app,” Oct 2023, [Accessed: 03-03-2024]. [Online]. Available: https://developers.google.com/youtube/v3

2023
[22]

youtube-transcript-api — pypi,

“youtube-transcript-api — pypi,” Oct 2024. [Online]. Available: https://pypi.org/project/youtube-transcript-api/

2024
[23]

Snorkel: Rapid training data creation with weak supervision,

A. Ratner, S. H. Bach, H. Ehrenberg, J. Fries, S. Wu, and C. Ré, “Snorkel: Rapid training data creation with weak supervision,” in Proceedings of the VLDB Endowment. International Conference on Very Large Data Bases, vol. 11, no. 3. NIH Public Access, 2017, p. 269

2017
[24]

P. O. Perry, corpus: Text Corpus Analysis, 2017, R package version 0.10.0. [Online]. Available: http://corpustext.com

2017
[25]

Bertopic: Neural topic modeling with a class-based tf-idf procedure,

M. Grootendorst, “Bertopic: Neural topic modeling with a class-based tf-idf procedure,” arXiv preprint arXiv:2203.05794, 2022

Pith/arXiv arXiv 2022
[26]

Umap: Uniform manifold approximation and projection for dimension reduction,

L. McInnes, J. Healy, and J. Melville, “Umap: Uniform manifold approximation and projection for dimension reduction,” arXiv preprint arXiv:1802.03426, 2018

Pith/arXiv arXiv 2018
[27]

hdbscan: Hierarchical density based clustering

L. McInnes, J. Healy, and S. Astels, “hdbscan: Hierarchical density based clustering,” J. Open Source Softw., vol. 2, no. 11, p. 205, 2017

2017
[28]

Scikit-learn: Machine learning in Python,

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011

2011
[29]

Nltk: The natural language toolkit,

E. Loper and S. Bird, “Nltk: The natural language toolkit,” arXiv preprint cs/0205028, 2002

Pith/arXiv arXiv 2002
[30]

Software Framework for Topic Modelling with Large Corpora,

R. Řehůřek and P. Sojka, “Software Framework for Topic Modelling with Large Corpora,” in Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks. Valletta, Malta: ELRA, May 2010, pp. 45–50, http://is.muni.cz/publication/884893/en

2010
[31]

Semeval-2018 Task 1: Affect in tweets,

S. M. Mohammad, F. Bravo-Marquez, M. Salameh, and S. Kiritchenko, “Semeval-2018 Task 1: Affect in tweets,” in Proceedings of International Workshop on Semantic Evaluation (SemEval-2018), New Orleans, LA, USA, 2018

2018
[32]

Replicable semi-supervised approaches to state-of-the-art stance detection of tweets,

M. Reveilhac and G. Schneider, “Replicable semi-supervised approaches to state-of-the-art stance detection of tweets,” Information Processing and Management, vol. 60, no. 2, p. 103199, 2023

2023
[33]

On a test of whether one of two random variables is stochastically larger than the other,

H. B. Mann and D. R. Whitney, “On a test of whether one of two random variables is stochastically larger than the other,” The Annals of Mathematical Statistics, vol. 18, no. 1, pp. 50–60, 1947

1947
