Detecting Synthetic Political Narratives in Cross-Platform Social Media Discourse

Despoina Antonakaki; Sotiris Ioannidis

arxiv: 2605.21540 · v1 · pith:WBHQQB72new · submitted 2026-05-20 · 💻 cs.SI · cs.AI · cs.CL · cs.CY

Pith reviewed 2026-05-22 01:26 UTC · model grok-4.3

classification 💻 cs.SI · cs.AI · cs.CL · cs.CY

keywords synthetic narratives · coordination detection · cross-platform analysis · social media · lexical diversity · temporal burstiness · rhetorical repetition · semantic homogenization

The pith

Combining lexical diversity, burstiness, repetition and homogenization into one score detects synthetic political narratives more robustly than any single metric.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a cross-platform method to identify synthetic political narratives by tracking four coordination signals in social media content. It combines those signals into a single Synthetic Narrative Coordination Score that the authors apply to hundreds of thousands of posts from Telegram and Reddit around geopolitical events. A sympathetic reader would care because any one signal by itself can be misleading, for example when language differences reduce apparent overlap between channels. The results show that the combined score produces more consistent rankings of channels likely to be sources of coordinated synthetic content across different time windows.

Core claim

The paper claims that the Synthetic Narrative Coordination Score formed from lexical diversity D(C), temporal burstiness B(C), rhetorical repetition R(C), and semantic homogenization H(C) supplies a more robust and interpretable signal for detecting synthetic political narratives than any individual metric, as evidenced by one channel ranking highest on four of six event windows while another ranks last despite high semantic homogenization.

What carries the argument

The Synthetic Narrative Coordination Score SNC(C) that aggregates the four coordination signals to assess overall synthetic narrative activity.
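
A minimal sketch of how such a composite could be computed. It assumes the form quoted in the Lean-theorem section of this page, SNC(C) = α Ĥ(C) + β B̂(C) + γ R̂(C) − δ D̂(C) with equal weights 0.25, where lexical diversity enters with a negative sign. The min-max normalization across channels is an assumption taken from the simulated rebuttal, not something the abstract confirms. The function names, channel names, and toy values are hypothetical.

```python
from typing import Dict, Tuple

SIGNALS = ("H", "B", "R", "D")  # homogenization, burstiness, repetition, diversity


def min_max(values: Dict[str, float]) -> Dict[str, float]:
    """Rescale one signal across channels to [0, 1]; a constant signal maps to 0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {ch: (v - lo) / span if span > 0 else 0.0 for ch, v in values.items()}


def snc_scores(
    raw: Dict[str, Dict[str, float]],
    weights: Tuple[float, float, float, float] = (0.25, 0.25, 0.25, 0.25),
) -> Dict[str, float]:
    """Combine normalized signals as a*H + b*B + g*R - d*D (assumed form)."""
    channels = list(raw)
    norm = {s: min_max({ch: raw[ch][s] for ch in channels}) for s in SIGNALS}
    a, b, g, d = weights
    return {
        ch: a * norm["H"][ch] + b * norm["B"][ch] + g * norm["R"][ch] - d * norm["D"][ch]
        for ch in channels
    }


# Toy, made-up inputs for three hypothetical channels (not values from the paper).
toy = {
    "chan_a": {"H": 0.40, "B": 0.60, "R": 0.12, "D": 0.53},
    "chan_b": {"H": 0.70, "B": -0.10, "R": 0.00, "D": 0.80},
    "chan_c": {"H": 0.50, "B": 0.20, "R": 0.06, "D": 0.65},
}

scores = snc_scores(toy)
for channel in sorted(scores, key=scores.get, reverse=True):
    print(channel, round(scores[channel], 3))
```

In the toy data, chan_b has the highest homogenization but still ranks last because its high lexical diversity and low burstiness and repetition offset it, which mirrors the kind of contrast the paper reports. Under these assumptions the score is bounded between −0.25 and 0.75.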

If this is right

Channels with high SNC(C) values are more likely to be disseminating coordinated synthetic political narratives.
No single signal is sufficient because factors such as language can produce misleading results on individual measures.
The framework can rank sources by coordination likelihood without requiring pre-existing labeled examples.
The same multi-dimensional approach can be applied to additional platforms or event periods to track narrative patterns.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Platforms could monitor high-scoring channels in real time to surface potential synthetic campaigns during major events.
Adjusting the signals for translation might expose more cross-language coordination that current metrics under-detect.
The composite score offers a practical starting point for automated flagging systems that do not rely on supervised training data.

Load-bearing premise

The four coordination signals reliably distinguish synthetic coordinated activity from organic discussion without any labeled ground-truth examples of synthetic narratives in the data.

What would settle it

A verified case in which a channel with a high SNC(C) score is shown to produce only uncoordinated organic content, or a low-scoring channel is confirmed as a source of synthetic narratives.

Figures

Figures reproduced from arXiv: 2605.21540 by Despoina Antonakaki, Sotiris Ioannidis.

**Figure 1.** Text length ECDF for the ukraine_war_general event window. Left: character count; right: word count. Both axes are log-scaled. Vertical dotted lines mark the 25th, 50th, 75th, and 95th percentiles. Telegram messages are shorter and more uniform; Reddit spans a wider range due to comment-tree variation. [PITH_FULL_IMAGE:figures/full_fig_p015_1.png]

**Figure 2.** Text length ECDF for the israel_gaza_general event window. The pattern mirrors the Ukraine window, with Telegram showing a narrower, left-shifted distribution relative to Reddit. [PITH_FULL_IMAGE:figures/full_fig_p016_2.png]

**Figure 3.** Inter-arrival time (IAT) ECDF per source for the [PITH_FULL_IMAGE:figures/full_fig_p016_3.png]

**Figure 4.** Inter-arrival time ECDF for the israel_gaza_general window. Middle East Eye and DDGeopolitics show the most active posting rhythms for this topic. [PITH_FULL_IMAGE:figures/full_fig_p017_4.png]

**Figure 5.** Engagement distribution ECDFs for the ukraine_war_general window. Top row: Telegram views and forwards. Bottom row: Reddit score and comment count. Both Telegram metrics follow heavy-tailed distributions; Reddit engagement is more zero-inflated. Axes use symlog scaling to accommodate zero values. [PITH_FULL_IMAGE:figures/full_fig_p017_5.png]

**Figure 6.** Engagement distribution ECDFs for the israel_gaza_general window. Telegram view counts are comparable to the Ukraine window; Reddit engagement is lower, consistent with smaller community overlap for Gaza-specific subreddits. [PITH_FULL_IMAGE:figures/full_fig_p018_6.png]

**Figure 7.** Daily message volume ECDF for the ukraine_war_general window. Reddit (including comment trees) shows higher peak-day volumes but more variance; Telegram is more consistent. The rightward tail of the Reddit curve reflects major news events driving comment spikes. [PITH_FULL_IMAGE:figures/full_fig_p018_7.png]

**Figure 8.** Daily message volume ECDF for the israel_gaza_general window. Telegram volume is lower than the Ukraine window, reflecting that Gaza-specific keywords match a smaller fraction of the channels' total output. [PITH_FULL_IMAGE:figures/full_fig_p019_8.png]

**Figure 9.** Source activity ECDF for the ukraine_war_general window. Each data point represents one channel or subreddit. The steep Telegram curve indicates high concentration: a small number of channels dominate the corpus. Reddit is more uniform. [PITH_FULL_IMAGE:figures/full_fig_p019_9.png]

**Figure 10.** Source activity ECDF for the israel_gaza_general window. Source concentration is more pronounced on Telegram than Reddit, and Middle East Eye dominates the Telegram Gaza corpus with the highest per-channel record count.
3.7.4 Lexical Diversity Analysis. Lexical diversity measures the richness of vocabulary used by each source and provides an early signal of synthetic or templated content: AI-assisted narra…

**Figure 11.** Moving Average Type-Token Ratio (MATTR) by source for the [PITH_FULL_IMAGE:figures/full_fig_p021_11.png]

**Figure 12.** Supporting lexical diversity analysis. Top row: Shannon word entropy (Hword, bits) per source for Russia–Ukraine (left) and Israel–Gaza (right); rankings mirror MATTR. Bottom-left: MATTR vs. total token count on log scale; diversity differences are not explained by corpus size. Bottom-right: per-channel MATTR across both events on Telegram, showing stable source-level ranking.
3.7.5 Temporal Synchroniza…

**Figure 13.** Temporal synchronization T(C): number of distinct Telegram [PITH_FULL_IMAGE:figures/full_fig_p024_13.png]

**Figure 14.** Temporal coordination: supporting analysis. [PITH_FULL_IMAGE:figures/full_fig_p025_14.png]

**Figure 15.** Rhetorical repetition R(C) score per Telegram source: mean [PITH_FULL_IMAGE:figures/full_fig_p027_15.png]

**Figure 16.** Rhetorical repetition: supporting analysis. [PITH_FULL_IMAGE:figures/full_fig_p028_16.png]

**Figure 17.** Within-source semantic homogenization H(C) per Telegram [PITH_FULL_IMAGE:figures/full_fig_p030_17.png]

**Figure 18.** Cross-source semantic similarity matrices for the Russia–Ukraine [PITH_FULL_IMAGE:figures/full_fig_p030_18.png]

**Figure 19.** SNC(C) ranked bar charts for the Russia–Ukraine (left) and [PITH_FULL_IMAGE:figures/full_fig_p032_19.png]

**Figure 20.** SNC(C) rankings for the four supporting event windows: NATO [PITH_FULL_IMAGE:figures/full_fig_p033_20.png]

**Figure 21.** Radar chart of normalised component scores ( [PITH_FULL_IMAGE:figures/full_fig_p034_21.png]

Original abstract

The proliferation of large language models has introduced a new paradigm of synthetic political communication in which narratives may be generated, semantically coordinated, and strategically disseminated across platforms at scale. We present a cross-platform framework for detecting synthetic political narratives using four coordination signals -- lexical diversity D(C), temporal burstiness B(C), rhetorical repetition R(C), and semantic homogenization H(C) -- combined into a Synthetic Narrative Coordination Score SNC(C). We apply the framework to a corpus of 353,223 records spanning six geopolitical event windows collected from six Telegram channels and nine Reddit communities (2023--2026). Results show that IntelSlava exhibits the lowest lexical diversity (MATTR 0.52--0.54), the highest burstiness (B=+0.48 to +0.73), and the highest rhetorical overlap with peer channels (Jaccard 0.12), ranking first in the composite SNC(C) on four of six event windows (SNC 0.45--0.60). Rybar ranks last on all windows despite its high semantic homogenization, because its Russian-language output yields high lexical diversity and near-zero rhetorical Jaccard with English-language channels -- demonstrating that no single indicator is sufficient for coordination detection. Multi-dimensional SNC(C) scoring provides a more robust and interpretable signal than any individual metric.
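
The abstract reports MATTR and burstiness values without restating how they are computed. The sketch below shows one common reading, under stated assumptions: MATTR as the mean type-token ratio over sliding windows of fixed length (the 50-token default and whitespace tokenization are assumptions), and burstiness as the coefficient B = (σ − μ)/(σ + μ) over inter-arrival times, which ranges from −1 (perfectly regular) towards +1 (highly bursty). The signed B values in the abstract are consistent with this definition, but the paper's exact formula is not confirmed here. Function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def mattr(tokens: Sequence[str], window: int = 50) -> float:
    """Moving Average Type-Token Ratio: mean TTR over all sliding windows."""
    if not tokens:
        return 0.0
    if len(tokens) <= window:
        # Too short for a full window: fall back to the plain type-token ratio.
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return mean(ratios)


def burstiness(timestamps: Sequence[float]) -> float:
    """B = (sigma - mu) / (sigma + mu) over gaps between consecutive posts."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2:
        return 0.0  # not enough gaps to estimate a spread
    mu, sigma = mean(gaps), pstdev(gaps)
    if mu + sigma == 0:
        return 0.0  # all posts share one timestamp
    return (sigma - mu) / (sigma + mu)


# Hypothetical usage: a repetitive token stream and two posting rhythms.
print(mattr(["breaking"] * 10, window=5))
print(burstiness([0, 10, 20, 30]))      # evenly spaced posts
print(burstiness([0, 1, 2, 3, 100]))    # a burst followed by a long silence
```

A low MATTR indicates a narrow, repeated vocabulary, and a positive B indicates posting concentrated in bursts; both point in the direction the abstract attributes to the top-ranked channel.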

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper combines four signals into a composite SNC score and applies it to a large Telegram-Reddit corpus, showing why single metrics can mislead, but the link from high scores to synthetic LLM content stays unanchored by any ground truth or baseline.

The main thing here is that this paper gives a workable multi-signal score for flagging coordinated political content across platforms, yet it does not supply the validation needed to treat high scores as evidence of synthetic generation rather than ordinary human coordination.

They collect 353k records from six Telegram channels and nine Reddit communities over six event windows in 2023-2026, then score each channel on lexical diversity, temporal burstiness, rhetorical repetition, and semantic homogenization. The composite SNC(C) ranks IntelSlava first on four windows because it shows low diversity, high burstiness, and some rhetorical overlap, while Rybar ranks last despite high homogenization, mainly because its Russian output produces little rhetorical Jaccard with English channels. That contrast is useful: it demonstrates in concrete terms why no single indicator is enough and why a composite can surface different patterns. The data scale and the cross-platform setup are the parts that feel solid and worth building on.

The soft spot is the missing ground truth. The abstract reports rankings and metric values but gives no labeled synthetic examples, no organic baseline set, and no explicit weighting or normalization for the four signals. Without those anchors the claim that the composite is more robust for detecting synthetic narratives rests on observed differences rather than on a test that separates LLM-driven coordination from human-organized discussion. The superiority argument therefore stays interpretive.

This work would interest people who build practical tools for disinformation monitoring or cross-platform analysis. A reader already working on coordination detection could borrow the signal combination idea or the event-window framing. It has enough concrete data handling and a clear method to deserve serious referee time, even though reviewers will almost certainly ask for validation experiments or comparisons to existing baselines.

I would send it out for review rather than desk reject.

Referee Report

2 major / 2 minor

Summary. The paper claims to present a cross-platform framework for detecting synthetic political narratives using four coordination signals—lexical diversity D(C), temporal burstiness B(C), rhetorical repetition R(C), and semantic homogenization H(C)—combined into a Synthetic Narrative Coordination Score SNC(C). Applied to a corpus of 353,223 records from six Telegram channels and nine Reddit communities across six geopolitical event windows (2023–2026), it reports that IntelSlava exhibits the lowest lexical diversity (MATTR 0.52–0.54), highest burstiness (B=+0.48 to +0.73), and highest rhetorical overlap (Jaccard 0.12), ranking first in SNC(C) on four of six windows (SNC 0.45–0.60), while Rybar ranks last on all windows; the central claim is that multi-dimensional SNC(C) scoring provides a more robust and interpretable signal than any individual metric.

Significance. If the mapping from elevated SNC(C) to synthetic (LLM-driven) narratives can be validated, the framework offers a timely methodological contribution for identifying coordinated inauthentic political communication at scale. The large corpus size, cross-platform design, and concrete empirical demonstration that no single signal suffices (e.g., Rybar’s high semantic homogenization offset by language and diversity differences) are strengths that could inform future detection tools. The work highlights practical challenges in multi-platform analysis but currently functions more as a heuristic for statistical coordination than as confirmed synthetic-narrative detection.

major comments (2)

§3 (Methodology): The SNC(C) is presented as a composite of the four signals without explicit equations, normalization procedures, or weighting scheme; this prevents assessment of reproducibility and whether the score involves any fitted parameters or reduces to a self-referential definition.
§5 (Results): The interpretation that high SNC(C) ranks indicate synthetic LLM-driven narratives (rather than organic or human-coordinated discourse) lacks any labeled ground-truth examples of synthetic content or explicit organic baseline sets in the corpus; this is load-bearing for the robustness claim, as the reported differences (e.g., IntelSlava vs. Rybar) could arise from other coordination mechanisms.

minor comments (2)

Abstract: The reported SNC(C) range (0.45–0.60) would be more interpretable if the theoretical minimum and maximum of the composite score were stated.
Figure/Table captions: Ensure all metric definitions (MATTR, Jaccard, burstiness) are briefly restated or referenced for readers unfamiliar with the specific implementations.
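
As an illustration of the kind of definition the last comment asks for, here is a minimal sketch of Jaccard overlap between two sources' trigram sets. This page quotes the paper as using "trigram Jaccard"; treating the trigrams as word-level, lowercasing, and splitting on whitespace are assumptions, and the function names and sample sentences are hypothetical.

```python
from typing import Iterable, Set, Tuple

Trigram = Tuple[str, str, str]


def word_trigrams(texts: Iterable[str]) -> Set[Trigram]:
    """Collect the set of word trigrams over all posts from one source."""
    grams: Set[Trigram] = set()
    for text in texts:
        toks = text.lower().split()
        # zip stops at the shortest slice, so posts under three words add nothing.
        grams.update(zip(toks, toks[1:], toks[2:]))
    return grams


def jaccard(a: Set[Trigram], b: Set[Trigram]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Made-up example posts, not taken from the corpus.
english_1 = word_trigrams(["The front line is holding"])
english_2 = word_trigrams(["the front line is collapsing"])
russian = word_trigrams(["линия фронта держится сегодня"])

print(jaccard(english_1, english_2))
print(jaccard(english_1, russian))
```

The second comparison returns zero regardless of meaning because the two sources share no surface tokens, which is the mechanism the paper cites for a Russian-language channel showing near-zero rhetorical Jaccard with English-language channels.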

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for highlighting both the strengths of the cross-platform corpus and the need for greater methodological transparency. We respond to each major comment below and describe the revisions that will be incorporated.

Referee: §3 (Methodology): The SNC(C) is presented as a composite of the four signals without explicit equations, normalization procedures, or weighting scheme; this prevents assessment of reproducibility and whether the score involves any fitted parameters or reduces to a self-referential definition.

Authors: We agree that the current presentation of SNC(C) lacks sufficient formal detail. In the revised manuscript we will expand §3 to include the explicit mathematical definitions of each coordination signal, the min-max normalization applied to map every signal to [0,1], and the aggregation rule (equal-weighted average of the four normalized values). The formulation contains no learned or fitted parameters and is fully deterministic given the input data, thereby eliminating any self-referential character.

revision: yes

Referee: §5 (Results): The interpretation that high SNC(C) ranks indicate synthetic LLM-driven narratives (rather than organic or human-coordinated discourse) lacks any labeled ground-truth examples of synthetic content or explicit organic baseline sets in the corpus; this is load-bearing for the robustness claim, as the reported differences (e.g., IntelSlava vs. Rybar) could arise from other coordination mechanisms.

Authors: We accept that the manuscript would be strengthened by a more explicit treatment of this limitation. The revised §5 will reframe the central claim to emphasize that SNC(C) is a multi-signal coordination heuristic whose elevated values are consistent with, but do not prove, synthetic generation. We will add a dedicated limitations paragraph that (a) acknowledges the absence of labeled synthetic or organic baselines in the present corpus, (b) notes that alternative explanations such as human-orchestrated campaigns remain possible, and (c) outlines how future work could inject controlled synthetic content for validation. The comparative analysis of IntelSlava and Rybar will be retained as evidence that single metrics are insufficient, while the interpretive language will be tempered accordingly.

revision: partial

Circularity Check

0 steps flagged

No circularity in derivation of SNC(C) or coordination signals

The paper defines four independent coordination signals (lexical diversity D(C), temporal burstiness B(C), rhetorical repetition R(C), semantic homogenization H(C)) and states they are combined into a composite SNC(C) score, then ranks channels such as IntelSlava and Rybar on observed values across event windows. No equations are supplied that render SNC(C) equivalent to any of its inputs by construction, no parameters are fitted to a subset and then relabeled as predictions, and no self-citations or uniqueness theorems are invoked to justify the combination or the mapping to synthetic narratives. The central claim therefore rests on direct computation from the collected corpus rather than on a self-referential loop.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework assumes the four signals capture synthetic coordination without circular dependence on the target labels; no free parameters or invented entities are explicitly introduced in the abstract, but the composite score implicitly requires a combination rule whose details are absent.

axioms (1)

Domain assumption: The selected Telegram channels and Reddit communities during the six geopolitical windows contain measurable differences in synthetic versus organic narrative behavior.
Invoked when ranking channels by SNC without providing external validation labels.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: SNC(C) = α Ĥ(C) + β B̂(C) + γ R̂(C) − δ D̂(C) with equal weights 0.25; IntelSlava leads on four windows via low MATTR, high B, high trigram Jaccard

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: H(C) computed as mean pairwise cosine similarity of paraphrase-multilingual-MiniLM embeddings
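
The homogenization passage quoted above can be illustrated with a short sketch of mean pairwise cosine similarity. It assumes the embeddings have already been computed (the page names a paraphrase-multilingual-MiniLM model, but no model is loaded here) and uses plain Python lists so the example stays self-contained. The function names and toy vectors are hypothetical, and the quadratic loop is meant for illustration rather than for a corpus of this size.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(embeddings: Sequence[Sequence[float]]) -> float:
    """Average cosine similarity over all unordered pairs of post embeddings."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 0.0  # fewer than two posts: no pair to compare
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy 2-D vectors standing in for sentence embeddings.
uniform_source = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
mixed_source = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

print(mean_pairwise_cosine(uniform_source))
print(mean_pairwise_cosine(mixed_source))
```

Because a multilingual embedding model maps text from different languages into one space, a source can score high on this measure while sharing no surface trigrams with its peers, which is how a channel can be semantically homogeneous yet lexically distinct.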

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] Pir Noman Ahmad, Adnan Muhammad Shah, and KangYoon Lee. Enhanced propaganda detection in public social media discussions using a fine-tuned deep learning model: A diffusion of innovation perspective. Future Internet, 17(5):212, 2025.

[2] Pir Noman Ahmad, Adnan Muhammad Shah, and KangYoon Lee. An information retrieval language model-based zero- and few-shot learning for propaganda detection in social media content. Knowledge and Information Systems, 68(1):88, 2026.

[3] Lin Ai. Towards Trustworthy AI: Detecting, Understanding, and Mitigating Information Disorder. PhD thesis, Columbia University, 2025.

[4] Esma Aïmeur, Gilles Brassard, and Dorsaf Sallami. Too focused on accuracy to notice the fallout: Towards socially responsible fake news detection. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, volume 8, pages 55–65, 2025.

[5] Despoina Antonakaki, Paraskevi Fragopoulou, and Sotiris Ioannidis. A survey of Twitter research: Data model, graph structure, sentiment analysis and attacks. Expert Systems with Applications, 164:114006, 2021.

[6] Despoina Antonakaki and Sotiris Ioannidis. Cross-platform digital discourse analysis of the Israel-Hamas conflict: Sentiment, topics, and event dynamics, 2025.

[7] Despoina Antonakaki and Sotiris Ioannidis. Israel-Hamas war through Telegram, Reddit and Twitter, 2025.

[8] Despoina Antonakaki and Sotiris Ioannidis. Coordinated information dissemination on Telegram and Reddit during political turbulence: A case study of Venezuela in global news channels, 2026.

[9] Despoina Antonakaki and Sotiris Ioannidis. Cross-platform digital discourse analysis of Iran: Topics, sentiment, polarization, and event validation on Telegram and Reddit, 2026.

[10] Despoina Antonakaki, Dimitris Spiliotopoulos, Christos V. Samaras, Sotiris Ioannidis, and Paraskevi Fragopoulou. Investigating the complete corpus of referendum and elections tweets. In 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pages 100–105. IEEE, 2016.

[11] Despoina Antonakaki, Dimitris Spiliotopoulos, Christos V. Samaras, Polyvios Pratikakis, Sotiris Ioannidis, and Paraskevi Fragopoulou. Social media analysis during political turbulence. PLOS ONE, 12(10):e0186836, 2017.

[12] Arash Barfar and Lee Sommerfeldt. Propaganda by prompt: Tracing hidden linguistic strategies in large language models. Information Processing & Management, 63(2):104403, 2026.

[13] Apaar Bawa, Ugur Kursuncu, Dilshod Achilov, Valerie L. Shalin, Nitin Agarwal, and Esra Akbas. Telegram as a battlefield: Kremlin-related communications during the Russia-Ukraine conflict. Proceedings of the International AAAI Conference on Web and Social Media, 19(1):2361–2370, 2025.

[14] Kalina Bontcheva, Symeon Papadopoulous, Filareti Tsalakanidou, Riccardo Gallotti, Lidia Dutkiewicz, Noémie Krack, Denis Teyssou, Francesco Severio Nucci, Jochen Spangenberg, Ivan Srba, Patrick Aichroth, Luca Cuccovillo, and Luisa Verdoliva. Generative AI and disinformation: Recent advances, challenges, and opportunities. Technical report, European Digit...

[15] Julian Burmester. Influencing belief in LLM-based agent networks: An empirically validated simulation of bot-driven manipulation. Master’s thesis, Leuphana University Lüneburg, 2026.

[16] Federico Cinus, Marco Minici, Luca Luceri, and Emilio Ferrara. Exposing cross-platform coordinated inauthentic activity in the run-up to the 2024 U.S. election. In Proceedings of the ACM Web Conference 2025. ACM, 2025.

[17] Amirhossein Dabiriaghdam. Combating disinformation in the age of generative AI: From watermarking LLMs to persuasion analysis of memes. Master’s thesis, University of British Columbia, 2025.

[18] C. Donner. Misinformation detection methods using large language models and evaluation of application programming interfaces. Master’s thesis, Oklahoma State University, 2024. Metadata verified from public index snippets and secondary references.

[19] Margherita Gambini. Digital Sentinels: Unraveling the Societal Implications and Social Media Defence Strategies Against Large Language Models. PhD thesis, University of Bologna, 2024.

[20] Chaytan Inman. Studying disinformation narratives on social media with LLMs and semantic similarity, 2025.

[21] Soveatin Kuntur, Anna Wróblewska, Marcin Paprzycki, and Maria Ganzha. Under the influence: A survey of large language models in fake news detection. IEEE Transactions on Artificial Intelligence, 6(2):458–476, 2025.

[22] Soonchan Kwon and Beakcheol Jang. A comprehensive survey of fake text detection on misinformation and LM-generated texts. IEEE Access, 13:25301–25324, 2025.

[23] Ioannis Lamprou, Alexander Shevtsov, Despoina Antonakaki, Polyvios Pratikakis, and Sotiris Ioannidis. Exploring crisis-driven social media patterns: A Twitter dataset of usage during the Russo-Ukrainian war. In International Conference on Advances in Social Networks Analysis and Mining, pages 70–85. Springer Nature Switzerland, 2024.

[24] Raj Gaurav Maurya, Vaibhav Shukla, Raj Abhijit Dandekar, Rajat Dandekar, and Sreedath Panat. Simulating misinformation propagation in social networks using large language models, 2025.

[25] Eleftheria Papageorgiou, Christos Chronis, Iraklis Varlamis, and Yassine Himeur. A survey on the use of large language models (LLMs) in fake news. Future Internet, 16(8):298, 2024.

[26] C. Polychronis. The anatomy of a propaganda machine: How state-sponsored agents collaborate to pollute and manipulate the information environment. ProQuest, 2025. Dataset details not publicly verified in this draft.

[27] Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2019.

[28] Dorsaf Sallami, Yuan-Chen Chang, and Esma Aïmeur. From deception to detection: The dual roles of large language models in fake news, 2024.
