
arxiv: 2601.18622 · v2 · submitted 2026-01-26 · 💻 cs.CY · cs.SI

Brazilian Social Media Anti-vaccine Information Disorder Dataset -- Telegram (2020-2025)

Pith reviewed 2026-05-16 10:43 UTC · model grok-4.3

classification 💻 cs.CY cs.SI
keywords anti-vaccine misinformation · Telegram dataset · Brazil vaccination · social media · public health data · misinformation spread · vaccine hesitancy

The pith

This paper releases a dataset of roughly four million Telegram posts from Brazilian anti-vaccine channels collected between 2020 and 2025.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors introduce a large collection of posts gathered from 119 prominent Brazilian Telegram channels focused on anti-vaccine topics. The dataset contains the full text of each message along with posting metadata, attached media files, and labels noting their connection to vaccines. It allows researchers to track the spread, evolution, and effects of misleading vaccine information on public opinion. Providing this resource supports efforts to create strategies that counter misinformation and rebuild confidence in vaccination programs.

Core claim

The paper introduces a curated dataset of about four million Telegram posts collected from 119 prominent Brazilian anti-vaccine channels between 2020 and 2025, including message content, metadata, associated media, and classification related to vaccine posts, to enable examination of how false or misleading information spreads and influences public sentiment.

What carries the argument

The dataset itself carries the argument: Telegram posts with content, metadata, media, and vaccine classifications, which together serve as the resource for analyzing misinformation patterns.

Load-bearing premise

The selected 119 channels represent the main sources of anti-vaccine content on Brazilian Telegram and the collection avoids selection bias or privacy violations.

What would settle it

Discovery of a large volume of anti-vaccine posts from Brazilian Telegram channels outside the 119 included ones would show the dataset does not fully capture the landscape.
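
One way to make this test concrete is to estimate what share of known anti-vaccine posts the included channels account for. The sketch below is a minimal illustration in Python. The function name `coverage_share`, the channel identifiers, and the post counts are all hypothetical and do not come from the paper.

```python
def coverage_share(posts_per_channel: dict[str, int], included: set[str]) -> float:
    """Return the fraction of all counted posts that come from included channels."""
    total = sum(posts_per_channel.values())
    if total == 0:
        raise ValueError("no posts counted")
    covered = sum(n for ch, n in posts_per_channel.items() if ch in included)
    return covered / total


# Hypothetical post counts: two channels inside the dataset, one outside it.
counts = {"channel_a": 600, "channel_b": 300, "channel_x": 100}
share = coverage_share(counts, included={"channel_a", "channel_b"})
print(f"share of posts captured: {share:.2f}")
```

A low share in an independent sample of channels would support the objection. A high share would not prove representativeness, since the independent sample could carry its own selection bias.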

Figures

Figures reproduced from arXiv: 2601.18622 by Ana Carolina Monari, Anderson Rocha, João Phillipe Cardenuto, Leopoldo Lusquino Filho, Michelle Diniz Lopes.

Figure 1: Distribution of fact-checking articles related to vaccines per agency published during ...
Figure 2: Distribution of reported Brazilian vaccine-related fake news by social media platform
Figure 3: Data collection pipeline. Telethon collects information for each monitored channel, ...
Figure 4: Language distribution of the collected posts
Figure 5: Number of messages collected per month during the data collection period. The last ...

Original abstract

Over the past decade, Brazil has experienced a decline in vaccination coverage, reversing decades of public health progress achieved through the National Immunization Program (PNI). Growing evidence points to the widespread circulation of vaccine-related misinformation -- particularly on social media platforms -- as a key factor driving this decline. Among these platforms, Telegram remains the only major platform permitting accessible and ethical data collection, offering insight into public channels where vaccine misinformation circulates extensively. This data paper introduces a curated dataset of about four million Telegram posts collected from 119 prominent Brazilian anti-vaccine channels between 2020 and 2025. The dataset includes message content, metadata, associated media, and classification related to vaccine posts, enabling researchers to examine how false or misleading information spreads, evolves, and influences public sentiment. By providing this resource, our aim is to support the scientific and public health community in developing evidence-based strategies to counter misinformation, promote trust in vaccination, and engage compassionately with individuals and communities affected by false narratives. The dataset and documentation are openly available for non-commercial research, under strict ethical and privacy guidelines at https://doi.org/10.25824/redu/5JIVDT

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces a dataset of approximately four million Telegram posts collected from 119 prominent Brazilian anti-vaccine channels spanning 2020-2025. It includes message content, metadata, associated media, and vaccine-related classifications, with the resource released openly under ethical and privacy guidelines at a specified DOI to support research on misinformation spread and its effects on vaccination coverage.

Significance. If the collection is representative, the dataset would offer a substantial, timely resource for studying vaccine-related information disorder on Telegram in Brazil, where declining immunization rates have been linked to social media content. The scale, multi-year window, inclusion of media, and open ethical release under non-commercial terms strengthen its potential utility for public health and computational social science research.

major comments (1)
  1. [Data collection] Data collection section: The manuscript states that posts come from 119 'prominent' Brazilian anti-vaccine channels but provides no explicit inclusion criteria (e.g., subscriber thresholds, search terms, activity filters, manual verification, or overlap with known channel lists). Without these details, systematic selection bias cannot be assessed, directly weakening the central claim that the corpus enables representative study of anti-vaccine content circulation.
minor comments (2)
  1. [Abstract] Abstract and methods: The description of 'classification related to vaccine posts' lacks detail on the labeling process, inter-annotator agreement, or validation steps; adding a brief summary would improve reproducibility.
  2. [Dataset availability] Dataset documentation: Confirm that the released materials include channel metadata (e.g., subscriber counts at collection time) and a clear statement of any temporal or geographic coverage limitations.
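
Inter-annotator agreement of the kind requested in the first minor comment is commonly reported as Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The sketch below is an illustration only: the paper does not say which agreement statistic, if any, was used, and the function name, label names, and annotations are made up.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations of six posts by two annotators.
annotator_1 = ["vaccine", "vaccine", "other", "other", "vaccine", "other"]
annotator_2 = ["vaccine", "other", "other", "other", "vaccine", "vaccine"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Reporting kappa alongside raw percent agreement matters because two annotators can agree often by chance alone when one label dominates.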

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback and recommendation for major revision. We agree that explicit documentation of channel selection is necessary to assess potential biases and will revise the manuscript accordingly.

  1. Referee: The manuscript states that posts come from 119 'prominent' Brazilian anti-vaccine channels but provides no explicit inclusion criteria (e.g., subscriber thresholds, search terms, activity filters, manual verification, or overlap with known channel lists). Without these details, systematic selection bias cannot be assessed, directly weakening the central claim that the corpus enables representative study of anti-vaccine content circulation.

    Authors: We acknowledge this limitation in the current version. The 119 channels were selected via a multi-step process: (1) initial identification using Telegram search with terms such as 'anti-vacina Brasil', 'vacina não', 'imunização falsa' and related Portuguese keywords; (2) filtering to channels with at least 5,000 subscribers and average monthly activity of 30+ posts during 2020-2025; (3) manual verification by two domain experts confirming primary focus on anti-vaccine content; and (4) cross-referencing against lists from Brazilian public health reports and prior misinformation studies. We will add a dedicated 'Channel Selection Criteria' subsection with these details, including the exact search strings, subscriber threshold rationale, verification protocol, and any channels excluded, enabling readers to evaluate selection bias and representativeness.

    Revision: yes
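
The filtering and verification steps described in the simulated rebuttal (steps 2 and 3) can be sketched as follows. This is a minimal illustration in Python, not the authors' code. The `Channel` record, its field names, and the sample values are hypothetical. Only the thresholds (at least 5,000 subscribers, an average of 30 or more posts per month, confirmation by two experts) come from the text above.

```python
from dataclasses import dataclass

MIN_SUBSCRIBERS = 5_000        # threshold stated in the rebuttal
MIN_MONTHLY_POSTS = 30         # threshold stated in the rebuttal
REQUIRED_CONFIRMATIONS = 2     # two domain experts, per the rebuttal


@dataclass(frozen=True)
class Channel:
    """Hypothetical candidate-channel record; field names are assumptions."""
    name: str
    subscribers: int
    avg_monthly_posts: float
    expert_confirmations: int  # experts who judged it primarily anti-vaccine


def is_selected(ch: Channel) -> bool:
    """Apply the size, activity, and manual-verification criteria together."""
    return (
        ch.subscribers >= MIN_SUBSCRIBERS
        and ch.avg_monthly_posts >= MIN_MONTHLY_POSTS
        and ch.expert_confirmations >= REQUIRED_CONFIRMATIONS
    )


# Made-up candidates, each failing a different criterion except the first.
candidates = [
    Channel("example_a", 12_000, 45.0, 2),
    Channel("example_b", 4_900, 80.0, 2),   # too few subscribers
    Channel("example_c", 20_000, 12.5, 2),  # too little activity
    Channel("example_d", 8_000, 30.0, 1),   # only one expert confirmed
]
selected = [ch.name for ch in candidates if is_selected(ch)]
print(selected)
```

Logging which criterion rejected each candidate, rather than only the final list, would let readers assess how sensitive the corpus is to each threshold.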

Circularity Check

0 steps flagged

Data release paper exhibits no circularity

The manuscript is a data paper that describes the curation and release of a Telegram corpus. No derivations, predictions, fitted parameters, or equations are presented anywhere in the text. The central contribution is the external availability of the dataset itself rather than any internal model or claim that reduces to its own inputs by construction. Channel selection is described at a high level but is not part of any derivation chain, so the absence of detailed inclusion rules constitutes a methodological limitation rather than circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on the domain assumption that public Telegram channels can be collected ethically and that the chosen channels capture the relevant misinformation ecosystem; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: Public Telegram channel data can be collected and shared for non-commercial research under ethical and privacy guidelines.
    Stated in the abstract as the basis for release.

pith-pipeline@v0.9.0 · 5529 in / 1250 out tokens · 24588 ms · 2026-05-16T10:43:12.544722+00:00 · methodology

