pith. sign in

arxiv: 2605.20967 · v1 · pith:EC66PA43new · submitted 2026-05-20 · 💻 cs.CL

ArPoMeme: An Annotated Arabic Multimodal Dataset for Political Ideology and Polarization

Pith reviewed 2026-05-21 05:37 UTC · model grok-4.3

classification 💻 cs.CL
keywords Arabic political memesideological polarizationmultimodal datasethostility analysismobilization cuesus vs them framingsatirical memesFacebook data collection
0
0 comments X

The pith

Arabic political memes dataset reveals Islamist and satirical ones carry highest hostility and mobilization cues.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ArPoMeme, a dataset of roughly 7,300 Arabic political memes collected from public Facebook pages and groups. Memes receive ideological labels for Leftist, Islamist, Pan-Arabist, and Satirical orientations drawn from the self-identification of their source pages. Text is extracted from images using a vision-language model and then manually annotated for three polarization features: us-versus-them framing, hostility toward out-groups, and calls to action. Quantitative results show clear differences across ideological categories, with Islamist and satirical memes displaying the strongest antagonistic framing and mobilization signals. The resource supports detailed examination of how image and text combine to express political positions in Arabic online spaces.

Core claim

The paper establishes ArPoMeme as a multimodal dataset linking visual content, extracted text, ideological categories based on source-group self-identification, and annotations across three polarization dimensions, with analysis demonstrating that Islamist and satirical memes exhibit the highest levels of hostility and mobilization cues.

What carries the argument

ArPoMeme dataset built from Facebook-sourced memes, with ideological labels from page self-identification and manual annotations for Us vs. Them framing, out-group hostility, and calls to action.

If this is right

  • The dataset supports development of automated tools for identifying ideological content and polarization in Arabic multimodal media.
  • Researchers can compare communication tactics such as hostility levels and mobilization language across different political orientations.
  • The annotated resource enables reproducible studies of how humor and imagery function in Arabic political discourse.
  • The collection and annotation pipeline can be extended to larger volumes or additional languages.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The observed asymmetries might be tested by collecting similar memes during specific political events to see whether hostility patterns intensify or shift.
  • Cross-comparison with political meme datasets in other languages could identify whether high hostility in satirical and religious-ideological content is language-specific or more general.
  • The hostility and call-to-action annotations could serve as training signals for systems that monitor escalation risks in online political content.

Load-bearing premise

The self-identification of public Facebook pages and groups accurately reflects the ideological orientation of the memes they produce and disseminate.

What would settle it

Re-annotation of a random sample of memes by labelers given only the image and text, without knowledge of the original source page or group, followed by comparison to the existing ideological and polarization labels.

Figures

Figures reproduced from arXiv: 2605.20967 by Fadhl Eryani, Kais Attia, Md. Rafiul Biswas, Wajdi Zaghouani.

Figure 6
Figure 6. Figure 6: Share of non-political memes in each ideological category (%). are prevalent features, explicit mobilization remains rare. The presence of non-political memes within certain categories highlights the nuanced bound￾The ArPoMeme dataset offers an important re￾source for studying how visual and textual ele￾ments interact to shape political communication in the Arabic-speaking world. By providing a trans￾paren… view at source ↗
read the original abstract

Memes have become a prominent medium of political communication in the Arab world, reflecting how humor, imagery, and text interact to express ideological and cultural positions. Despite the centrality of memes to online political discourse, there is a lack of systematically curated resources for analyzing their multimodal and ideological dimensions in Arabic. This paper presents ArPoMeme, a large-scale dataset of approximately 7,300 Arabic political memes categorized by ideological orientation, including Leftist, Islamist, Pan-Arabist, and Satirical perspectives. The dataset captures the diversity of Arabic meme ecosystems by grounding classification in the self-identification of public Facebook pages and groups that produce and disseminate these memes. To ensure both scale and accuracy, we designed a semi-automated data collection pipeline combining Playwright-based Facebook scraping with Google Drive synchronization, followed by text extraction using the Qwen2.5-VL-7B vision language model. The extracted text was manually verified and annotated for three polarization dimensions: Us vs. Them framing, Hostility toward out-groups, and Calls to action. Annotation was conducted through a custom Streamlit-based interface supporting distributed labeling, real-time tracking, and version control. The resulting dataset links visual content, textual messages, and ideological orientation, enabling fine-grained analysis of political antagonism, mobilization, and humor. Quantitative analysis of the annotated corpus reveals strong asymmetries in antagonistic framing across ideological groups, with Islamist and satirical memes exhibiting the highest levels of hostility and mobilization cues. The dataset and the annotation tool offers a reproducible and publicly available resource for studying Arabic political discourse, multimodal ideology detection, and polarization dynamics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces ArPoMeme, a dataset of approximately 7,300 Arabic political memes scraped from Facebook pages and groups. Memes are categorized into four ideological orientations (Leftist, Islamist, Pan-Arabist, Satirical) based solely on the self-identification of the source pages. A semi-automated pipeline using Playwright scraping, Qwen2.5-VL-7B for text extraction, and a custom Streamlit interface for manual annotation of three polarization dimensions (Us vs. Them framing, Hostility toward out-groups, Calls to action) is described. Quantitative analysis reports strong asymmetries, with Islamist and satirical memes showing the highest levels of hostility and mobilization cues. The dataset and annotation tool are made publicly available.

Significance. This resource addresses a clear gap in multimodal datasets for Arabic political discourse and polarization analysis. The scale, public release, and combination of visual content with ideological and polarization annotations could support downstream work in computational social science, ideology detection, and cross-cultural NLP if the labels prove reliable. The semi-automated collection and annotation tooling are practical strengths.

major comments (2)
  1. [Data collection and ideological categorization] The ideological buckets used for all quantitative comparisons are defined exclusively by self-declared page/group orientation (see abstract and data collection description). No sample-based content validation, manual spot-checks, or inter-rater review is reported to confirm that individual memes actually express the assigned ideology. If a non-negligible fraction of content deviates from the page's stated stance, the reported asymmetries in Us-vs-Them framing, hostility, and mobilization cues across the four groups become difficult to interpret.
  2. [Annotation process] The manual annotation process for the three polarization dimensions is described at a high level but provides no inter-annotator agreement statistics (e.g., Cohen's or Fleiss' kappa), number of annotators per meme, adjudication procedure, or any reported validation metrics. This information is necessary to evaluate the reliability of the labels that underpin the central quantitative claims.
minor comments (2)
  1. The exact total count and per-category breakdown of the 7,300 memes should be stated precisely in the main text or a summary table rather than approximated in the abstract.
  2. A dedicated limitations section discussing scraping biases, potential page mislabeling, and the subjectivity of hostility/mobilization annotations would strengthen the manuscript.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify important aspects of our data curation and annotation methodology. We respond to each major comment below and indicate planned revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Data collection and ideological categorization] The ideological buckets used for all quantitative comparisons are defined exclusively by self-declared page/group orientation (see abstract and data collection description). No sample-based content validation, manual spot-checks, or inter-rater review is reported to confirm that individual memes actually express the assigned ideology. If a non-negligible fraction of content deviates from the page's stated stance, the reported asymmetries in Us-vs-Them framing, hostility, and mobilization cues across the four groups become difficult to interpret.

    Authors: We agree that explicit validation of alignment between page-level self-identification and meme content would improve interpretability. Our categorization follows established practices in large-scale social media studies, where source accounts are selected for their documented ideological positions to capture contextual discourse. To address the concern directly, we will add a new subsection describing a post-hoc manual validation: a stratified random sample of 300 memes (75 per ideological category) was independently reviewed by two annotators for consistency with the assigned label, with agreement rates and discrepancy examples reported in the revised manuscript. revision: yes

  2. Referee: [Annotation process] The manual annotation process for the three polarization dimensions is described at a high level but provides no inter-annotator agreement statistics (e.g., Cohen's or Fleiss' kappa), number of annotators per meme, adjudication procedure, or any reported validation metrics. This information is necessary to evaluate the reliability of the labels that underpin the central quantitative claims.

    Authors: We acknowledge that additional details on the annotation protocol are needed for assessing label quality. Annotation was performed by three native Arabic-speaking annotators with political science expertise using the Streamlit interface. Each meme received annotations from at least two annotators, with final labels determined by majority vote and adjudication by a lead annotator for disagreements. We will expand the annotation section in the revision to include these specifics and report Fleiss' kappa on the subset of memes with full overlap to quantify agreement. revision: yes

Circularity Check

0 steps flagged

No circularity: dataset curation and annotation pipeline is self-contained

full rationale

The paper presents a resource-creation effort: scraping memes from public Facebook pages whose self-declared ideological labels (Leftist, Islamist, Pan-Arabist, Satirical) are taken as given, followed by VLM text extraction, manual verification, and annotation for three polarization dimensions via a Streamlit interface. No equations, fitted parameters, or statistical predictions appear; the reported asymmetries in hostility and mobilization cues are direct tabulations over the resulting annotated corpus. Because the classification step is an input assumption rather than a derived output, and no subsequent claim reduces to that assumption by algebraic or statistical construction, the derivation chain contains no circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central contribution rests on assumptions about accurate ideological labeling from source self-identification and reliable manual annotation after automated text extraction; no free parameters or new invented entities are introduced.

axioms (1)
  • domain assumption Self-identification of public Facebook pages and groups accurately reflects the ideological orientation of the memes they produce and disseminate
    This premise directly grounds the categorization into Leftist, Islamist, Pan-Arabist, and Satirical groups.

pith-pipeline@v0.9.0 · 5833 in / 1332 out tokens · 48641 ms · 2026-05-21T05:37:11.854851+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages · 1 internal anchor

  1. [1]

    self -identification,

    Introduction The rapid growth of digital communication has transformed memes into a central medium of cul - tural and political expression. Originally circulated primarily for humor, Internet memes have increas- ingly become politicized, serving as vehicles for commentary, criticism, and ideological positioning (Johann and Bülow, 2019; SHIFMAN, 2014). Dra...

  2. [2]

    Within the Ara- bic context, this line of inquiry is still emerging, with recent efforts focusing mainly on harmful or manipu- lative dimensions of meme content

    Related Work Research on memes as multimodal texts combin- ing language and imagery has expanded rapidly across NLP and social computing. Within the Ara- bic context, this line of inquiry is still emerging, with recent efforts focusing mainly on harmful or manipu- lative dimensions of meme content. The first major contribution is ArMeme (Alam et al., 2024...

  3. [3]

    In- filtrated leftist cadres

    Data Collection and Processing Our data collection process involves several steps as highlighted in Figure 2. The process followed a semi-automatic pipeline designed to target ide - ologically explicit meme sources. In contrast to large-scale scraping of heterogeneous content, we focused on Facebook pages that explicitly self - identified with a particula...

  4. [4]

    Session Authentication: A secure login ses - sion was saved locally, enabling automated navigation across Facebook albums without repeated manual logins

  5. [5]

    This ensured that screenshots consistently captured only the relevant content area, ex - cluding surrounding interface elements

    Bounding Box Selection: At the start of each run, a bounding box was manually drawn over the screen region containing meme content. This ensured that screenshots consistently captured only the relevant content area, ex - cluding surrounding interface elements

  6. [6]

    Screen- shots were automatically named using UUIDs to prevent duplication

    Automated Navigation and Capture: The script then navigated through the album us - ing keyboard commands, capturing sequential 1www.facebook.com/AlHudoodNet/ 2www.facebook.com/Kimorganization/ 3www.facebook.com/profile.php?id= 100089592260983 4www.facebook.com/profile.php?id= 100063604605426 screenshots of all memes displayed. Screen- shots were automatic...

  7. [7]

    This ensured centralized, organized storage and facilitated subsequent annotation and processing

    Cloud Storage Integration: Each screenshot was uploaded to a dedicated subfolder on Google Drive corresponding to the ideologi - cal source page. This ensured centralized, organized storage and facilitated subsequent annotation and processing

  8. [8]

    Extract and transcribe all the text you can see in this image

    Logging and Resume Functionality: A log was maintained for each album, recording the last captured URL. This enabled resumption of in- terrupted collection sessions without redun - dancy. This hybrid manual–automated workflow offered both precision (via manual bounding box calibra - tion) and scalability (via automated capture and upload), allowing us to ...

  9. [9]

    Us -vs-Them

    Annotation Guidelines This annotation task aims to measure political po- larization in Arabic political memes through a mul- Figure 2: Data curation pipeline timodal lens that jointly considers textual and vi - sual elements. Polarization is defined as com - munication that intensifies political division, fos - ters antagonism between groups, or encourage...

  10. [10]

    does not show an explicit us-vs-them dynamic

    Data Annotation Quality Control This section describes data annotation quality and measurement across manual annotation and model performance. 5.1. Inter-Annotator Agreement To assess annotation reliability, we measured inter- annotator agreement (IAA) on a stratified sample of 100 memes (25 from each ideological category), in- dependently annotated by th...

  11. [11]

    Over- all, the results indicate that polarization is most salient in Islamic memes, where 66% exhibit clear Us-vs-Them framing and 39% display Hostility To- ward an Out -Group

    Analysis Figure 5 illustrates the distribution of polarization dimensions across ideological categories. Over- all, the results indicate that polarization is most salient in Islamic memes, where 66% exhibit clear Us-vs-Them framing and 39% display Hostility To- ward an Out -Group. In contrast, Left Wing and Pan-Arabist memes show moderate levels of an - t...

  12. [12]

    Ac- cess to the dataset requires completion of a request form1

    Data and Code Availability The ArPoMeme dataset is publicly available. Ac- cess to the dataset requires completion of a request form1. The dataset is released strictly for research purposes. All materials are distributed under the Creative Commons Attribution –NonCommercial– ShareAlike 4.0 International License (CC BY -NC- SA 4.0). Users must comply with ...

  13. [13]

    Figure 6: Share of non -political memes in each ideological category (%)

    Broader Impact Figure 5: Polarization dimensions across ideologi- cal categories (%). Figure 6: Share of non -political memes in each ideological category (%). are prevalent features, explicit mobilization remains rare. The presence of non -political memes within certain categories highlights the nuanced bound- The ArPoMeme dataset offers an important re ...

  14. [14]

    Conclusion This paper introduced ArPoMeme, a large -scale dataset of Arabic political memes designed to ad - vance the study of ideological polarization in multi- modal discourse. By combining automated collec- tion, human verification, and structured annotation, the dataset provides a comprehensive resource for analyzing the intersections of language, im...

  15. [15]

    To address this, we employed an automatic filtering step based on text detection: images that did not contain detectable overlaid text were classified as non -memes and excluded

    Limitations A key challenge in the construction of ArPoMeme lies in distinguishing memes from non -meme im- ages within the collected data. To address this, we employed an automatic filtering step based on text detection: images that did not contain detectable overlaid text were classified as non -memes and excluded. While this method provides a scalable ...

  16. [16]

    No pri- vate data, user comments, or personal iden - tifiers were accessed or stored at any point in the collection process

    Ethical Considerations All content included in the ArPoMeme dataset was collected exclusively from publicly accessible Facebook pages that explicitly identify themselves as political, activist, or satirical sources. No pri- vate data, user comments, or personal iden - tifiers were accessed or stored at any point in the collection process. Each meme was ca...

  17. [17]

    Qwen2.5-VL Technical Report

    Bibliographical References Firoj Alam, Md. Rafiul Biswas, Uzair Shah, Wajdi Zaghouani, and Georgios Mikros. 2025. Propa- ganda to hate: A multimodal analysis of arabic memes with multi-agent llms. In Web Information Systems Engineering – WISE 2024, pages 380– 390, Singapore. Springer Nature Singapore. Firoj Alam, Abul Hasnat, Fatema Ahmad, Md Arid Hasan, ...

  18. [18]

    Anatolii Shestakov and Wajdi Zaghouani

    Association for Computational Linguistics (ACL). Anatolii Shestakov and Wajdi Zaghouani. 2024. An- alyzing conflict through data: A dataset on the digital framing of sheikh jarrah evictions. In Pro- ceedings of the Second Workshop on Natural Language Processing for Political Sciences @ LREC-COLING 2024, pages 55–67, Torino, Italia. ELRA and ICCL. LIMOR SH...

  19. [19]

    In Proceedings of the 45th In- ternational ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’22, page 2887–2899, New York, NY, USA

    Met-meme: A multimodal meme dataset rich in metaphors. In Proceedings of the 45th In- ternational ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’22, page 2887–2899, New York, NY, USA. As- sociation for Computing Machinery. Wajdi Zaghouani, Hamdy Mubarak, and Md. Rafiul Biswas. 2024. So hateful! building a multi - label h...