StanceNakba Shared Task: Actor and Topic-Aware Stance Detection in Public Discourse
Pith reviewed 2026-06-27 10:02 UTC · model grok-4.3
The pith
Transformer models reach 0.962 Macro F1 on English actor stance detection and 0.872 on Arabic cross-topic stance for the Palestinian-Israeli conflict.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The best-performing systems achieved a Macro F1 of 0.9620 on Subtask A and 0.8724 on Subtask B, demonstrating that transformer-based approaches are highly effective for conflict-domain stance detection while highlighting persistent challenges in cross-topic generalization and neutral class prediction.
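Macro F1 is the unweighted mean of per-class F1, so a weak Neutral class lowers the score no matter how rare that class is. The sketch below shows the computation in plain Python. The label names follow Subtask A, but the gold and predicted lists are invented for illustration and are not taken from the shared-task data. The function name `macro_f1` is likewise an assumption, not the organizers' scorer.

```python
from collections import Counter

def macro_f1(gold, pred, labels):
    """Unweighted mean of per-class F1 over `labels`."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Illustrative labels only (hypothetical, not from the dataset).
LABELS = ["Pro-Palestine", "Pro-Israel", "Neutral"]
gold = ["Pro-Palestine", "Pro-Palestine", "Pro-Israel",
        "Pro-Israel", "Neutral", "Neutral"]
pred = ["Pro-Palestine", "Pro-Palestine", "Pro-Israel",
        "Pro-Israel", "Pro-Palestine", "Neutral"]

print(round(macro_f1(gold, pred, LABELS), 4))
```

In this toy case a single misclassified Neutral post costs the Neutral class a third of its F1, which is why neutral-class errors weigh heavily on the headline metric.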
What carries the argument
Fine-tuned transformer models (MARBERT, AraBERT, DeBERTa-v3 variants) applied to the 2,606 annotated posts across the two subtasks of actor-level and cross-topic stance classification.
If this is right
- Transformer models can be adapted effectively for stance detection in conflict-related social media.
- Ensemble methods and topic-conditioned architectures boost results on both subtasks.
- Cross-topic generalization stays difficult, especially in the Arabic subtask.
- Neutral class prediction remains weaker than pro or against predictions.
- The dataset provides a foundation for building actor-aware and topic-aware stance systems.
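The abstract does not say how participating teams combined their models, so the following is only one common option: hard majority voting over the label predictions of several fine-tuned classifiers. The function name `majority_vote`, the tie-breaking rule, and the sample predictions are all assumptions made for illustration.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine label predictions from several models by hard voting.

    predictions_per_model: list of equal-length label lists, one per model.
    Ties are broken in favor of the earliest model in the list (an
    arbitrary choice for this sketch).
    """
    lengths = {len(p) for p in predictions_per_model}
    if len(lengths) != 1:
        raise ValueError("all models must predict the same number of posts")
    ensembled = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        top = max(counts.values())
        # First vote (in model order) that reaches the top count wins.
        ensembled.append(next(v for v in votes if counts[v] == top))
    return ensembled

# Hypothetical predictions from three models on three Subtask B posts.
model_a = ["Favor", "Against", "Neither"]
model_b = ["Favor", "Neither", "Neither"]
model_c = ["Against", "Against", "Favor"]

print(majority_vote([model_a, model_b, model_c]))
```

Soft voting (averaging class probabilities) is an equally plausible design. Hard voting is shown here only because it needs no access to model scores.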
Where Pith is reading between the lines
- The same task design could be repeated for other geopolitical conflicts to create comparable test beds.
- Better neutral-class handling might need revised annotation schemes or multi-label output formats.
- High scores open the door to automated monitoring of real-time public discourse on social platforms.
- Testing the same models on additional languages would reveal how far multilingual transfer works here.
Load-bearing premise
The 2,606 social media posts carry stance labels that accurately match the intended categories without substantial annotator bias or collection artifacts.
What would settle it
Re-annotating the same posts with a fresh set of annotators and finding that the top systems then score markedly lower would show the reported performance depends on the original labels.
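Before re-scoring systems against fresh labels, a first check would be how closely the fresh labels agree with the original ones. Cohen's kappa is one standard chance-corrected agreement measure for two label sets. The sketch below assumes two parallel label lists. The example labels are invented, and nothing here reflects agreement figures from the paper, which the abstract does not report.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotations of the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each side's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[l] * freq_b[l] for l in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both sides used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical original vs. fresh annotations for six Subtask B posts.
original = ["Favor", "Favor", "Against", "Against", "Neither", "Neither"]
fresh = ["Favor", "Favor", "Against", "Neither", "Neither", "Neither"]

print(cohen_kappa(original, fresh))
```

Low agreement between the two label sets would already indicate that the reported scores depend on the original annotation, before any system is re-evaluated.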
Original abstract
We present StanceNakba 2026, a shared task on stance detection in polarized social media discourse related to the Palestinian-Israeli conflict, organized as part of Nakba-NLP 2026 at LREC-COLING 2026. The task introduces two subtasks: Subtask A (Actor-Level Stance Detection), which classifies English social media posts as Pro-Palestine, Pro-Israel, or Neutral; and Subtask B (Cross-Topic Stance Detection), which identifies Favor, Against, or Neither stances in Arabic posts toward two conflict-related topics, normalization with Israel and refugee presence in Jordan. The task is grounded in an annotated dataset of 2,606 social media posts. A total of 7 teams participated in Subtask A and 6 teams in Subtask B. Participating systems primarily fine-tuned Arabic and multilingual transformer-based models, including MARBERT, AraBERT, and DeBERTa-v3 variants, with several teams employing cross-validation, ensemble methods, and topic-conditioned architectures. The best-performing systems achieved a Macro F1 of 0.9620 on Subtask A and 0.8724 on Subtask B, demonstrating that transformer-based approaches are highly effective for conflict-domain stance detection while highlighting persistent challenges in cross-topic generalization and neutral class prediction.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents StanceNakba 2026, a shared task on stance detection in polarized social media discourse on the Palestinian-Israeli conflict. It defines two subtasks: Subtask A (actor-level classification of English posts as Pro-Palestine, Pro-Israel, or Neutral) and Subtask B (cross-topic classification of Arabic posts as Favor, Against, or Neither toward two topics, normalization with Israel and refugee presence in Jordan). Seven teams participated in Subtask A and six in Subtask B. The best systems, primarily fine-tuned transformers such as MARBERT and DeBERTa-v3, achieve a Macro F1 of 0.9620 on Subtask A and 0.8724 on Subtask B. The task is grounded in a dataset of 2,606 annotated posts.
Significance. If the gold labels are reliable, the reported scores establish that transformer-based models can achieve strong performance on conflict-domain stance detection and supply a useful benchmark dataset and baseline results for the community. The work also surfaces concrete remaining difficulties in cross-topic generalization and neutral-class prediction.
Major comments (2)
- [Abstract] Abstract: the central performance claims rest on Macro F1 scores of 0.9620 and 0.8724, yet the abstract supplies no annotation guidelines, inter-annotator agreement figures, sampling procedure, annotator demographics, or statistical significance tests, leaving the robustness of all reported results impossible to evaluate.
- [Dataset] Dataset description (implied by the statement that the task is 'grounded in an annotated dataset of 2,606 social media posts'): without IAA, annotation protocol, or external validation, the load-bearing assumption that the labels accurately reflect the intended Pro-Palestine/Pro-Israel/Neutral and Favor/Against/Neither categories cannot be assessed, directly undermining the conclusion that the models are 'highly effective'.
Simulated Author's Rebuttal
We thank the referee for these comments. We agree that the abstract and dataset description should provide more information on the annotation process to support the reported results. We will revise the manuscript to address this.
Point-by-point responses
- Referee: [Abstract] Abstract: the central performance claims rest on Macro F1 scores of 0.9620 and 0.8724, yet the abstract supplies no annotation guidelines, inter-annotator agreement figures, sampling procedure, annotator demographics, or statistical significance tests, leaving the robustness of all reported results impossible to evaluate.
  Authors: We agree. The abstract will be revised to include a short statement on the annotation process, including that the posts were sampled from public social media and annotated by native speakers following detailed guidelines. We will also ensure that statistical significance is reported in the results section. revision: yes
- Referee: [Dataset] Dataset description (implied by the statement that the task is 'grounded in an annotated dataset of 2,606 social media posts'): without IAA, annotation protocol, or external validation, the load-bearing assumption that the labels accurately reflect the intended Pro-Palestine/Pro-Israel/Neutral and Favor/Against/Neither categories cannot be assessed, directly undermining the conclusion that the models are 'highly effective'.
  Authors: We agree that the dataset description needs to be more comprehensive. In the revised manuscript, we will add a detailed subsection on the annotation protocol, inter-annotator agreement, sampling procedure, annotator demographics, and any validation steps to allow readers to assess the label quality. revision: yes
Circularity Check
No circularity: descriptive shared-task report with no derivations
Full rationale
The paper is a report on organizing a shared task and summarizing external team submissions. It contains no equations, derivations, first-principles predictions, fitted parameters renamed as outputs, or self-citation chains that reduce the central claims to inputs by construction. The Macro F1 scores are reported results from participating systems on an annotated dataset; the effectiveness conclusion follows directly from those external results rather than any internal modeling step. Annotation quality is a standard data assumption but does not constitute circularity under the defined patterns, as no derivation reduces to it.