pith. machine review for the scientific record

arXiv:2605.06231 · v2 · submitted 2026-05-07 · 💻 cs.CL

Recognition: no theorem link

YEZE at SemEval-2026 Task 9: Detecting Multilingual, Multicultural and Multievent Online Polarization via Heterogeneous Ensembling

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 01:12 UTC · model grok-4.3

classification: 💻 cs.CL
keywords: multilingual polarization detection · SemEval task · heterogeneous ensemble · class weighting · independent task modeling · social media analysis

The pith

Independent task modeling combined with class weighting outperforms multi-task learning for detecting online polarization in 22 languages.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents a system for SemEval-2026 Task 9 that detects polarized social media content in 22 languages. It uses a heterogeneous ensemble of pretrained models and explores multi-task learning, data augmentation, and class weighting to address challenges like label imbalance. The key finding is that modeling each of the three subtasks independently and applying class weighting leads to better performance than combined multi-task approaches.

Core claim

The authors establish that independent task modeling combined with class weighting is more effective than multi-task learning for the subtasks of binary polarization detection, target classification, and manifestation identification in a multilingual setting.
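
To make the contrast concrete, the sketch below shows what "independent task modeling" means in practice: one classifier per subtask, each fine-tuned in isolation, rather than a single shared-encoder multi-task model. This is a minimal illustration, not the authors' code; the label counts for Subtasks 2 and 3 are hypothetical placeholders, since the review material does not state them.

```python
# Hypothetical sketch of independent task modeling: three separate
# classifiers, one per subtask, with no parameter sharing. Label counts
# for Subtasks 2 and 3 are placeholders, not taken from the paper.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

BASE = "xlm-roberta-large"  # one member of the heterogeneous ensemble

SUBTASK_NUM_LABELS = {
    "subtask1_binary_polarization": 2,  # polarized vs. not polarized
    "subtask2_target": 4,               # placeholder target-class count
    "subtask3_manifestation": 5,        # placeholder manifestation count
}

tokenizer = AutoTokenizer.from_pretrained(BASE)
models = {
    task: AutoModelForSequenceClassification.from_pretrained(BASE, num_labels=n)
    for task, n in SUBTASK_NUM_LABELS.items()
}
# Each entry in `models` is then fine-tuned on its own subtask data only;
# a multi-task variant would instead share one encoder across all heads.
```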

What carries the argument

A heterogeneous ensemble of XLM-RoBERTa-large and mDeBERTa-v3-base models, applied with independent per-subtask training and class weighting to counter severe label imbalance.
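
As a rough illustration of the heterogeneous ensemble, the sketch below averages class probabilities from the two encoder families named above. The paper's actual combination rule is not specified in this review, so uniform probability averaging is an assumption, as is the use of base checkpoints standing in for fine-tuned ones.

```python
# Hedged sketch of heterogeneous ensembling for one subtask: average the
# predicted class probabilities of XLM-RoBERTa-large and mDeBERTa-v3-base.
# Uniform averaging is an assumption; the paper's exact rule may differ,
# and in practice each member would be a subtask-fine-tuned checkpoint.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MEMBERS = ["xlm-roberta-large", "microsoft/mdeberta-v3-base"]
NUM_LABELS = 2  # Subtask 1: binary polarization detection

def ensemble_predict(texts: list[str]) -> torch.Tensor:
    """Return the argmax class after averaging member probabilities."""
    member_probs = []
    for name in MEMBERS:
        tokenizer = AutoTokenizer.from_pretrained(name)
        model = AutoModelForSequenceClassification.from_pretrained(
            name, num_labels=NUM_LABELS
        )
        model.eval()
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            logits = model(**batch).logits        # shape: (batch, NUM_LABELS)
        member_probs.append(logits.softmax(dim=-1))
    return torch.stack(member_probs).mean(dim=0).argmax(dim=-1)
```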

Load-bearing premise

The performance improvements from independent task modeling and class weighting will hold on the official test set and generalize to other data distributions.

What would settle it

Evaluating the independent modeling system against the multi-task system on the official SemEval-2026 Task 9 test set, and finding that the multi-task system performs better, would falsify the central claim.

Figures

Figures reproduced from arXiv:2605.06231 by Fengze Guo and Yue Chang (University of Tübingen).

Figure 1: Positive rates for Subtask 1 across languages in the merged train+dev set.
Figure 2: Dataset imbalance analysis on the merged train+dev set. (a) Task-level skew (Language …
Figure 3: Post-hoc Macro-F1 by language and subtask on the released test gold labels.
Figure 4: Supplementary post-hoc visualizations for Subtask 2.
Figure 5: Supplementary post-hoc visualizations for Subtask 3.
Original abstract

This paper presents our system for SemEval-2026 Task 9: Detecting Multilingual, Multicultural and Multievent Online Polarization, which identifies polarized social media content in 22 languages through three subtasks: binary detection, target classification, and manifestation identification. We propose a heterogeneous ensemble of multilingual pretrained models, combining XLM-RoBERTa-large and mDeBERTa-v3-base. We investigate techniques such as multi-task learning, translation-based data augmentation, and class weighting to improve classification performance under severe label imbalance. Our findings indicate that independent task modeling combined with class weighting is more effective.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 1 minor

Summary. The paper presents the YEZE system for SemEval-2026 Task 9 on detecting multilingual, multicultural, and multievent online polarization in 22 languages across three subtasks (binary detection, target classification, and manifestation identification). It proposes a heterogeneous ensemble combining XLM-RoBERTa-large and mDeBERTa-v3-base, and examines multi-task learning, translation-based data augmentation, and class weighting to address severe label imbalance. The central empirical finding is that independent task modeling combined with class weighting outperforms the multi-task and other variants tested.

Significance. If the results hold on the test set, the work offers a practical, reproducible system description for a multilingual classification task with class imbalance. It applies standard techniques (heterogeneous ensembling of pretrained models and class weighting) without unsupported generalizations, providing a useful reference for similar shared-task settings in computational linguistics.

minor comments (1)
  1. [Abstract] The abstract states the key finding but provides no quantitative metrics, baselines, ablation results, or statistical tests, which weakens immediate assessment of the claim's strength (though the full manuscript presumably contains these in the experiments section).

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive review and recommendation of minor revision. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity; purely empirical system description

full rationale

The manuscript is a standard SemEval shared-task system paper. It reports experiments applying heterogeneous ensembling of XLM-RoBERTa and mDeBERTa, multi-task learning, translation augmentation, and class weighting on the provided task data, then states the empirical observation that independent modeling plus class weighting performed best. No equations, derivations, or theoretical claims appear. No load-bearing self-citations or uniqueness theorems are invoked. The central finding is a direct experimental result on the authors' runs and does not reduce to any fitted parameter or prior self-citation by construction. This is the expected non-circular outcome for empirical system descriptions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on two standard assumptions: that pretrained multilingual transformers transfer useful features to polarization detection, and that class weighting corrects for label imbalance without introducing new bias.

axioms (2)
  • domain assumption: Pretrained multilingual models capture relevant features for polarization detection across languages
    Invoked implicitly by the choice of XLM-RoBERTa and mDeBERTa as base models.
  • domain assumption: Class weighting improves performance under severe label imbalance without harming generalization
    Stated as part of the effective configuration in the abstract.
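
A minimal sketch of the second axiom follows, assuming inverse-frequency weighting; that is one common scheme, and the authors' exact formula is not given in this review.

```python
# Minimal sketch of class weighting under label imbalance, assuming
# inverse-frequency weights; the paper's exact scheme is not specified here.
from collections import Counter

import torch

def inverse_frequency_weights(labels: list[int], num_classes: int) -> torch.Tensor:
    """weight_c = N / (num_classes * count_c), so rare classes weigh more."""
    counts = Counter(labels)
    total = len(labels)
    return torch.tensor(
        [total / (num_classes * max(counts.get(c, 0), 1)) for c in range(num_classes)],
        dtype=torch.float,
    )

# Toy 9:1 imbalance, as in a skewed binary polarization subtask.
labels = [0] * 900 + [1] * 100
weights = inverse_frequency_weights(labels, num_classes=2)  # tensor([0.556, 5.0])
loss_fn = torch.nn.CrossEntropyLoss(weight=weights)  # plugged in during fine-tuning
```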

pith-pipeline@v0.9.0 · 5408 in / 1208 out tokens · 42868 ms · 2026-05-12T01:12:32.300762+00:00 · methodology


Reference graph

Works this paper leans on

43 extracted references · 28 unique works


  1. [29] Adem Chanie Ali, Seid Muhie Yimam, Abinew Ali Ayele, Chris Biemann, and Martin Semmann. 2025. https://doi.org/10.1515/icom-2025-0007 Silenced voices: social media polarization and women’s marginalization in peacebuilding during the Northern Ethiopia War. i-com, 24(2):407--432

  2. [30] Christopher A. Bail, Lisa P. Argyle, Taylor W. Brown, John P. Bumpus, Haohan Chen, M. B. Fallin Hunzaker, Jaemin Lee, Marcus Mann, Friedolin Merhout, and Alexander Volfovsky. 2018. https://doi.org/10.1073/pnas.1804840115 Exposure to opposing views on social media can increase political polarization. Proceedings of the National Academy of Sciences, 115(37...

  3. [31] Build Up. 2025. https://howtobuildup.org/wp-content/uploads/2025/11/Polarization-footrpint-Europe-report-.pdf Polarization footprint Europe report. Technical report, Build Up

  4. [32] Rich Caruana. 1997. https://doi.org/10.1023/A:1007379606734 Multitask learning. Machine Learning, 28(1):41--75

  5. [33] Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. 2020. https://doi.org/10.18653/v1/2020.acl-main.747 Unsupervised cross-lingual representation learning at scale. In Proceedings of the 58th Annual Meeting of the Association for Comp...

  6. [34] Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber. 2017. https://doi.org/10.1609/icwsm.v11i1.14955 Automated hate speech detection and the problem of offensive language. In Proceedings of the Eleventh International AAAI Conference on Web and Social Media (ICWSM), pages 512--515

  7. [35] Shrey Desai and Greg Durrett. 2020. https://doi.org/10.18653/v1/2020.emnlp-main.21 Calibration of pre-trained transformers. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 295--302, Online. Association for Computational Linguistics

  8. [36] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. https://doi.org/10.18653/V1/N19-1423 BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, ...

  9. [37] Charles Elkan. 2001. https://doi.org/10.5555/1642194.1642224 The foundations of cost-sensitive learning. In Proceedings of the 17th International Joint Conference on Artificial Intelligence - Volume 2, IJCAI'01, pages 973--978, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc

  10. [38] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. https://arxiv.org/abs/1801.01665 Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. Preprint, arXiv:1801.01665

  11. [39] Haibo He and Edwardo A. Garcia. 2009. https://doi.org/10.1109/TKDE.2008.239 Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9):1263--1284

  12. [40] Pengcheng He, Jianfeng Gao, and Weizhu Chen. 2023. https://arxiv.org/abs/2111.09543 DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing. Preprint, arXiv:2111.09543

  13. [41] Jeffrey W. Howard. 2019. https://doi.org/10.1146/annurev-polisci-051517-012343 Free Speech and Hate Speech. Annual Review of Political Science, 22:93--109

  14. [42] Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, and Melvin Johnson. 2020. https://arxiv.org/abs/2003.11080 XTREME: A massively multilingual multi-task benchmark for evaluating cross-lingual generalization. Preprint, arXiv:2003.11080

  15. [43] Meng Ji. 2023. https://doi.org/10.1017/9781108938976.005 Cultural and linguistic bias of neural machine translation technology. In Translation Technology in Accessible Health Communication, pages 100--128. Cambridge University Press

  16. [44] Anne Lauscher, Vinit Ravishankar, Ivan Vulić, and Goran Glavaš. 2020. https://doi.org/10.18653/v1/2020.emnlp-main.363 From zero to hero: On the limitations of zero-shot language transfer with multilingual Transformers. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4483--4499, Online. A...

  17. [45] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. 2018. https://arxiv.org/abs/1708.02002 Focal loss for dense object detection. Preprint, arXiv:1708.02002

  18. [46] Usman Naseem, Robert Geislinger, Juan Ren, Sarah Kohail, Rudy Garrido Veliz, P Sam Sahil, Yiran Zhang, Marco Antonio Stranisci, Idris Abdulmumin, Özge Alacam, Cengiz Acarürk, Aisha Jabr, Saba Anwar, Abinew Ali Ayele, Elena Tutubalina, Aung Kyaw Htet, Xintong Wang, Surendrabikram Thapa, Tanmoy Chakraborty, Dheeraj Kodati, Sahar Moradizeyveh, Firoj Alam, Ye...

  19. [47] Usman Naseem, Robert Geislinger, Juan Ren, Sarah Kohail, Rudy Garrido Veliz, P Sam Sahil, Yiran Zhang, Marco Antonio Stranisci, Idris Abdulmumin, Özge Alacam, Cengiz Acartürk, Aisha Jabr, Saba Anwar, Abinew Ali Ayele, Simona Frenda, Alessandra Teresa Cignarella, Elena Tutubalina, Oleg Rogov, Aung Kyaw Htet, Xintong Wang, Surendrabikram Thapa, Kritesh Raun...

  20. [48] Telmo Pires, Eva Schlinger, and Dan Garrette. 2019. https://doi.org/10.18653/v1/P19-1493 How multilingual is multilingual BERT? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4996--5001, Florence, Italy. Association for Computational Linguistics

  21. [49] Jesse Read, Bernhard Pfahringer, Geoff Holmes, and Eibe Frank. 2009. https://doi.org/10.1007/978-3-642-04174-7_17 Classifier chains for multi-label classification. In Machine Learning and Knowledge Discovery in Databases, pages 254--269, Berlin, Heidelberg. Springer Berlin Heidelberg

  22. [50] Paul Röttger, Bertie Vidgen, Dong Nguyen, Zeerak Waseem, Helen Margetts, and Janet Pierrehumbert. 2021. https://aclanthology.org/2021.acl-long.4 HateCheck: Functional tests for hate speech detection models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natu...

  23. [51] Phillip Rust, Jonas Pfeiffer, Ivan Vulić, Sebastian Ruder, and Iryna Gurevych. 2021. https://doi.org/10.18653/v1/2021.acl-long.243 How good is your tokenizer? On the monolingual performance of multilingual language models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conf...

  24. [52] Konstantinos Sechidis, Grigorios Tsoumakas, and Ioannis P. Vlahavas. 2011. https://doi.org/10.1007/978-3-642-23808-6_10 On the stratification of multi-label data. In ECML/PKDD

  25. [53] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf Attention is all you need. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc

  26. [54] Zeerak Waseem and Dirk Hovy. 2016. https://doi.org/10.18653/v1/N16-2013 Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In Proceedings of the NAACL Student Research Workshop, pages 88--93, San Diego, California. Association for Computational Linguistics

  27. [55] Shijie Wu and Mark Dredze. 2019. https://doi.org/10.18653/v1/D19-1077 Beto, Bentz, Becas: The surprising cross-lingual effectiveness of BERT. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 833--844, Hong Kong, Ch...

  28. [56] Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, and Chelsea Finn. 2020. https://proceedings.neurips.cc/paper_files/paper/2020/file/3fe78a8acf5fda99de95303940a2420c-Paper.pdf Gradient surgery for multi-task learning. In Advances in Neural Information Processing Systems, volume 33, pages 5824--5836. Curran Associates, Inc