arxiv: 2604.07066 · v1 · submitted 2026-04-08 · 💻 cs.CL

Recognition: unknown

SemEval-2026 Task 3: Dimensional Aspect-Based Sentiment Analysis (DimABSA)

Liang-Chih Yu, Jonas Becker, Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Lung-Hao Lee, Ying-Lung Lin, Jin Wang, Jan Philip Wahle, Terry Ruas, Natalia Loukachevitch, Alexander Panchenko, Ilseyar Alimova, Lilian Wanzare, Nelson Odhiambo, Bela Gipp, Kai-Wei Chang, Saif M. Mohammad


Pith reviewed 2026-05-10 17:18 UTC · model grok-4.3

classification 💻 cs.CL

keywords SemEval · Aspect-Based Sentiment Analysis · Dimensional Sentiment · Valence-Arousal · Stance Detection · Shared Task · Regression · Natural Language Processing


The pith

The SemEval-2026 shared task models aspect-based sentiment and stance using continuous valence-arousal dimensions instead of categorical labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces SemEval-2026 Task 3 on Dimensional Aspect-Based Sentiment Analysis. It replaces traditional polarity categories with regression over valence and arousal values to capture finer sentiment gradations. The work adds a parallel track for Dimensional Stance Analysis that applies the same dimensional treatment to stance targets in public-issue texts. It defines three extraction subtasks plus a new continuous F1 metric that scores both structure and dimensional accuracy together. The task supplies data, baselines, and participant results to support further development of dimensional sentiment systems.

Core claim

The SemEval-2026 shared task on Dimensional Aspect-Based Sentiment Analysis improves traditional ABSA by modeling sentiment along valence-arousal dimensions rather than categorical polarity labels. Track A covers dimensional aspect sentiment regression, triplet extraction, and quadruplet extraction; Track B reformulates stance detection as valence-arousal regression over stance targets. A continuous F1 metric jointly evaluates structured output and dimensional accuracy. The organizers report baselines, top system performance, and design insights from 112 submissions.
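The three Track A output shapes can be made concrete with a minimal Python sketch. It assumes that a triplet is (aspect, opinion, VA) and that a quadruplet adds an aspect category, which follows common ABSA convention but is not spelled out in this summary. The names `VA`, `DimTuple`, and `subtask`, as well as the numeric VA values, are hypothetical illustrations and not the task's official data format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VA:
    """A point in valence-arousal space (the numeric scale is an assumption)."""
    valence: float
    arousal: float


@dataclass(frozen=True)
class DimTuple:
    """One aspect-level output; the optional fields grow with the subtask."""
    aspect: str
    va: VA
    opinion: Optional[str] = None   # assumed present for triplets and quadruplets
    category: Optional[str] = None  # assumed present for quadruplets only

    def subtask(self) -> str:
        """Name the Track A subtask that this tuple shape would belong to."""
        if self.opinion is None:
            if self.category is None:
                return "regression"
            raise ValueError("category without opinion is not a shape used in this sketch")
        return "triplet" if self.category is None else "quadruplet"


# Hypothetical VA values for the sentence "The food was excellent."
example = DimTuple(
    aspect="food",
    va=VA(valence=7.5, arousal=7.0),
    opinion="excellent",
    category="FOOD#QUALITY",
)
```

Track B would reuse only the regression shape, with the stance target standing in for the aspect.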

What carries the argument

Valence-arousal (VA) dimensional regression applied to aspects and stance targets, evaluated with a continuous F1 metric that combines extraction structure and numeric VA accuracy.
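The exact cF1 formula is not reproduced in this summary, so the following is only a sketch of how a metric of this kind could work. It assumes exact matching on the structural part of each tuple, a Euclidean VA distance normalized by the diagonal of an assumed 1-to-9 VA square, and a true-positive credit of one minus that normalized distance. The function name `continuous_f1` and its parameters are hypothetical; the paper's own definition should be consulted before comparing scores.

```python
import math
from typing import Dict, Hashable, Tuple

VAPair = Tuple[float, float]


def continuous_f1(
    pred: Dict[Hashable, VAPair],
    gold: Dict[Hashable, VAPair],
    va_min: float = 1.0,   # assumed lower bound of the VA scale
    va_max: float = 9.0,   # assumed upper bound of the VA scale
) -> float:
    """Illustrative continuous F1: structural matches earn partial credit by VA distance."""
    if not pred or not gold:
        return 0.0
    # Largest possible distance between two points in the assumed VA square.
    max_dist = math.hypot(va_max - va_min, va_max - va_min)
    continuous_tp = 0.0
    for key, (p_val, p_aro) in pred.items():
        if key not in gold:
            continue  # structural miss: no credit at all
        g_val, g_aro = gold[key]
        dist = math.hypot(p_val - g_val, p_aro - g_aro)
        # Credit shrinks linearly with distance and is clamped at zero.
        continuous_tp += max(0.0, 1.0 - dist / max_dist)
    if continuous_tp == 0.0:
        return 0.0
    precision = continuous_tp / len(pred)
    recall = continuous_tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions a structurally correct tuple with a poor VA estimate still scores above a structural miss, which is the joint behaviour the metric is described as targeting.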

If this is right

Aspect sentiment can be expressed as numeric coordinates rather than positive/negative/neutral classes.
Stance detection on political or climate topics becomes a regression problem in the same VA space.
A single continuous F1 score can rank systems on both extraction structure and dimensional precision.
Public-issue discourse gains a uniform dimensional treatment alongside consumer-review ABSA.
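Treating sentiment or stance as regression means systems are scored by numeric error rather than label agreement. A minimal sketch of one common choice, root-mean-square error over (valence, arousal) pairs, is given here. Whether the task uses exactly this normalization, or a different regression score, is not stated in this summary, and the function name `rmse_va` is hypothetical.

```python
import math
from typing import Sequence, Tuple

VAPair = Tuple[float, float]


def rmse_va(pred: Sequence[VAPair], gold: Sequence[VAPair]) -> float:
    """Root-mean-square Euclidean error between predicted and gold VA points."""
    if len(pred) != len(gold) or not gold:
        raise ValueError("pred and gold must be non-empty and of equal length")
    # Sum squared valence and arousal errors for each aligned pair.
    squared = sum(
        (p_val - g_val) ** 2 + (p_aro - g_aro) ** 2
        for (p_val, p_aro), (g_val, g_aro) in zip(pred, gold)
    )
    return math.sqrt(squared / len(gold))
```

The same function would apply unchanged to aspects in reviews and to stance targets in public-issue texts, since both are represented as points in the same VA space.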

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Dimensional outputs may integrate more naturally with psychological or physiological models of emotion.
Regression-based training could reduce label noise that arises when annotators force borderline cases into discrete categories.
The same VA framework might later support cross-lingual or multimodal extensions where categorical labels are harder to align.

Load-bearing premise

That representing sentiment and stance in continuous valence-arousal space will yield more useful models than traditional categorical labels for both consumer reviews and public-issue discourse.

What would settle it

An experiment on an existing downstream task such as review summarization or stance-based prediction that shows equivalent or worse performance when systems are trained on the new VA-annotated data versus standard categorical ABSA annotations.

Figures

Figures reproduced from arXiv: 2604.07066 by Alexander Panchenko, Bela Gipp, Idris Abdulmumin, Ilseyar Alimova, Jan Philip Wahle, Jin Wang, Jonas Becker, Kai-Wei Chang, Liang-Chih Yu, Lilian Wanzare, Lung-Hao Lee, Natalia Loukachevitch, Nelson Odhiambo, Saif M. Mohammad, Shamsuddeen Hassan Muhammad, Terry Ruas, Ying-Lung Lin.

**Figure 1.** Valence–Arousal (VA) space. Accompanying text from the paper: given the sentence "The food was excellent.", an ABSA system is expected to extract the aspect term "food", the opinion term "excellent", assign the aspect category FOOD#QUALITY from a predefined set, and predict Positive sentiment polarity.

**Figure 2.** Countries of official affiliations of participants. Larger dots indicate more participants. A total of 24 countries are represented.

**Figure 3.** Model architectures used by participants.

**Figure 4.** LLMs used by participants.

**Figure 5.** Training techniques used by participants.

**Figure 6.** Fine-tuning strategies used by participants.

**Figure 7.** Prompting strategies used by participants.

**Figure 8.** External data used by participants.

Original abstract

We present the SemEval-2026 shared task on Dimensional Aspect-Based Sentiment Analysis (DimABSA), which improves traditional ABSA by modeling sentiment along valence-arousal (VA) dimensions rather than using categorical polarity labels. To extend ABSA beyond consumer reviews to public-issue discourse (e.g., political, energy, and climate issues), we introduce an additional task, Dimensional Stance Analysis (DimStance), which treats stance targets as aspects and reformulates stance detection as regression in the VA space. The task consists of two tracks: Track A (DimABSA) and Track B (DimStance). Track A includes three subtasks: (1) dimensional aspect sentiment regression, (2) dimensional aspect sentiment triplet extraction, and (3) dimensional aspect sentiment quadruplet extraction, while Track B includes only the regression subtask for stance targets. We also introduce a continuous F1 (cF1) metric to jointly evaluate structured extraction and VA regression. The task attracted more than 400 participants, resulting in 112 final submissions and 42 system description papers. We report baseline results, discuss top-performing systems, and analyze key design choices to provide insights into dimensional sentiment analysis at the aspect and stance-target levels. All resources are available on our GitHub repository.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a SemEval task proposal that sets up dimensional VA regression for ABSA plus a stance extension, but the improvement claim over categorical labels rests on an assumption without supporting comparisons in the paper.


This paper proposes a new SemEval-2026 shared task on Dimensional Aspect-Based Sentiment Analysis. The core idea is to move from discrete polarity labels to continuous valence-arousal regression while adding structured extraction subtasks and a stance version for public-issue texts. That setup, plus the continuous F1 metric, is the actual new piece. The authors also ran the task, collected over 400 participants, 112 submissions, and 42 system papers, and they release baselines and data on GitHub. That level of organization and community uptake is useful for anyone who wants standardized benchmarks in this corner of sentiment work.

The tracks are clearly laid out: Track A covers regression, triplet, and quadruplet extraction under VA dimensions; Track B applies the regression subtask to stance targets. The cF1 metric tries to score both the extraction structure and the numeric VA values in one go. Those choices look reasonable on paper and give participants concrete things to optimize.

The main soft spot is the repeated claim that the VA approach improves traditional ABSA. The manuscript defines the tasks and reports baselines, but it contains no controlled comparison on the same texts and aspects that shows higher human agreement, better downstream performance, or richer insights than a categorical setup. The improvement is presented as motivation rather than a result backed by evidence internal to this document. That is common in task-description papers, but it leaves the central rationale untested here.

This paper is mainly for researchers who run or participate in shared tasks on sentiment and stance, or who want to build systems against the new data and metric. It is not a methods paper with novel algorithms or large-scale findings. It deserves a serious referee because SemEval task papers help set evaluation standards for the field, and reviewers can usefully check the task design, metric, and data quality even if they do not expect empirical proof of superiority. I would send it to review with that scope in mind.

Referee Report

2 major / 2 minor

Summary. The manuscript presents the SemEval-2026 shared task on Dimensional Aspect-Based Sentiment Analysis (DimABSA). It claims to improve traditional ABSA by replacing categorical polarity labels with continuous valence-arousal (VA) regression, introduces DimStance for stance targets in public-issue discourse, defines two tracks (Track A with regression/triplet/quadruplet subtasks; Track B with regression only), a continuous F1 (cF1) metric, and reports baselines, top systems, and design insights from 112 submissions by 400+ participants.

Significance. If the dimensional VA formulation yields richer aspect-level representations than categorical labels, the task could advance nuanced sentiment and stance analysis beyond consumer reviews into political and issue-based discourse. The release of resources, large participation, and analysis of system design choices provide a useful benchmark and community resource for the field.

major comments (2)

[Abstract, §1] The central claim that DimABSA 'improves traditional ABSA by modeling sentiment along valence-arousal (VA) dimensions rather than using categorical polarity labels' is not supported by any controlled comparison. The manuscript defines the task, subtasks, and cF1 metric and reports participant results, but contains no experiments measuring whether VA regression produces higher downstream utility, better human agreement, or richer insights than an equivalent categorical ABSA setup on the same aspects and texts.
[§4, or the metric definition section] The continuous F1 (cF1) metric is introduced to jointly evaluate structured extraction and VA regression, but the manuscript does not provide a formal justification or sensitivity analysis showing why this particular continuous formulation is preferable to standard F1 on discretized VA bins or to existing ABSA metrics; this choice is load-bearing for all reported results.
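The contrast the second comment draws, between a continuous score and F1 on discretized VA bins, can be illustrated with a small sketch of what binning does near a threshold. The cut points 4.0 and 6.0 and the function names `bin_valence` and `binned_match` are hypothetical choices for illustration only; the manuscript as summarized here does not propose any specific bins.

```python
def bin_valence(valence: float, neg_below: float = 4.0, pos_above: float = 6.0) -> str:
    """Map a valence score to a coarse polarity label using assumed cut points."""
    if valence < neg_below:
        return "negative"
    if valence > pos_above:
        return "positive"
    return "neutral"


def binned_match(pred_valence: float, gold_valence: float) -> bool:
    """A binned metric counts a prediction as correct only if both fall in the same bin."""
    return bin_valence(pred_valence) == bin_valence(gold_valence)


# A 0.2 error that straddles the 6.0 cut point is scored as a miss ...
near_miss = binned_match(6.1, 5.9)   # False
# ... while a 2.8 error inside the "positive" bin is scored as a hit.
far_hit = binned_match(8.9, 6.1)     # True
```

A sensitivity analysis of the kind requested would vary such cut points and check how much system rankings change relative to the continuous score.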

minor comments (2)

[§3] Ensure all dataset statistics (e.g., number of aspects, VA distribution) are reported with exact counts and splits in a dedicated table.
[§2] Clarify whether the DimStance track reuses the same texts and annotation guidelines as DimABSA or introduces new data; this affects claims about extending to public-issue discourse.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the detailed review of our SemEval-2026 DimABSA task paper. We respond to the major comments below, noting that this is a shared task description rather than an empirical methods paper.


Referee: [Abstract, §1] The central claim that DimABSA 'improves traditional ABSA by modeling sentiment along valence-arousal (VA) dimensions rather than using categorical polarity labels' is not supported by any controlled comparison. The manuscript defines the task, subtasks, and cF1 metric and reports participant results, but contains no experiments measuring whether VA regression produces higher downstream utility, better human agreement, or richer insights than an equivalent categorical ABSA setup on the same aspects and texts.

Authors: This manuscript is the official description of the SemEval-2026 shared task. The phrasing in the abstract and §1 is motivational, drawing on prior literature showing dimensional models capture nuance better than categorical labels. We do not include a controlled comparison because the paper's scope is task definition, resource release, and analysis of the 112 submissions from 400+ participants. The scale of participation and design insights provide community-level validation. We will revise the introduction to clarify that direct comparative experiments are encouraged as future work using the released data.

revision: partial

Referee: [§4] The continuous F1 (cF1) metric is introduced to jointly evaluate structured extraction and VA regression, but the manuscript does not provide a formal justification or sensitivity analysis showing why this particular continuous formulation is preferable to standard F1 on discretized VA bins or to existing ABSA metrics; this choice is load-bearing for all reported results.

Authors: We agree a more detailed justification strengthens the paper. cF1 was designed to preserve the continuous VA information without arbitrary discretization thresholds and to unify extraction and regression evaluation via distance-based partial matching. In the revision we will add a sensitivity analysis subsection (or appendix) comparing cF1 against binned F1 variants and standard ABSA metrics on the development and test sets to show robustness and rationale.

revision: yes

Circularity Check

0 steps flagged

Task proposal paper contains no derivations, predictions, or self-referential reductions


The manuscript is a shared-task description that defines subtasks (regression, triplet/quadruplet extraction), a cF1 metric, tracks, and reports baselines plus participant outcomes. No equations, fitted parameters, or predictions appear that could reduce to their own inputs by construction. The motivational claim that VA modeling 'improves' categorical ABSA is an untested assumption for task design, not a derived result. No self-citations function as load-bearing uniqueness theorems or ansatzes. The paper is self-contained as a definitional proposal and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a task description and introduces no free parameters, mathematical axioms, or postulated entities; it relies on standard practices in shared task organization and NLP evaluation.

pith-pipeline@v0.9.0 · 5605 in / 1118 out tokens · 61521 ms · 2026-05-10T17:18:40.149903+00:00 · methodology

