Task-Routed Mixture-of-Experts with Cognitive Appraisal for Implicit Sentiment Analysis

Haoran Xie; Joe S. Qin; Yaping Chai

arXiv: 2605.20916 · v1 · pith:ZKDXGI2J · submitted 2026-05-20 · 💻 cs.CL


Pith reviewed 2026-05-21 04:38 UTC · model grok-4.3

classification 💻 cs.CL

keywords implicit sentiment analysis · mixture of experts · multi-task learning · cognitive appraisal · task routing · encoder-decoder · sentiment classification


The pith

Task-routed mixture-of-experts with cognitive appraisal tasks improves implicit sentiment analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that an appraisal-aware multi-task learning framework can improve implicit sentiment analysis by adding auxiliary tasks for detection and rationale generation while using mixture-of-experts routing to limit interference. Standard models learn only from final polarity labels, which give weak signals when sentiment must be inferred from events or context rather than explicit opinion words. The approach replaces selected blocks in an encoder-decoder backbone with task-level mixtures, where task identity drives a conditioned router and a separated routing objective encourages each task to select distinct expert combinations. Experiments report gains over recent methods, particularly on the implicit sentiment subset.

Core claim

Motivated by cognitive appraisal theory, the authors propose an appraisal-aware multi-task learning framework for implicit sentiment analysis that supplies polarity prediction with two auxiliary tasks: implicit sentiment detection and cognitive rationale generation. To avoid interference when multiple objectives share one backbone, they introduce task-level mixture-of-experts models in which all tasks share a common expert pool and task identity controls the sparse expert selection. The method replaces a subset of encoder and decoder blocks with these mixtures, employs a task-conditioned router, and adds a task-separated routing objective that pushes different tasks toward distinct selection patterns.

What carries the argument

Task-level mixture-of-experts in which task identity selects sparse expert combinations via a task-conditioned router and a task-separated routing objective within an encoder-decoder architecture.
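A minimal sketch of task-conditioned sparse routing, assuming the simplest task-level form: each task owns a learned logit vector over the shared expert pool, and the router applies a softmax, keeps the top-k experts, and renormalizes their weights. Renormalizing the top-k softmax probabilities is equivalent to the top-k-then-softmax gating of [25]. `TASK_LOGITS`, `task_route`, `moe_forward`, the task keys, the four-expert pool, the logit values and k = 2 are all hypothetical. The paper's router may also condition on token representations.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]

# Hypothetical learned router logits: one score per expert in a shared pool of four.
TASK_LOGITS: Dict[str, Vector] = {
    "polarity":  [2.0, 0.5, 1.5, -1.0],
    "detection": [-0.5, 2.2, 0.1, 1.0],
    "rationale": [0.3, -1.0, 0.2, 2.5],
}

def softmax(xs: Vector) -> Vector:
    m = max(xs)  # shift by the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def task_route(task: str, k: int = 2) -> List[Tuple[int, float]]:
    """Pick the top-k experts for a task and renormalize their gate weights to sum to 1."""
    probs = softmax(TASK_LOGITS[task])
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return [(i, probs[i] / mass) for i in top]

def moe_forward(x: Vector, task: str,
                experts: List[Callable[[Vector], Vector]], k: int = 2) -> Vector:
    """Run only the routed experts and return the gate-weighted sum of their outputs."""
    out = [0.0] * len(x)
    for idx, gate in task_route(task, k):
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out

# Toy experts: expert i scales its input by (i + 1).
experts = [lambda v, s=float(i + 1): [s * vi for vi in v] for i in range(4)]
print(task_route("polarity"))  # selects experts 0 and 2, the two largest logits
print(moe_forward([1.0, 1.0], "rationale", experts))
```

In this sketch the gate depends only on task identity, so every token of a task shares one expert mixture. A token-level variant would feed the hidden state together with a task embedding into the gate.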

If this is right

The model outperforms recently proposed approaches on implicit sentiment analysis.
Gains are strongest on the implicit sentiment subset where sentiment must be inferred from context.
Auxiliary tasks of detection and rationale generation supply additional signals for polarity reasoning.
Task-conditioned sparse routing limits negative transfer among related but distinct objectives.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same routing pattern could reduce interference in other multi-task NLP setups that combine detection, generation, and classification objectives.
Cognitive-rationale generation may prove useful for any inference task where models must articulate unstated information before predicting a label.
Conditioning expert selection on task identity offers a general knob for controlling capacity allocation when objectives compete.

Load-bearing premise

The two auxiliary tasks supply complementary guidance that improves reasoning about sentiment from context, and task-conditioned routing plus a task-separated routing objective will reduce interference among the objectives.
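The review does not give the exact form of the task-separated routing objective. The sketch below assumes one plausible reading: it penalizes overlap between the tasks' average routing distributions, measured as mean pairwise cosine similarity, and adds that penalty to the summed task losses with weight λsep. The default of 0.4 mirrors the best value reported in Figure 2. The function names and equal task weighting are hypothetical. A real implementation would compute this on differentiable tensors rather than Python floats.

```python
import math
from itertools import combinations
from typing import Dict, List

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; routing distributions sum to 1, so they are never all-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

def separation_loss(task_probs: Dict[str, List[float]]) -> float:
    """Mean pairwise cosine similarity of the tasks' average routing distributions.

    0 means every pair of tasks routes to disjoint experts; 1 means identical routing.
    """
    pairs = list(combinations(sorted(task_probs), 2))
    if not pairs:
        raise ValueError("need at least two tasks")
    return sum(cosine(task_probs[a], task_probs[b]) for a, b in pairs) / len(pairs)

def total_loss(task_losses: Dict[str, float], task_probs: Dict[str, List[float]],
               lambda_sep: float = 0.4) -> float:
    """Equally weighted sum of per-task losses plus the lambda_sep-weighted separation penalty."""
    return sum(task_losses.values()) + lambda_sep * separation_loss(task_probs)

identical = {"polarity": [0.5, 0.5, 0.0], "detection": [0.5, 0.5, 0.0]}
disjoint = {"polarity": [1.0, 0.0, 0.0], "detection": [0.0, 0.0, 1.0]}
print(separation_loss(identical))  # 1.0: both tasks lean on the same experts
print(separation_loss(disjoint))   # 0.0: no expert is shared
```

Under this reading, λsep = 0 recovers plain multi-task training on top of the MoE layers. A very large λsep would force tasks apart even where sharing experts helps, which is consistent with the intermediate optimum described for Figure 2.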

What would settle it

An ablation that removes the task-separated routing objective or drops one or both auxiliary tasks and measures whether accuracy on the implicit sentiment subset stops improving or declines.
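The ablation described above can be organized as a small grid of runs. The sketch below only enumerates configurations; it does not train anything. The `Variant` fields, run names, and the 0.4 default for `lambda_sep` (the best value in Figure 2) are hypothetical choices for illustration. For the dense control to be a fair comparison, its parameter count would still need to match the MoE model.

```python
from dataclasses import dataclass, replace
from typing import List

@dataclass(frozen=True)
class Variant:
    name: str
    detection_task: bool = True   # auxiliary task: implicit sentiment detection
    rationale_task: bool = True   # auxiliary task: cognitive rationale generation
    task_moe: bool = True         # task-routed MoE blocks in the encoder/decoder
    lambda_sep: float = 0.4       # weight of the task-separated routing loss (0 disables it)

FULL = Variant("full")

def ablation_grid() -> List[Variant]:
    """Each run switches off one component of the full model."""
    return [
        FULL,
        replace(FULL, name="no_sep_loss", lambda_sep=0.0),
        replace(FULL, name="no_detection", detection_task=False),
        replace(FULL, name="no_rationale", rationale_task=False),
        replace(FULL, name="polarity_only", detection_task=False, rationale_task=False),
        # Dense MTL with the same auxiliary tasks isolates what the routing adds.
        replace(FULL, name="dense_mtl", task_moe=False, lambda_sep=0.0),
    ]

for v in ablation_grid():
    print(v)
```

Comparing `full` with `no_sep_loss` isolates the separation objective. Comparing `no_sep_loss` with `dense_mtl` isolates the MoE layers themselves, with the auxiliary tasks held fixed in all three runs.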

Figures

Figures reproduced from arXiv: 2605.20916 by Haoran Xie, Joe S. Qin, Yaping Chai.

**Figure 1.** Overview of our framework. A. Multi-task Data Construction: each aspect-level instance is formulated into three text-to-text tasks. B. Rationale …

**Figure 2.** Effect of the hyperparameter λsep on two benchmarks, where performance is measured by F1 score. Both datasets achieve their best performance at λsep = 0.4, and all λ values remain above the THOR baseline. We mark the highest value with a ⋆.

**Figure 3.** Routing entropy H(·) for the three tasks on Rest14 and Lap14 under different λ. Each point is the mean over routed MoE layers.

[…] outperforms THOR [5] in all λ settings, demonstrating the effectiveness of our method. Both benchmarks achieve their best performance at an intermediate value of λsep = 0.4, indicating that a balanced routing separation weight improves model performance. If λsep is too small, tasks…

**Figure 4.** An example of the routing-probability heatmap at the MoE decoder’s …
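Figure 3 reports a routing entropy H(·) per task, averaged over the routed MoE layers. The sketch below computes that statistic under stated assumptions. It uses Shannon entropy in nats of each layer's routing distribution for one task. Whether the paper measures entropy before or after top-k selection, and in which log base, is not stated here. The function names and toy distributions are hypothetical. Lower entropy means a task concentrates its routing on fewer experts.

```python
import math
from typing import List

def entropy(p: List[float]) -> float:
    """Shannon entropy (nats) of one routing distribution; zero-probability experts contribute nothing."""
    return sum(-pi * math.log(pi) for pi in p if pi > 0)

def mean_routing_entropy(layer_probs: List[List[float]]) -> float:
    """Average one task's routing entropy over all routed MoE layers."""
    if not layer_probs:
        raise ValueError("need at least one routed layer")
    return sum(entropy(p) for p in layer_probs) / len(layer_probs)

uniform = [0.25, 0.25, 0.25, 0.25]  # router spreads mass evenly over 4 experts
peaked = [1.0, 0.0, 0.0, 0.0]       # router commits to a single expert
print(entropy(uniform))   # log(4) ≈ 1.386 nats, the maximum for 4 experts
print(entropy(peaked))    # zero: fully specialized routing
print(mean_routing_entropy([uniform, peaked]))  # average over two "layers"
```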

Original abstract

Implicit sentiment analysis is challenging because sentiment toward an aspect is often inferred from events rather than expressed through explicit opinion words. Existing models typically learn from the final polarity label, which provides limited guidance for reasoning about sentiment from the context. Motivated by cognitive appraisal theory, we propose an appraisal-aware multi-task learning (MTL) framework for implicit sentiment analysis that provides polarity prediction with two complementary auxiliary tasks: implicit sentiment detection and cognitive rationale generation. However, training several objectives with different targets and sharing a single backbone across tasks in MTL limits flexibility and can lead to task interference. To reduce interference among these related but distinct objectives, we adopt task-level mixture-of-experts models in which all tasks share a common set of experts, and task identity controls the sparse combination of these experts. Our method builds on an encoder-decoder architecture and replaces a subset of encoder and decoder blocks with these sparse mixtures. We use a task-conditioned router to select sparse expert mixtures for each task, and a task-separated routing objective to encourage different tasks to learn distinct expert-selection patterns. Experimental results show that our model outperforms recently proposed approaches, with strong gains on the implicit sentiment subset. Our code is available at https://github.com/yaping166/TRMoE-ISA.
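Figure 1 describes formulating each aspect-level instance into three text-to-text tasks. The sketch below shows one way to do that expansion. The field names, prompt wording, the yes/no target for detection, and the example sentence and rationale are hypothetical, and how the paper obtains its rationales is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AspectInstance:
    sentence: str
    aspect: str
    polarity: str    # gold polarity label, e.g. "positive", "negative", "neutral"
    implicit: bool   # gold label for implicit sentiment detection
    rationale: str   # appraisal-style rationale (its source is outside this sketch)

def to_text2text(ex: AspectInstance) -> List[Tuple[str, str, str]]:
    """Expand one instance into (task_id, source_text, target_text) triples.

    The task id travels with each pair so a task-conditioned router
    can be told which objective the example belongs to.
    """
    ctx = f'Sentence: "{ex.sentence}" Aspect: "{ex.aspect}".'
    return [
        ("polarity", f"{ctx} What is the sentiment toward the aspect?", ex.polarity),
        ("detection", f"{ctx} Is the sentiment toward the aspect implicit?",
         "yes" if ex.implicit else "no"),
        ("rationale", f"{ctx} Explain how the aspect is appraised.", ex.rationale),
    ]

# Hypothetical example: no opinion word, so the sentiment must be inferred from the event.
example = AspectInstance(
    sentence="The waiter forgot our order twice.",
    aspect="waiter",
    polarity="negative",
    implicit=True,
    rationale="Being forgotten twice blocks the diner's goal of being served, "
              "so the waiter is appraised negatively.",
)
for task, src, tgt in to_text2text(example):
    print(f"[{task}] {src} -> {tgt}")
```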

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper pairs cognitive-appraisal auxiliary tasks with task-routed MoE and a separated routing loss to cut interference in implicit sentiment analysis, but the ablations do not yet isolate whether the routing itself produces the reported lift.


The core move here is to take an encoder-decoder backbone, add two auxiliary heads motivated by cognitive appraisal (implicit sentiment detection and rationale generation), and then replace selected blocks with task-conditioned sparse MoE layers plus a routing objective that pushes different tasks toward distinct expert selections. That combination is not in the prior implicit-sentiment or standard MoE literature they cite, so the architectural recipe counts as new. The motivation is clear: shared-backbone MTL often produces interference when the objectives pull in slightly different directions, and letting task identity control which experts fire is a direct way to give each objective its own subspace without duplicating the whole model.

They report stronger results on the implicit subset than recent baselines, which aligns with the goal of better context reasoning when explicit opinion words are absent. The code release is also a plus for anyone who wants to check the implementation details.

The main gap is the one the stress test flags. Nothing in the abstract or the described experiments shows an ablation that keeps the auxiliary tasks and capacity but removes or replaces the task-separated routing objective. Without that control, it remains possible that the gains come mainly from the extra supervision signals or from the added parameters rather than from the routing separation. If the full paper has only overall comparisons and no routing-pattern analysis or capacity-matched baselines, the claim that the MoE machinery is what reduces interference stays under-supported. Minor issues include the usual need for error analysis on the implicit cases and clearer reporting of how the router is trained.

This is the kind of incremental architecture paper that would interest people working on multi-task sentiment or sparse models in NLP. A reader already running MTL baselines on implicit datasets could pick up the routing trick and test it quickly. It is grounded enough and addresses a real practical pain point, so it should go to referees rather than a desk reject; the reviewers will likely ask for the missing ablation, but the underlying idea is worth checking.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces a task-routed mixture-of-experts (MoE) model with cognitive appraisal for implicit sentiment analysis. It builds an appraisal-aware multi-task learning framework that adds two auxiliary tasks, implicit sentiment detection and cognitive rationale generation, to the main polarity prediction task. To address potential task interference in a shared encoder-decoder backbone, the authors replace selected blocks with sparse task-level mixture-of-experts layers controlled by a task-conditioned router and trained with a task-separated routing objective that fosters distinct expert-selection patterns per task. The paper reports that this model outperforms recently proposed approaches, with particularly strong gains on the implicit sentiment subset.

Significance. If the performance gains are robust and specifically attributable to the task-routed MoE components rather than the auxiliary tasks or other factors, the work could provide a useful technique for managing multiple related objectives in NLP models without interference. The grounding in cognitive appraisal theory adds an interesting interdisciplinary angle to task design for implicit sentiment tasks. Releasing the code is a positive step for reproducibility.

major comments (1)

[Experimental section] The central claim of strong gains on the implicit-sentiment subset is attributed to the appraisal-aware MTL combined with task-routed MoE that reduces interference via the task-conditioned router and task-separated routing objective. However, the manuscript does not include an ablation that isolates the effect of the task-separated routing objective against a standard multi-task learning baseline with the same auxiliary tasks. Without this, it remains unclear whether the routing machinery is the key driver of improvement or if the gains could be achieved through the auxiliary tasks alone, which is load-bearing for the paper's specific contribution.

minor comments (1)

[Abstract] The abstract claims 'strong gains' without providing any numerical results or specific baseline names, limiting the reader's ability to immediately gauge the magnitude of the reported improvements.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comments, which help clarify the contribution of our task-routed MoE design. We address the major comment below and commit to strengthening the experimental section accordingly.


Referee: [Experimental section] The central claim of strong gains on the implicit-sentiment subset is attributed to the appraisal-aware MTL combined with task-routed MoE that reduces interference via the task-conditioned router and task-separated routing objective. However, the manuscript does not include an ablation that isolates the effect of the task-separated routing objective against a standard multi-task learning baseline with the same auxiliary tasks. Without this, it remains unclear whether the routing machinery is the key driver of improvement or if the gains could be achieved through the auxiliary tasks alone, which is load-bearing for the paper's specific contribution.

Authors: We agree that isolating the contribution of the task-separated routing objective is important for substantiating our central claim. Our current ablations compare the full model against variants without the auxiliary tasks and without the MoE layers, but we do not directly contrast the task-separated routing objective against a plain MTL baseline that shares the same auxiliary tasks and backbone. We will add this ablation in the revised version, reporting performance on both the full test set and the implicit-sentiment subset, along with router utilization statistics to show distinct expert selection patterns. This will clarify whether the routing mechanism provides gains beyond the auxiliary tasks alone.

Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical claims rest on external benchmarks, not self-referential derivations


The paper proposes an appraisal-aware MTL framework augmented with task-routed MoE, using a task-conditioned router and task-separated routing objective to mitigate interference. All performance claims are grounded in experimental comparisons against external baselines on implicit sentiment datasets, with no mathematical derivation, fitted parameter renamed as prediction, or self-citation chain that reduces the central result to its own inputs by construction. The auxiliary tasks and routing design are presented as architectural choices whose value is assessed via ablation-style experiments and outperformance metrics, making the work self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that cognitive appraisal theory supplies useful auxiliary signals for implicit sentiment; no free parameters or invented entities are described in the abstract.

axioms (1)

Domain assumption: Cognitive appraisal theory supplies two complementary auxiliary tasks (implicit sentiment detection and cognitive rationale generation) that improve reasoning about sentiment from context.
Invoked in the motivation and framework description to justify the multi-task setup.

pith-pipeline@v0.9.0 · 5753 in / 1223 out tokens · 19976 ms · 2026-05-21T04:38:10.894539+00:00 · methodology


Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · 3 internal anchors

[1] X. Chen, H. Xie, S. J. Qin, Y. Chai, X. Tao, and F. L. Wang, “Cognitive-inspired deep learning models for aspect-based sentiment analysis: A retrospective overview and bibliometric analysis,” Cogn. Comput., vol. 16, no. 6, pp. 3518–3556, 2024.

[2] Z. Li, Y. Zou, C. Zhang, Q. Zhang, and Z. Wei, “Learning implicit sentiment in aspect-based sentiment analysis with supervised contrastive pre-training,” in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Online and Punta Cana, Dominican Republic: Association for Computational Linguistics, 2021, pp. 246–256.

[3] K. Wang, W. Shen, Y. Yang, X. Quan, and R. Wang, “Relational graph attention network for aspect-based sentiment analysis,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 3229–3238.

[4] J. Ouyang, Z. Yang, S. Liang, B. Wang, Y. Wang, and X. Li, “Aspect-based sentiment analysis with explicit sentiment augmentations,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 17, 2024, pp. 18842–18850.

[5] H. Fei, B. Li, Q. Liu, L. Bing, F. Li, and T.-S. Chua, “Reasoning implicit sentiment with chain-of-thought prompting,” in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023, pp. 1171–1182.

[6] Z. Duan and J. Wang, “Implicit sentiment analysis based on chain of thought prompting,” arXiv:2211.10986, 2024.

[7] G. Yeo, S. Furniturewala, and K. Jaidka, “Beyond text: Leveraging multi-task learning and cognitive appraisal theory for post-purchase intention analysis,” in Findings of the Association for Computational Linguistics: ACL 2024, 2024, pp. 12353–12360.

[8] R. S. Lazarus, Emotion and Adaptation. New York: Oxford University Press, 1991.

[9] K. R. Scherer, “Appraisal considered as a process of multilevel sequential checking,” in Appraisal Processes in Emotion: Theory, Methods, Research, K. R. Scherer, A. Schorr, and T. Johnstone, Eds. New York: Oxford University Press, 2001, pp. 92–120.

[10] T. Standley, A. R. Zamir, D. Chen, L. Guibas, J. Malik, and S. Savarese, “Which tasks should be learned together in multi-task learning?” in Proceedings of the 37th International Conference on Machine Learning (ICML), ser. Proceedings of Machine Learning Research, vol. 119. PMLR, 2020, pp. 9120–9132.

[11] C. Ding, Z. Lu, S. Wang, R. Cheng, and V. N. Boddeti, “Mitigating task interference in multi-task learning via explicit task routing with non-learnable primitives,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 7756–7765.

[12] S. Mu and S. Lin, “A comprehensive survey of mixture-of-experts: Algorithms, theory, and applications,” arXiv:2503.07137, 2025.

[13] W. Cai, J. Jiang, F. Wang, J. Tang, S. Kim, and J. Huang, “A survey on mixture of experts in large language models,” IEEE Transactions on Knowledge and Data Engineering, 2025.

[14] Z. Chen, Y. Shen, M. Ding, Z. Chen, H. Zhao, E. G. Learned-Miller, and C. Gan, “Mod-Squad: Designing mixtures of experts as modular multi-task learners,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 11828–11837.

[15] Q. Ye, J. Zha, and X. Ren, “Eliciting and understanding cross-task skills with task-level mixture-of-experts,” in Findings of the Association for Computational Linguistics: EMNLP 2022. Abu Dhabi, United Arab Emirates: Association for Computational Linguistics, 2022, pp. 2567–2592.

[16] C. He, F. Gao, H. Liu, S. Zhu, Y. Jia, H. Zan, and M. Peng, “Task-aware contrastive mixture of experts for quadruple extraction in conversations with code-like replies and non-opinion detection,” in Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (NAA...

[17] H. W. Chung, L. Hou, S. Longpre, B. Zoph, Y. Tay, W. Fedus, Y. Li, X. Wang, M. Dehghani, S. Brahma et al., “Scaling instruction-finetuned language models,” J. Mach. Learn. Res., vol. 25, no. 70, pp. 1–53, 2024.

[18] B. Wang, L. Ding, Q. Zhong, X. Li, and D. Tao, “A contrastive cross-channel data augmentation framework for aspect-based sentiment analysis,” in Proceedings of the 29th International Conference on Computational Linguistics, 2022, pp. 6691–6704.

[19] Y. Chai, H. Xie, and S. J. Qin, “Text data augmentation for large language models: a comprehensive survey of methods, challenges, and opportunities,” Artif. Intell. Rev., vol. 59, no. 1, p. 35, 2026.

[20] S. Wang, J. Zhou, C. Sun, J. Ye, T. Gui, Q. Zhang, and X.-J. Huang, “Causal intervention improves implicit sentiment analysis,” in Proceedings of the 29th International Conference on Computational Linguistics, 2022, pp. 6966–6977.

[21] R. Caruana, “Multitask learning,” Machine Learning, vol. 28, no. 1, pp. 41–75, 1997.

[22] A. Rietzler, S. Stabinger, P. Opitz, and S. Engl, “Adapt or get left behind: Domain adaptation through BERT language model finetuning for aspect-target sentiment classification,” in Proceedings of the Twelfth Language Resources and Evaluation Conference, 2020, pp. 4933–4941.

[23] W. Lai, H. Xie, G. Xu, and Q. Li, “Multi-task learning with LLMs for implicit sentiment analysis: Data-level and task-level automatic weight learning,” IEEE Transactions on Knowledge and Data Engineering, 2025.

[24] W. Fedus, B. Zoph, and N. Shazeer, “Switch Transformers: Scaling to trillion parameter models with simple and efficient sparsity,” Journal of Machine Learning Research, vol. 23, no. 120, pp. 1–39, 2022.

[25] N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. V. Le, G. E. Hinton, and J. Dean, “Outrageously large neural networks: The sparsely-gated mixture-of-experts layer,” in Proceedings of the 5th International Conference on Learning Representations (ICLR), 2017.

[26] Y. Li, S. Jiang, B. Hu, L. Wang, W. Zhong, W. Luo, L. Ma, and M. Zhang, “Uni-MoE: Scaling unified multimodal LLMs with mixture of experts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.

[27] M. Pontiki, D. Galanis, J. Pavlopoulos, H. Papageorgiou, I. Androutsopoulos, and S. Manandhar, “SemEval-2014 Task 4: Aspect based sentiment analysis,” in Proceedings of the 8th International Workshop on Semantic Evaluation, SemEval@COLING 2014, Dublin, Ireland, August 23-24, 2014. The Association for Computer Linguistics, 2014, pp. 27–35.

[28] J. Devlin, M. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics: Human Lang. Technol. Assoc. Comput. Linguistics, 2019, pp. 4171–4186.

[29] A. Singh, A. Fry, A. Perelman, A. Tart, A. Ganesh, A. El-Kishky, A. McLaughlin, A. Low, A. Ostrow, A. Ananthram et al., “OpenAI GPT-5 system card,” arXiv:2601.03267, 2025.

[30] A. Liu, A. Mei, B. Lin, B. Xue, B. Wang, B. Xu, B. Wu, B. Zhang, C. Lin, C. Dong et al., “DeepSeek-V3.2: Pushing the frontier of open large language models,” arXiv:2512.02556, 2025.

[31] A. Grattafiori, A. Dubey, A. Jauhri, A. Pandey, A. Kadian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Vaughan et al., “The Llama 3 herd of models,” arXiv:2407.21783, 2024.

[32] K. Scaria, H. Gupta, S. Goyal, S. Sawant, S. Mishra, and C. Baral, “InstructABSA: Instruction learning for aspect based sentiment analysis,” in Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), 2024, pp. 720–736.

[1] [1]

Cognitive- inspired deep learning models for aspect-based sentiment analysis: A retrospective overview and bibliometric analysis,

X. Chen, H. Xie, S. J. Qin, Y . Chai, X. Tao, and F. L. Wang, “Cognitive- inspired deep learning models for aspect-based sentiment analysis: A retrospective overview and bibliometric analysis,”Cogn. Comput., vol. 16, no. 6, pp. 3518–3556, 2024

work page 2024

[2] [2]

Learning implicit sentiment in aspect-based sentiment analysis with supervised contrastive pre-training,

Z. Li, Y . Zou, C. Zhang, Q. Zhang, and Z. Wei, “Learning implicit sentiment in aspect-based sentiment analysis with supervised contrastive pre-training,” inProceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Online and Punta Cana, Dominican Republic: Association for Computational Linguistics, 2021, pp. 246–256

work page 2021

[3] [3]

Relational graph attention network for aspect-based sentiment analysis,

K. Wang, W. Shen, Y . Yang, X. Quan, and R. Wang, “Relational graph attention network for aspect-based sentiment analysis,” inProceedings of the 58th annual meeting of the association for computational linguistics, 2020, pp. 3229–3238

work page 2020

[4] [4]

Aspect- based sentiment analysis with explicit sentiment augmentations,

J. Ouyang, Z. Yang, S. Liang, B. Wang, Y . Wang, and X. Li, “Aspect- based sentiment analysis with explicit sentiment augmentations,” in Proceedings of the AAAI conference on artificial intelligence, vol. 38, no. 17, 2024, pp. 18 842–18 850

work page 2024

[5] [5]

Reasoning implicit sentiment with chain-of-thought prompting,

H. Fei, B. Li, Q. Liu, L. Bing, F. Li, and T.-S. Chua, “Reasoning implicit sentiment with chain-of-thought prompting,” inProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023, pp. 1171–1182

work page 2023

[6] [6]

Implicit sentiment analysis based on chain of thought prompting,

Z. Duan and J. Wang, “Implicit sentiment analysis based on chain of thought prompting,” arXiv:2211.10986, 2024

work page arXiv 2024

[7] [7]

Beyond text: Leveraging multi- task learning and cognitive appraisal theory for post-purchase intention analysis,

G. Yeo, S. Furniturewala, and K. Jaidka, “Beyond text: Leveraging multi- task learning and cognitive appraisal theory for post-purchase intention analysis,” inFindings of the Association for Computational Linguistics: ACL 2024, 2024, pp. 12 353–12 360

work page 2024

[8] [8]

R. S. Lazarus,Emotion and Adaptation. New York: Oxford University Press, 1991

work page 1991

[9] [9]

Appraisal considered as a process of multilevel sequen- tial checking,

K. R. Scherer, “Appraisal considered as a process of multilevel sequen- tial checking,” inAppraisal Processes in Emotion: Theory, Methods, Research, K. R. Scherer, A. Schorr, and T. Johnstone, Eds. New York: Oxford University Press, 2001, pp. 92–120

work page 2001

[10] [10]

Which tasks should be learned together in multi-task learning?

T. Standley, A. R. Zamir, D. Chen, L. Guibas, J. Malik, and S. Savarese, “Which tasks should be learned together in multi-task learning?” in Proceedings of the 37th International Conference on Machine Learning (ICML), ser. Proceedings of Machine Learning Research, vol. 119. PMLR, 2020, pp. 9120–9132

work page 2020

[11] [11]

Mitigating task interference in multi-task learning via explicit task routing with non-learnable primitives,

C. Ding, Z. Lu, S. Wang, R. Cheng, and V . N. Boddeti, “Mitigating task interference in multi-task learning via explicit task routing with non-learnable primitives,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 7756–7765

work page 2023

[12] [12]

arXiv preprint arXiv:2503.07137 , year=

S. Mu and S. Lin, “A comprehensive survey of mixture-of-experts: Algorithms, theory, and applications,” arXiv:2503.07137, 2025

work page arXiv 2025

[13] [13]

A survey on mixture of experts in large language models,

W. Cai, J. Jiang, F. Wang, J. Tang, S. Kim, and J. Huang, “A survey on mixture of experts in large language models,”IEEE Transactions on Knowledge and Data Engineering, 2025

work page 2025

[14] [14]

Mod-Squad: Designing mixtures of experts as modular multi- task learners,

Z. Chen, Y . Shen, M. Ding, Z. Chen, H. Zhao, E. G. Learned-Miller, and C. Gan, “Mod-Squad: Designing mixtures of experts as modular multi- task learners,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 11 828–11 837

work page 2023

[15] [15]

Eliciting and understanding cross-task skills with task-level mixture-of-experts,

Q. Ye, J. Zha, and X. Ren, “Eliciting and understanding cross-task skills with task-level mixture-of-experts,” inFindings of the Association for Computational Linguistics: EMNLP 2022. Abu Dhabi, United Arab Emirates: Association for Computational Linguistics, 2022, pp. 2567– 2592

work page 2022

[16] [16]

Task-aware contrastive mixture of experts for quadruple extraction in conversations with code-like replies and non-opinion detection,

C. He, F. Gao, H. Liu, S. Zhu, Y . Jia, H. Zan, and M. Peng, “Task-aware contrastive mixture of experts for quadruple extraction in conversations with code-like replies and non-opinion detection,” inProceedings of the 2025 Conference of the Nations of the Americas Chapter of the Asso- ciation for Computational Linguistics: Human Language Technologies (NAA...

work page 2025

[17] [17]

Scaling instruction-finetuned language models,

H. W. Chung, L. Hou, S. Longpre, B. Zoph, Y . Tay, W. Fedus, Y . Li, X. Wang, M. Dehghani, S. Brahmaet al., “Scaling instruction-finetuned language models,”J. Mach. Learn. Res., vol. 25, no. 70, pp. 1–53, 2024

work page 2024

[18] [18]

A contrastive cross-channel data augmentation framework for aspect-based sentiment analysis,

B. Wang, L. Ding, Q. Zhong, X. Li, and D. Tao, “A contrastive cross-channel data augmentation framework for aspect-based sentiment analysis,” inProceedings of the 29th international conference on com- putational linguistics, 2022, pp. 6691–6704

work page 2022

[19] Y. Chai, H. Xie, and S. J. Qin, “Text data augmentation for large language models: a comprehensive survey of methods, challenges, and opportunities,” Artif. Intell. Rev., vol. 59, no. 1, p. 35, 2026.

[20] S. Wang, J. Zhou, C. Sun, J. Ye, T. Gui, Q. Zhang, and X.-J. Huang, “Causal intervention improves implicit sentiment analysis,” in Proceedings of the 29th International Conference on Computational Linguistics, 2022, pp. 6966–6977.

[21] R. Caruana, “Multitask learning,” Machine Learning, vol. 28, no. 1, pp. 41–75, 1997.

[22] A. Rietzler, S. Stabinger, P. Opitz, and S. Engl, “Adapt or get left behind: Domain adaptation through BERT language model finetuning for aspect-target sentiment classification,” in Proceedings of the Twelfth Language Resources and Evaluation Conference, 2020, pp. 4933–4941.

[23] W. Lai, H. Xie, G. Xu, and Q. Li, “Multi-task learning with LLMs for implicit sentiment analysis: Data-level and task-level automatic weight learning,” IEEE Transactions on Knowledge and Data Engineering, 2025.

[24] W. Fedus, B. Zoph, and N. Shazeer, “Switch Transformers: Scaling to trillion parameter models with simple and efficient sparsity,” Journal of Machine Learning Research, vol. 23, no. 120, pp. 1–39, 2022.

[25] N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. V. Le, G. E. Hinton, and J. Dean, “Outrageously large neural networks: The sparsely-gated mixture-of-experts layer,” in Proceedings of the 5th International Conference on Learning Representations (ICLR), 2017.

[26] Y. Li, S. Jiang, B. Hu, L. Wang, W. Zhong, W. Luo, L. Ma, and M. Zhang, “Uni-MoE: Scaling unified multimodal LLMs with mixture of experts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.

[27] M. Pontiki, D. Galanis, J. Pavlopoulos, H. Papageorgiou, I. Androutsopoulos, and S. Manandhar, “SemEval-2014 Task 4: Aspect based sentiment analysis,” in Proceedings of the 8th International Workshop on Semantic Evaluation, SemEval@COLING 2014, Dublin, Ireland, August 23–24, 2014. The Association for Computer Linguistics, 2014, pp. 27–35.

[28] J. Devlin, M. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics: Human Lang. Technol., Assoc. Comput. Linguistics, 2019, pp. 4171–4186.

[29] A. Singh, A. Fry, A. Perelman, A. Tart, A. Ganesh, A. El-Kishky, A. McLaughlin, A. Low, A. Ostrow, A. Ananthram et al., “OpenAI GPT-5 system card,” arXiv:2601.03267, 2025.

[30] A. Liu, A. Mei, B. Lin, B. Xue, B. Wang, B. Xu, B. Wu, B. Zhang, C. Lin, C. Dong et al., “DeepSeek-V3.2: Pushing the frontier of open large language models,” arXiv:2512.02556, 2025.

[31] A. Grattafiori, A. Dubey, A. Jauhri, A. Pandey, A. Kadian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Vaughan et al., “The Llama 3 herd of models,” arXiv:2407.21783, 2024.

[32] K. Scaria, H. Gupta, S. Goyal, S. Sawant, S. Mishra, and C. Baral, “InstructABSA: Instruction learning for aspect based sentiment analysis,” in Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), 2024, pp. 720–736.

Yaping Chai is a Ph.D. cand...
