pith. machine review for the scientific record.

arxiv: 2604.18003 · v1 · submitted 2026-04-20 · 💻 cs.AI

Recognition: unknown

SELF-EMO: Emotional Self-Evolution from Recognition to Consistent Expression

Faqiang Qian, Kang An, Mengya Gao, Shaowei Zhang, Yan Chen, Yichao Wu, Yong Dai, Ziliang Wang


Pith reviewed 2026-05-10 04:59 UTC · model grok-4.3

classification 💻 cs.AI
keywords emotional self-evolution · emotion recognition in conversation · self-play · self-supervised learning · reinforcement learning · dialogue systems · large language models · consistent expression

The pith

Language models can self-evolve both accurate emotion recognition and consistent responses by generating and filtering their own dialogue data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a self-evolution framework in which large language models improve emotional capabilities through repeated cycles of self-play instead of depending on static labeled datasets. The model alternates between recognizing emotions in a conversation and producing responses that match those emotions, creating fresh training trajectories at scale. A reward based on smoothed overlap between predicted and expressed emotions selects the best samples, which are then fed back to refine the model further. This setup targets the core limitation of scarce high-quality emotional data for conversational AI. If the loop holds, models gain both higher recognition accuracy and more coherent emotional tone without external annotators.
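Stripped of model details, the loop described above can be sketched in plain Python. The names `recognize`, `respond`, `express`, and `reward` are hypothetical stand-ins for the model's two roles and the filtering signal; none of them come from the paper's released code.

```python
from typing import Callable, List, Set, Tuple

def self_play_round(
    dialogues: List[str],
    recognize: Callable[[str], Set[str]],        # model in the recognizer role
    respond: Callable[[str, Set[str]], str],     # model in the responder role
    express: Callable[[str], Set[str]],          # emotions the reply actually conveys
    reward: Callable[[Set[str], Set[str]], float],
    threshold: float = 0.7,                      # assumed filter cutoff
) -> List[Tuple[str, Set[str], str]]:
    """One generate-and-filter cycle: keep only trajectories whose predicted
    and expressed emotions agree strongly under the reward."""
    kept = []
    for context in dialogues:
        predicted = recognize(context)
        reply = respond(context, predicted)
        expressed = express(reply)
        if reward(predicted, expressed) >= threshold:
            kept.append((context, predicted, reply))
    return kept  # filtered trajectories fed back as next-round training data
```

The point of the sketch is structural: both the data (`reply`) and the quality signal (`reward` over the model's own predictions) come from the same model, which is exactly the circularity the editorial analysis below flags.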

Core claim

SELF-EMO shows that a role-based self-play paradigm, in which the model simultaneously acts as emotion recognizer and dialogue responder, generates diverse conversational trajectories that a smoothed IoU-based reward can filter into high-quality self-supervised data; when optimized with the SELF-GRPO algorithm that incorporates multi-label alignment and group consistency, the process yields state-of-the-art emotion recognition accuracy on IEMOCAP, MELD, and EmoryNLP while producing more consistent emotional expressions.

What carries the argument

The data flywheel that uses a smoothed IoU-based reward to filter self-generated role-play trajectories and returns only the highest-quality samples for continued self-improvement.
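The review does not reproduce the smoothing formula, so the following is one plausible reading: a Laplace-style additive constant applied to intersection-over-union between two emotion-label sets. The function name and the `eps` value are assumptions, not the paper's definition.

```python
def smoothed_iou(predicted: set, expressed: set, eps: float = 1.0) -> float:
    """Additively smoothed IoU between the recognizer's predicted emotion set
    and the emotions expressed in the generated reply. `eps` is an assumed
    smoothing constant; it also avoids 0/0 when both sets are empty."""
    inter = len(predicted & expressed)
    union = len(predicted | expressed)
    return (inter + eps) / (union + eps)
```

With `eps = 1`, two disjoint singleton sets score 1/3 rather than a hard zero, so early noisy trajectories are down-weighted instead of discarded wholesale, which is one plausible motivation for smoothing the reward.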

If this is right

  • Large language models can generate and curate their own training data for emotional tasks at arbitrary scale.
  • Recognition accuracy on standard conversation datasets rises by several percentage points across model sizes.
  • The same self-play and filtering loop produces measurable gains in emotional consistency without added human labels.
  • The approach works for both 4B and 8B parameter models, indicating generalization beyond a single size.
  • Continuous self-improvement becomes possible because each iteration supplies new filtered trajectories for the next round.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same self-play structure could transfer to other dialogue skills where recognition and generation must stay aligned, such as intent tracking or sarcasm handling.
  • Over long conversations the method may reduce drift in emotional tone that currently appears in extended interactions with language models.
  • If the reward signal proves robust, similar autonomous loops could lower the cost of building socially aware AI systems in domains that lack large annotated corpora.
  • The framework suggests a route toward models that adapt their emotional style based on ongoing self-generated experience rather than fixed training sets.

Load-bearing premise

Better emotion prediction accuracy will directly produce more consistent and appropriate emotional expressions in the model's own generated responses.

What would settle it

A side-by-side comparison in which models trained under SELF-EMO achieve the reported accuracy gains on the three benchmarks yet human raters judge their dialogue responses as no more emotionally coherent or consistent than those of the original baseline models.

Figures

Figures reproduced from arXiv: 2604.18003 by Faqiang Qian, Kang An, Mengya Gao, Shaowei Zhang, Yan Chen, Yichao Wu, Yong Dai, Ziliang Wang.

Figure 1. Comparison of prior work and SELF-EMO, a multi-task self-evolution framework that unifies emotion recognition, emotional understanding, and emotion-consistent response generation within a closed-loop self-play paradigm. view at source ↗
Figure 2. The overall framework of SELF-EMO. view at source ↗
Figure 3. Performance improvement on two self-supervised tasks. view at source ↗
Figure 4. Personality Category. view at source ↗
Figure 5. Steady Increase in Reward. view at source ↗
Figure 6. Entropy reduction dynamics during multi-task training. view at source ↗
read the original abstract

Emotion Recognition in Conversation (ERC) has become a fundamental capability for large language models (LLMs) in human-centric interaction. Beyond accurate recognition, coherent emotional expression is also crucial, yet both are limited by the scarcity and static nature of high-quality annotated data. In this work, we propose SELF-EMO, a self-evolution framework grounded in the hypothesis that better emotion prediction leads to more consistent emotional responses. We introduce two auxiliary tasks, emotional understanding and emotional expression, and design a role-based self-play paradigm where the model acts as both an emotion recognizer and a dialogue responder. Through iterative interactions, the model generates diverse conversational trajectories, enabling scalable data generation. To ensure quality, we adopt a data flywheel mechanism that filters candidate predictions and responses using a smoothed IoU-based reward and feeds selected samples back for continuous self-improvement without external supervision. We further develop SELF-GRPO, a reinforcement learning algorithm that stabilizes optimization with multi-label alignment rewards and group-level consistency signals. Experiments on IEMOCAP, MELD, and EmoryNLP show that SELF-EMO achieves state-of-the-art performance, improving accuracy by +6.33% on Qwen3-4B and +8.54% on Qwen3-8B, demonstrating strong effectiveness and generalization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes SELF-EMO, a self-evolution framework for LLMs that jointly improves emotion recognition in conversation (ERC) and consistent emotional expression. It introduces auxiliary tasks for emotional understanding and expression, employs role-based self-play to generate conversational trajectories, applies a data flywheel that filters samples via a smoothed IoU-based reward, and optimizes with the SELF-GRPO reinforcement learning algorithm using multi-label alignment and group-level consistency signals. The central claim is that this closed-loop process, without external supervision, yields state-of-the-art ERC performance on IEMOCAP, MELD, and EmoryNLP, with accuracy gains of +6.33% on Qwen3-4B and +8.54% on Qwen3-8B.

Significance. If the self-reinforcement mechanism can be shown to produce genuine consistency gains rather than amplified prediction errors, the approach would provide a scalable, annotation-free route to training emotionally coherent LLMs, directly addressing data scarcity in human-centric dialogue systems.

major comments (3)
  1. [Experiments] Experiments section (and abstract): the reported SOTA improvements lack any description of baselines, ablation controls, run counts, error bars, or statistical significance tests, leaving the quantitative claims unverifiable and the contribution of the self-evolution components unisolated.
  2. [§3.2] §3.2 (Data Flywheel mechanism): the smoothed IoU-based reward is computed solely from the model's own emotion predictions and generated responses; no external label, human judgment, or held-out validation set is used to break potential circular reinforcement of systematic mislabelings (e.g., consistent confusion between “frustrated” and “sad”).
  3. [§4] §4 (SELF-GRPO): the multi-label alignment rewards and group-level consistency signals are introduced at a high level without explicit equations or comparison to standard GRPO variants, so it is unclear how they prevent the optimization from simply reinforcing the model's initial biases rather than improving true emotional consistency.
minor comments (1)
  1. [Abstract] Abstract: the claimed accuracy deltas are given without the underlying metric (e.g., weighted F1, accuracy per emotion class) or the precise comparison models beyond the Qwen3 family.
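The significance testing requested in major comment 1 is straightforward for paired classifier predictions on a shared test set. A stdlib-only sketch of an exact McNemar test follows; the counts `b` and `c` are illustrative inputs, not numbers from the paper.

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar p-value from discordant pair counts:
    b = utterances the baseline classifies correctly but the new model misses,
    c = utterances the new model classifies correctly but the baseline misses.
    Under H0 (no accuracy difference) each discordant case is a fair coin flip,
    so the discordant counts follow Binomial(b + c, 0.5)."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(2 * tail, 1.0)
```

For example, with b = 1 and c = 9 discordant cases the exact two-sided p-value is 22/1024 ≈ 0.021, while a perfectly balanced split (b = c) yields p = 1.0.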

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We address each major point below, indicating revisions where the manuscript will be updated to improve clarity and verifiability while preserving the core self-evolution approach.

read point-by-point responses
  1. Referee: [Experiments] Experiments section (and abstract): the reported SOTA improvements lack any description of baselines, ablation controls, run counts, error bars, or statistical significance tests, leaving the quantitative claims unverifiable and the contribution of the self-evolution components unisolated.

    Authors: We agree that the current manuscript provides insufficient experimental detail. In the revised version we will expand the Experiments section to explicitly list all baselines (including prior SOTA methods on IEMOCAP, MELD, and EmoryNLP), present ablation studies isolating the data flywheel, SELF-GRPO, and auxiliary tasks, report the number of independent runs, include error bars, and add statistical significance tests (e.g., paired t-tests or McNemar’s test) for the reported accuracy gains. These additions will make the claims verifiable and clarify the isolated contribution of each self-evolution component. revision: yes

  2. Referee: [§3.2] §3.2 (Data Flywheel mechanism): the smoothed IoU-based reward is computed solely from the model's own emotion predictions and generated responses; no external label, human judgment, or held-out validation set is used to break potential circular reinforcement of systematic mislabelings (e.g., consistent confusion between “frustrated” and “sad”).

    Authors: The absence of external labels is intentional, as the framework is designed to operate without additional supervision. The smoothed IoU reward measures consistency between the model’s own emotion predictions and the responses it generates within the same trajectory; the smoothing term and iterative filtering are meant to penalize inconsistent pairs and gradually improve both recognition and expression. We acknowledge the risk of reinforcing initial label confusions. In the revision we will add a dedicated paragraph in §3.2 explaining the smoothing mechanism in more detail, include an analysis of label-distribution stability across iterations on a held-out subset of the training dialogues, and discuss failure modes such as persistent confusion between similar emotions. No external labels will be introduced, as that would alter the annotation-free premise. revision: partial

  3. Referee: [§4] §4 (SELF-GRPO): the multi-label alignment rewards and group-level consistency signals are introduced at a high level without explicit equations or comparison to standard GRPO variants, so it is unclear how they prevent the optimization from simply reinforcing the model's initial biases rather than improving true emotional consistency.

    Authors: We will revise §4 to include the full mathematical definitions of the multi-label alignment reward (formulated as a sum of per-emotion binary cross-entropy terms aligned across the group) and the group-level consistency signal (computed as the variance of emotion-label distributions within each GRPO group). We will also add a direct comparison table and discussion contrasting SELF-GRPO with vanilla GRPO, highlighting how the group-level term discourages collapse to the model’s initial bias by enforcing intra-group agreement on both recognition and expression. These equations and comparisons will be placed in the main text rather than the appendix. revision: yes
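Taking the rebuttal's verbal definitions at face value, the two SELF-GRPO reward terms might be sketched as below. The emotion list, function names, and normalizations are assumptions; the paper's actual equations are not reproduced in this review.

```python
import math
from typing import Dict, List

# Illustrative label set; the benchmarks use dataset-specific emotion inventories.
EMOTIONS = ["joy", "sadness", "anger", "fear", "surprise", "neutral"]

def alignment_reward(probs: Dict[str, float], target: Dict[str, int]) -> float:
    """Multi-label alignment reward: negative binary cross-entropy between
    per-emotion probabilities and a (self-generated) 0/1 target vector.
    Higher is better."""
    eps = 1e-9  # numerical guard, an implementation detail we assume
    bce = -sum(
        target[e] * math.log(probs[e] + eps)
        + (1 - target[e]) * math.log(1 - probs[e] + eps)
        for e in EMOTIONS
    )
    return -bce

def group_consistency(group_probs: List[Dict[str, float]]) -> float:
    """Group-level consistency signal: negative mean per-emotion variance of
    label distributions across rollouts in one GRPO group, so groups that
    disagree on emotions are penalized."""
    n = len(group_probs)
    total_var = 0.0
    for e in EMOTIONS:
        vals = [p[e] for p in group_probs]
        mean = sum(vals) / n
        total_var += sum((v - mean) ** 2 for v in vals) / n
    return -total_var / len(EMOTIONS)
```

Under these definitions the referee's concern is visible in code: both terms are computed from model outputs alone, so intra-group agreement can be satisfied by a confidently wrong consensus as easily as by a correct one.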

Circularity Check

1 steps flagged

Self-generated trajectories filtered by model-derived IoU reward reduce self-improvement to internal outputs by construction

specific steps
  1. self-definitional [Abstract]
    "we adopt a data flywheel mechanism that filters candidate predictions and responses using a smoothed IoU-based reward and feeds selected samples back for continuous self-improvement without external supervision"

    The reward and selection criteria are computed from the model's self-generated trajectories and emotion predictions (IoU between recognizer outputs and responder outputs), so the 'high-quality' subset fed back is defined in terms of the same model's internal quantities; the claimed self-evolution therefore reduces to its own inputs by construction.

full rationale

The core derivation chain in the abstract is a closed self-play loop: the model generates trajectories, applies its own smoothed IoU reward for filtering, and retrains on the selected samples. This is self-definitional because both the data and the quality signal are defined entirely in terms of the model's predictions and responses, with the 'improvement' therefore equivalent to reinforcing its own outputs rather than an independent external signal. Final benchmark numbers do not break the circularity in the training derivation itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on one explicit domain assumption and the unverified effectiveness of the internal reward and filtering steps; no free parameters or invented entities are named in the abstract.

axioms (1)
  • domain assumption Better emotion prediction leads to more consistent emotional responses
    Explicitly stated as the grounding hypothesis of the framework.

pith-pipeline@v0.9.0 · 5549 in / 1269 out tokens · 46925 ms · 2026-05-10T04:59:14.927387+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

57 extracted references · 10 canonical work pages · 5 internal anchors

  1. [1]

    Gpt-4 technical report, 2024

    OpenAI. Gpt-4 technical report, 2024

  2. [2]

    Qwen3 Technical Report

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025

  3. [3]

    The Llama 3 Herd of Models

    Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024

  4. [4]

    DeepSeek-V3 Technical Report

    Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, et al. Deepseek-v3 technical report. arXiv preprint arXiv:2412.19437, 2024

  5. [5]

    Modeling both context-and speaker-sensitive dependence for emotion detection in multi-speaker conversations

    Dong Zhang, Liangqing Wu, Changlong Sun, Shoushan Li, Qiaoming Zhu, and Guodong Zhou. Modeling both context-and speaker-sensitive dependence for emotion detection in multi-speaker conversations. In IJCAI, pages 5415–5421. Macao, 2019

  6. [6]

    Instructerc: Reforming emotion recognition in conversation with multi-task retrieval-augmented large language models, 2024

    Shanglin Lei, Guanting Dong, Xiaoping Wang, Keheng Wang, Runqi Qiao, and Sirui Wang. Instructerc: Reforming emotion recognition in conversation with multi-task retrieval-augmented large language models, 2024

  7. [7]

    Do llms feel? teaching emotion recognition with prompts, retrieval, and curriculum learning, 2025

    Xinran Li, Yu Liu, Jiaqi Qiao, and Xiujuan Xu. Do llms feel? teaching emotion recognition with prompts, retrieval, and curriculum learning, 2025

  8. [8]

    BiosERC: Integrating Biography Speakers Supported by LLMs for ERC Tasks, page 277–292

    Jieying Xue, Minh-Phuong Nguyen, Blake Matheny, and Le-Minh Nguyen. BiosERC: Integrating Biography Speakers Supported by LLMs for ERC Tasks, page 277–292. Springer Nature Switzerland, 2024

  9. [9]

    A critical review of recurrent neural networks for sequence learning

    Zachary C Lipton, John Berkowitz, and Charles Elkan. A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019, 2015

  10. [10]

    The graph neural network model

    Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE transactions on neural networks, 20(1):61–80, 2008

  11. [11]

    Bert: a review of applications in natural language processing and understanding

    Mikhail V Koroteev. Bert: a review of applications in natural language processing and understanding. arXiv preprint arXiv:2103.11943, 2021

  12. [12]

    Dialoguernn: An attentive rnn for emotion detection in conversations, 2019

    Navonil Majumder, Soujanya Poria, Devamanyu Hazarika, Rada Mihalcea, Alexander Gelbukh, and Erik Cambria. Dialoguernn: An attentive rnn for emotion detection in conversations, 2019

  13. [13]

    Dialoguecrn: Contextual reasoning networks for emotion recognition in conversations

    Dou Hu, Lingwei Wei, and Xiaoyong Huai. Dialoguecrn: Contextual reasoning networks for emotion recognition in conversations. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 7042–7052, 2021

  14. [14]

    Dialoguegcn: A graph convolutional neural network for emotion recognition in conversation, 2019

    Deepanway Ghosal, Navonil Majumder, Soujanya Poria, Niyati Chhaya, and Alexander Gelbukh. Dialoguegcn: A graph convolutional neural network for emotion recognition in conversation, 2019

  15. [15]

    Directed acyclic graph network for conversational emotion recognition, 2021

    Weizhou Shen, Siyue Wu, Yunyi Yang, and Xiaojun Quan. Directed acyclic graph network for conversational emotion recognition, 2021

  16. [16]

    Bert-erc: Fine-tuning bert is enough for emotion recognition in conversation

    Xiangyu Qin, Zhiyu Wu, Tingting Zhang, Yanran Li, Jian Luan, Bin Wang, Li Wang, and Jinshi Cui. Bert-erc: Fine-tuning bert is enough for emotion recognition in conversation. In Proceedings of the AAAI conference on artificial intelligence, volume 37, pages 13492–13500, 2023

  17. [17]

    Lora: Low-rank adaptation of large language models

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Liang Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. Iclr, 1(2):3, 2022

  18. [18]

    Serl: Self-play reinforcement learning for large language models with limited data, 2026

    Wenkai Fang, Shunyu Liu, Yang Zhou, Kongcheng Zhang, Tongya Zheng, Kaixuan Chen, Mingli Song, and Dacheng Tao. Serl: Self-play reinforcement learning for large language models with limited data, 2026

  19. [19]

    A survey on self-play methods in reinforcement learning, 2025

    Ruize Zhang, Zelai Xu, Chengdong Ma, Chao Yu, Wei-Wei Tu, Wenhao Tang, Shiyu Huang, Deheng Ye, Wenbo Ding, Yaodong Yang, and Yu Wang. A survey on self-play methods in reinforcement learning, 2025

  20. [20]

    Spell: Self-play reinforcement learning for evolving long-context language models

    Ziyi Yang, Weizhou Shen, Chenliang Li, Ruijun Chen, Fanqi Wan, Ming Yan, Xiaojun Quan, and Fei Huang. Spell: Self-play reinforcement learning for evolving long-context language models. arXiv preprint arXiv:2509.23863, 2025

  21. [21]

    Absolute Zero: Reinforced Self-play Reasoning with Zero Data

    Andrew Zhao, Yiran Wu, Yang Yue, Tong Wu, Quentin Xu, Matthieu Lin, Shenzhi Wang, Qingyun Wu, Zilong Zheng, and Gao Huang. Absolute zero: Reinforced self-play reasoning with zero data. arXiv preprint arXiv:2505.03335, 2025

  22. [22]

    Beyond Pass@1: Self-play with Variational Problem Synthesis Sustains RLVR

    Xiao Liang, Zhongzhi Li, Yeyun Gong, Yelong Shen, Ying Nian Wu, Zhijiang Guo, and Weizhu Chen. Beyond pass@1: Self-play with variational problem synthesis sustains rlvr. arXiv preprint arXiv:2508.14029, 2025

  23. [23]

    Promptcot: Align prompt distribution via adapted chain-of-thought

    Junyi Yao, Yijiang Liu, Zhen Dong, Mingfei Guo, Helan Hu, Kurt Keutzer, Li Du, Daquan Zhou, and Shanghang Zhang. Promptcot: Align prompt distribution via adapted chain-of-thought. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7027–7037, 2024

  24. [24]

    Levels of emotional awareness: a cognitive-developmental theory and its application to psychopathology

    Richard D Lane and Gary E Schwartz. Levels of emotional awareness: a cognitive-developmental theory and its application to psychopathology. The American journal of psychiatry, 144(2):133–143, 1987

  25. [25]

    Affect regulation, mentalization and the development of the self

    Peter Fonagy, Gyorgy Gergely, and Elliot L Jurist. Affect regulation, mentalization and the development of the self. Routledge, 2018

  26. [26]

    Disorders of affect regulation: Alexithymia in medical and psychiatric illness

    Graeme J Taylor, R Michael Bagby, and James DA Parker. Disorders of affect regulation: Alexithymia in medical and psychiatric illness. Cambridge University Press, 1999

  27. [27]

    The social power of regret: the effect of social appraisal and anticipated emotions on fair and unfair allocations in resource dilemmas

    Job van der Schalk, Toon Kuppens, Martin Bruder, and Antony SR Manstead. The social power of regret: the effect of social appraisal and anticipated emotions on fair and unfair allocations in resource dilemmas. Journal of Experimental Psychology: General, 144(1):151, 2015

  28. [28]

    One-Shot Entropy Minimization

    Zitian Gao, Lynx Chen, Haoming Luo, Joey Zhou, and Bryan Dai. One-shot entropy minimization. arXiv preprint arXiv:2505.20282, 2025

  29. [29]

    DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

    Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024

  30. [30]

    Iemocap: Interactive emotional dyadic motion capture database

    Carlos Busso, Murtaza Bulut, Chi-Chun Lee, Abe Kazemzadeh, Emily Mower, Samuel Kim, Jeannette N Chang, Sungbok Lee, and Shrikanth S Narayanan. Iemocap: Interactive emotional dyadic motion capture database. Language resources and evaluation, 42(4):335–359, 2008

  31. [31]

    Meld: A multimodal multi-party dataset for emotion recognition in conversations

    Soujanya Poria, Devamanyu Hazarika, Navonil Majumder, Gautam Naik, Erik Cambria, and Rada Mihalcea. Meld: A multimodal multi-party dataset for emotion recognition in conversations. In Proceedings of the 57th annual meeting of the association for computational linguistics, pages 527–536, 2019

  32. [32]

    Emotion detection on tv show transcripts with sequence-based convolutional neural networks

    Sayyed M Zahiri and Jinho D Choi. Emotion detection on tv show transcripts with sequence-based convolutional neural networks. In AAAI Workshops, volume 18, pages 44–52, 2018

  33. [33]

    ICON: Interactive conversational memory network for multimodal emotion detection

    Devamanyu Hazarika, Soujanya Poria, Rada Mihalcea, Erik Cambria, and Roger Zimmermann. ICON: Interactive conversational memory network for multimodal emotion detection. In Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun’ichi Tsujii, editors, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2594–2604, Bru...

  34. [34]

    Cosmic: Commonsense knowledge for emotion identification in conversations, 2020

    Deepanway Ghosal, Navonil Majumder, Alexander Gelbukh, Rada Mihalcea, and Soujanya Poria. Cosmic: Commonsense knowledge for emotion identification in conversations, 2020

  35. [35]

    Mmgcn: Multimodal fusion via deep graph convolution network for emotion recognition in conversation, 2021

    Jingwen Hu, Yuchen Liu, Jinming Zhao, and Qin Jin. Mmgcn: Multimodal fusion via deep graph convolution network for emotion recognition in conversation, 2021

  36. [36]

    Lr-gcn: Latent relation-aware graph convolutional network for conversational emotion recognition

    Minjie Ren, Xiangdong Huang, Wenhui Li, Dan Song, and Weizhi Nie. Lr-gcn: Latent relation-aware graph convolutional network for conversational emotion recognition. IEEE Transactions on Multimedia, 24:4422–4432, 2022

  37. [37]

    Curriculum learning meets directed acyclic graph for multimodal emotion recognition, 2024

    Cam-Van Thi Nguyen, Cao-Bach Nguyen, Quang-Thuy Ha, and Duc-Trong Le. Curriculum learning meets directed acyclic graph for multimodal emotion recognition, 2024

  38. [38]

    Deep imbalanced learning for multimodal emotion recognition in conversations, 2023

    Tao Meng, Yuntao Shou, Wei Ai, Nan Yin, and Keqin Li. Deep imbalanced learning for multimodal emotion recognition in conversations, 2023

  39. [39]

    Der-gcn: Dialogue and event relation-aware graph convolutional neural network for multimodal dialogue emotion recognition, 2024

    Wei Ai, Yuntao Shou, Tao Meng, Nan Yin, and Keqin Li. Der-gcn: Dialogue and event relation-aware graph convolutional neural network for multimodal dialogue emotion recognition, 2024

  40. [40]

    Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation

    Xinran Li, Xiujuan Xu, and Jiaqi Qiao. Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation. IOS Press, October 2025
