PRISMA: Preference-Reinforced Self-Training Approach for Interpretable Emotionally Intelligent Negotiation Dialogues
Pith reviewed 2026-05-10 04:55 UTC · model grok-4.3
The pith
PRISMA pairs emotion-aware chain-of-thought reasoning with preference-reinforced self-training to produce interpretable negotiation dialogue systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PRISMA shows that an Emotion-aware Negotiation Strategy-informed Chain-of-Thought (ENS-CoT) reasoning mechanism, used both to curate the JobNego and ResNego datasets and to guide self-training augmented with Direct Preference Optimization (DPO), yields dialogue agents that generate more interpretable and emotionally appropriate responses while raising overall negotiation effectiveness.
What carries the argument
The ENS-CoT reasoning mechanism that decomposes emotion perception, understanding, use, and management, combined with DPO-augmented self-training on the newly created datasets.
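The four-step decomposition can be pictured as a structured rationale record attached to each turn. The sketch below is illustrative only: the field names and example values are assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass

# Illustrative ENS-CoT rationale record; fields mirror the four
# emotional-intelligence steps (perceive, understand, use, manage)
# plus the resulting strategy and surface response.
@dataclass
class ENSCoTRationale:
    perceive: str    # emotion cues detected in the partner's last turn
    understand: str  # inferred cause and negotiation-relevant meaning
    use: str         # how the emotion informs the strategic choice
    manage: str      # how the response regulates the emotional tone
    strategy: str    # resulting emotion-aware negotiation strategy
    response: str    # final utterance shown to the user

r = ENSCoTRationale(
    perceive="Candidate sounds anxious about the salary figure.",
    understand="Anxiety likely stems from uncertainty about market rates.",
    use="Reassurance can preserve trust while anchoring the offer.",
    manage="Respond warmly; avoid escalating pressure.",
    strategy="empathetic anchoring",
    response="I hear your concern; let me explain how we set this range.",
)
```

A record like this is what makes the reasoning inspectable: each strategic or emotional choice in the response can be traced back to an explicit step.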
If this is right
- Users receive explicit step-by-step reasoning for each emotional tone or strategic choice the agent makes.
- Responses align more closely with the emotional dynamics that influence trust and cooperation.
- Negotiation outcomes improve on both automatic metrics and human judgments in job-interview and resource-allocation settings.
- The same training loop can be applied to produce agents that manage long-term relationship factors through emotion.
Where Pith is reading between the lines
- The ENS-CoT structure could be adapted to other emotionally charged dialogues such as customer service or healthcare conversations.
- Iterative self-training with fresh human preference data might allow continuous improvement of the agents without starting from scratch each time.
- If the approach works in live multi-turn settings, it could support automated negotiation platforms that reduce miscommunication.
Load-bearing premise
The ENS-CoT process reliably captures human-like emotion handling in negotiations, and the DPO step produces responses that generalize beyond the curated datasets without introducing preference biases.
What would settle it
A side-by-side comparison in which independent human raters judge whether the system's generated chain-of-thought explanations match the actual emotional cues and strategies in held-out negotiation transcripts.
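Such a comparison reduces to an agreement statistic between the system's generated labels and those of independent human coders on the same turns. A minimal Cohen's kappa sketch, with hypothetical emotion labels (the function and data are illustrative, not part of the paper):

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa between two label sequences of equal length,
    e.g. system-generated emotion labels vs. an independent human coder."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    ca, cb = Counter(a), Counter(b)
    # chance agreement from each rater's marginal label distribution
    pe = sum(ca[lab] * cb[lab] for lab in set(a) | set(b)) / n**2
    return (po - pe) / (1 - pe) if pe < 1 else 1.0

system = ["anxious", "calm", "anxious", "anger"]
human  = ["anxious", "calm", "calm",    "anger"]
k = cohen_kappa(system, human)  # ≈ 0.636 for these toy labels
```

A kappa well above chance on held-out transcripts, computed against coders who never saw the ENS-CoT outputs, is the kind of evidence that would settle the load-bearing premise.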
Original abstract
Emotion plays a pivotal role in shaping negotiation outcomes, influencing trust, cooperation, and long-term relationships. Developing negotiation dialog systems that can recognize and respond strategically to emotions is, therefore, essential to create more effective human-centered interactions. Beyond generating emotionally appropriate responses, interpretability, i.e., understanding how a system generates a particular emotion-aware response, is critical for fostering reliability and building rapport. Driven by these aspects, in this work, we introduce PRISMA, an interpretable emotionally intelligent negotiation dialogue system targeting two application domains, viz. job interviews and resource allocation. To enable interpretability, we propose an Emotion-aware Negotiation Strategy-informed Chain-of-Thought (ENS-CoT) reasoning mechanism, which mimics human negotiation by perceiving, understanding, using, and managing emotions. Leveraging ENS-CoT, we curate two new datasets: JobNego (for job interview negotiation) and ResNego (for resource allocation negotiation). We then leverage these datasets to develop PRISMA by augmenting self-training with Direct Preference Optimization (DPO), guiding agents toward more accurate, interpretable, and emotionally appropriate negotiation responses. Automatic and human evaluation on the JobNego and ResNego datasets demonstrates that PRISMA substantially enhances interpretability and generates appropriate emotion-aware responses, while improving overall negotiation effectiveness.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces PRISMA, an interpretable emotionally intelligent negotiation dialogue system for job interviews and resource allocation. It proposes an ENS-CoT reasoning mechanism that mimics human emotional intelligence (perceive, understand, use, manage emotions) during negotiation, uses ENS-CoT to curate the JobNego and ResNego datasets from seed dialogues, and trains models via self-training augmented with Direct Preference Optimization (DPO). Automatic and human evaluations on the curated datasets are reported to show gains in interpretability, emotion-aware response quality, and negotiation outcomes.
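For readers unfamiliar with the DPO step of the training loop, the per-pair objective is the standard DPO loss. A minimal sketch, computed here from illustrative summed log-probabilities rather than actual model outputs (the numbers are assumptions):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard per-pair DPO loss: -log sigmoid of the beta-scaled margin
    between how much the policy (vs. a frozen reference model) prefers the
    chosen response over the rejected one. Inputs are summed log-probs."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# At initialization (policy == reference) the margin is 0 and the loss is log 2.
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Once the policy favors the chosen response more than the reference does,
# the loss drops below log 2.
improved = dpo_loss(-8.0, -12.0, -10.0, -12.0)
```

In PRISMA's loop this loss is applied to preference pairs drawn during self-training, pushing the agent toward the ENS-CoT-preferred responses.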
Significance. If the core claims hold under independent validation, the work offers a practical combination of chain-of-thought reasoning, emotional intelligence modeling, and preference optimization for negotiation agents. The explicit focus on interpretability via ENS-CoT is a positive direction for high-stakes dialogue systems. However, the significance is limited by the absence of external validation, which weakens claims of human-like emotional handling and generalization.
Major comments (2)
- [§3 and §4] §3 (ENS-CoT and Dataset Curation) and §4 (Experiments): JobNego and ResNego are constructed by applying the same ENS-CoT LLM prompting to seed dialogues. All reported automatic metrics (BLEU, emotion accuracy, negotiation success) and human preference judgments are therefore computed on data whose labels and strategies derive from the identical prompting heuristic. This setup measures internal consistency with ENS-CoT rather than independent evidence that the four EI components are captured in a human-like manner or that responses generalize beyond the training distribution. No out-of-distribution test set drawn from real human negotiation transcripts is described.
- [§5] §5 (Human Evaluation): The manuscript states that human evaluation demonstrates appropriate emotion-aware responses, yet provides no inter-annotator agreement statistics and no comparison of ENS-CoT-generated emotion labels/strategies against independent human coders on the same dialogue turns. Without such a study, the claim that ENS-CoT reliably models human emotion perception, understanding, use, and management cannot be substantiated.
Minor comments (1)
- [Abstract and §1] The abstract and introduction should explicitly name the baselines against which PRISMA is compared (e.g., standard SFT, other CoT variants, or non-emotion-aware negotiators) and report the magnitude of gains with statistical significance.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below, providing clarifications on our methodology and scope while committing to revisions that strengthen the presentation of limitations and evaluation details.
Point-by-point responses
Referee (major comment 1, §§3–4): JobNego and ResNego are built with the same ENS-CoT prompting that supplies the evaluation labels, so the reported metrics measure internal consistency with ENS-CoT rather than independent evidence of human-like EI handling or generalization; no out-of-distribution test set from real human transcripts is described.
Authors: We acknowledge that JobNego and ResNego are curated by applying ENS-CoT prompting to seed dialogues drawn from realistic job interview and resource allocation scenarios. This design enables consistent, large-scale annotations of emotion perception, understanding, use, and management strategies aligned with the proposed reasoning mechanism, which is essential for training interpretable models via self-training and DPO. The automatic metrics and human evaluations measure how effectively PRISMA learns to produce responses that follow ENS-CoT structures and achieve better negotiation outcomes and interpretability compared to baselines. We do not claim independent external validation against separate human transcripts; the held-out portions of the curated datasets and human judgments on response quality provide evidence within this framework. We will revise §§3 and 4 to explicitly state the construction process, clarify that generalization claims are scoped to the domains and distributions studied, and add a dedicated limitations paragraph discussing the absence of OOD real-human test sets. revision: partial
Referee (major comment 2, §5): The human evaluation reports no inter-annotator agreement statistics and no comparison of ENS-CoT-generated emotion labels and strategies against independent human coders on the same turns, so the claim that ENS-CoT reliably models human emotion perception, understanding, use, and management is unsubstantiated.
Authors: We agree that inter-annotator agreement (IAA) statistics would improve the rigor of the human evaluation section. The evaluation involved multiple annotators assessing response appropriateness, emotion awareness, interpretability, and negotiation effectiveness on held-out dialogues, but IAA was not computed or reported. We will add IAA results in the revised manuscript, either by re-annotating a subset with multiple raters or by reporting agreement where available. A direct side-by-side comparison of ENS-CoT-generated labels and strategies against independent human coders on identical turns was not conducted in this work; the focus was on end-to-end system performance and the utility of the generated reasoning chains for interpretability. ENS-CoT is explicitly designed to mimic the four EI components from established psychological models rather than to replicate any specific human coder. Human evaluators rated the final emotion-aware responses and reasoning quality. We will revise §5 to clarify the evaluation protocol, report IAA, and add an explicit discussion acknowledging the lack of direct label-level human validation as a limitation and future direction. revision: partial
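With more than two raters per item, the promised IAA statistic would typically be Fleiss' kappa. A minimal sketch, assuming every item is rated by the same number of annotators (the input format and data are illustrative):

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for N items each rated by the same number of raters.
    `ratings` is a list of per-item dicts mapping category -> rater count."""
    n_items = len(ratings)
    n_raters = sum(ratings[0].values())
    cats = set().union(*ratings)
    # proportion of all rating assignments that fall in each category
    p_cat = {c: sum(r.get(c, 0) for r in ratings) / (n_items * n_raters)
             for c in cats}
    # mean per-item agreement across rater pairs
    p_bar = sum(
        (sum(v * v for v in r.values()) - n_raters)
        / (n_raters * (n_raters - 1))
        for r in ratings
    ) / n_items
    p_e = sum(p * p for p in p_cat.values())  # chance agreement
    return (p_bar - p_e) / (1 - p_e)

# Three raters, two items: perfect within-item agreement gives kappa = 1.
perfect = fleiss_kappa([{"appropriate": 3}, {"inappropriate": 3}])
```

Reporting a statistic like this per evaluation dimension, alongside the rater counts, would directly address the referee's concern.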
Circularity Check
No significant circularity; empirical pipeline does not reduce claims to inputs by construction
full rationale
The paper introduces ENS-CoT to curate JobNego and ResNego datasets from seed dialogues, then applies self-training augmented with DPO to train PRISMA and reports automatic/human metrics on those same datasets. No mathematical derivation, equations, or fitted parameters exist that would allow a claimed result to equal its inputs by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling appear in the provided text. The central claims rest on relative performance improvements observed during training and evaluation, which constitute standard empirical validation rather than tautological reduction. This is a self-contained applied ML systems paper whose derivation chain terminates in experimental outcomes without circularity.