Not All Flips Are Conformity: Decomposing Stance Convergence in Multi-Agent LLM Debate

Chuan Xiao; Jianbin Qin; Ruiqi Xu; Shuyuan Zheng; Xiqi Hao; Yu-Xuan Qiu; Zengqing Wu

arxiv: 2606.00820 · v1 · pith:HH7WKKISnew · submitted 2026-05-30 · 💻 cs.CL

Xiqi Hao, Zengqing Wu, Yu-Xuan Qiu, Chuan Xiao, Ruiqi Xu, Shuyuan Zheng, Jianbin Qin

Pith reviewed 2026-06-28 18:44 UTC · model grok-4.3

classification 💻 cs.CL

keywords: multi-agent debate · LLM reasoning · stance convergence · conformity · persuasion · answer flip rate · decomposition framework · counterfactual conditions

The pith

The conventional answer flip rate in multi-agent LLM debates conflates spontaneous instability, stance-induced conformity, and reasoning-induced persuasion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Multi-agent debate lets LLMs discuss a question to reach better answers, yet observed convergence could stem from genuine persuasion or mere social copying. Standard flip-rate metrics treat all answer changes as equivalent and therefore cannot separate agents that flip on their own from those that copy peers or adopt presented reasoning. The paper introduces controlled counterfactual conditions, such as self-reflection without peers, to isolate the three sources. In the primary MMLU-Pro setting, 37 percent of agent-question observations flip under self-reflection alone, strict conformity is measured at 29 percent and is predominantly harmful, and even vacuous reasoning is associated with error adoption of 20-39 percent among resistant agents. Because harmful and beneficial influence cannot be told apart without correctness labels or self-reflection controls, simply lowering peer adoption does not improve accuracy.

Core claim

The conventional answer flip rate conflates three distinct mechanisms: spontaneous instability (changes under self-reflection alone), stance-induced conformity (changes triggered by peer stances without reasoning content), and reasoning-induced persuasion (influence from the content of reasoning). A three-source decomposition framework isolates each mechanism through controlled counterfactual conditions, revealing in the primary MMLU-Pro setting that 37 percent of agent-question observations change under self-reflection alone, that strict conformity is 29 percent and remains predominantly harmful across replications, and that even vacuous reasoning is associated with 20-39 percent error adoption among resistant agents.
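To make the contrast between conditions concrete, the following is a minimal sketch of computing a flip rate separately per condition. It is not the paper's estimator: the `Observation` record, the condition labels (`self_reflection`, `stance_only`, `full_reasoning`), and the toy data are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One agent-question observation (hypothetical record layout)."""
    initial: str    # answer given in Round 0
    final: str      # answer given after the condition was applied
    condition: str  # hypothetical label, e.g. "self_reflection"


def flip_rate(observations, condition):
    """Fraction of observations in `condition` whose answer changed."""
    rows = [o for o in observations if o.condition == condition]
    if not rows:
        raise ValueError(f"no observations for condition {condition!r}")
    return sum(o.initial != o.final for o in rows) / len(rows)


def flip_rates_by_condition(observations):
    """Flip rate for every condition present in the data."""
    conditions = sorted({o.condition for o in observations})
    return {c: flip_rate(observations, c) for c in conditions}


# Toy data only; these values are not taken from the paper.
toy = [
    Observation("A", "A", "self_reflection"),
    Observation("A", "B", "self_reflection"),
    Observation("C", "C", "self_reflection"),
    Observation("D", "D", "self_reflection"),
    Observation("A", "B", "stance_only"),
    Observation("A", "A", "stance_only"),
    Observation("C", "D", "stance_only"),
    Observation("B", "B", "stance_only"),
]
rates = flip_rates_by_condition(toy)
```

Reporting one rate per condition, rather than a single pooled flip rate, is the point of the decomposition: a pooled number would hide how much change already occurs without any peer input.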

What carries the argument

Three-source decomposition framework that isolates spontaneous instability, stance-induced conformity, and reasoning-induced persuasion via controlled counterfactual conditions such as self-reflection alone.

If this is right

Without correctness labels or self-reflection controls, reducing peer adoption does not improve accuracy because harmful and beneficial influence cannot be distinguished.
Harmful conformity can be predicted from Round 0 features with AUC of 0.79.
Risk-targeted intervention reduces harmful conformity by 13.6 percentage points.
Even reasoning-like presentation without content carries substantial persuasive weight, leading resistant agents to adopt errors.
Robustness tests show substantial model-dependent instability across GPQA-Diamond and three model families.
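For readers who want to check numbers of this kind on their own data, here is a small sketch of the two summary statistics mentioned above: an AUC computed by pairwise comparison of scores, and a reduction expressed in percentage points. The function names and the toy inputs are assumptions made for illustration; the paper's predictor, features, and evaluation code are not shown in this source.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def percentage_point_reduction(rate_before, rate_after):
    """Absolute difference between two rates, in percentage points."""
    return (rate_before - rate_after) * 100.0


# Toy inputs only; not the paper's data.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
drop = percentage_point_reduction(0.5, 0.25)
```

A percentage-point reduction is an absolute difference between two rates, which is distinct from a relative (percent) reduction; the two should not be read interchangeably.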

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Debate protocols could routinely run a self-reflection baseline first and discard or reweight agents that prove unstable before peer interaction begins.
The same decomposition could be applied to multi-agent negotiation or planning tasks to separate social compliance from informational transfer.
If instability proves highly model-dependent, selecting base models with lower self-flip rates might shrink the apparent benefit of debate more than any change to the debate rules themselves.

Load-bearing premise

The controlled counterfactual conditions isolate spontaneous instability, stance-induced conformity, and reasoning-induced persuasion as distinct mechanisms without significant confounding from the multi-agent setup itself.

What would settle it

If the flip rate under self-reflection alone equals the full multi-agent debate rate in a new run, or if the three measured components fail to sum consistently across conditions, the claim that the counterfactuals cleanly separate the mechanisms would be falsified.
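One way to operationalize the first half of this test is a two-proportion z-test comparing the self-reflection flip rate with the full-debate flip rate. The sketch below is an assumption about how such a check could be run, not a procedure described in the paper; the function name and the counts are illustrative only.

```python
import math


def two_proportion_z(flips_a, n_a, flips_b, n_b):
    """Two-sided z-test for equality of two flip rates.

    Returns (z, p_value) using the pooled-variance normal approximation.
    """
    p_a = flips_a / n_a
    p_b = flips_b / n_b
    pooled = (flips_a + flips_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        raise ValueError("standard error is zero; rates are degenerate")
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Illustrative counts only; not taken from the paper.
z_same, p_same = two_proportion_z(37, 100, 37, 100)
z_diff, p_diff = two_proportion_z(37, 100, 60, 100)
```

A large p-value here would only mean the two rates cannot be distinguished at the given sample size; it would not by itself show that the mechanisms are identical.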

Figures

Figures reproduced from arXiv: 2606.00820 by Chuan Xiao, Jianbin Qin, Ruiqi Xu, Shuyuan Zheng, Xiqi Hao, Yu-Xuan Qiu, Zengqing Wu.

**Figure 2.** Error adoption rates across the information …

**Figure 4.** Accuracy before and after intervention by risk …

Original abstract

Multi-agent debate (MAD) is a promising strategy for improving LLM reasoning, but when agents converge on a shared answer, it is unclear whether that convergence reflects genuine deliberation or social compliance. We show that the conventional answer flip rate conflates three distinct mechanisms: spontaneous instability, stance-induced conformity, and reasoning-induced persuasion. Our three-source decomposition framework isolates each through controlled counterfactual conditions. In the primary MMLU-Pro setting, 37% of agent-question observations change under self-reflection alone, while robustness tests show substantial model-dependent instability across GPQA-Diamond and three model families; strict conformity is 29% in the primary setting and remains predominantly harmful across model replications (57-77% correct-to-wrong). A controlled information-gradient experiment reveals that even vacuous reasoning is associated with 20-39% error adoption among resistant agents, with reasoning-like presentation carrying substantial persuasive weight. Harmful conformity can be predicted from Round 0 features (AUC = 0.79), and risk-targeted intervention reduces it by 13.6 percentage points (p < 0.001). However, without correctness labels or self-reflection controls, reducing peer adoption does not improve accuracy, because harmful and beneficial influence cannot be distinguished.
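The "correct-to-wrong" share quoted in the abstract depends on classifying each flip by its direction relative to the gold answer. A minimal sketch of that bookkeeping follows; the category names and the `(initial, final, gold)` tuple layout are assumptions introduced here, and the paper may define the denominator differently.

```python
from collections import Counter


def classify_flip(initial, final, gold):
    """Label one observation by the direction of its answer change."""
    if initial == final:
        return "no_flip"
    if initial == gold:
        return "correct_to_wrong"
    if final == gold:
        return "wrong_to_correct"
    return "wrong_to_wrong"


def harmful_share(records):
    """Share of flips that move from the correct answer to a wrong one.

    `records` is an iterable of (initial, final, gold) tuples.
    """
    counts = Counter(classify_flip(i, f, g) for i, f, g in records)
    flips = sum(n for label, n in counts.items() if label != "no_flip")
    if flips == 0:
        raise ValueError("no flips in the data")
    return counts["correct_to_wrong"] / flips


# Toy records only; not the paper's data.
toy_records = [
    ("A", "B", "A"),  # correct -> wrong
    ("A", "B", "B"),  # wrong -> correct
    ("A", "B", "C"),  # wrong -> wrong
    ("A", "A", "A"),  # no flip
    ("C", "D", "C"),  # correct -> wrong
]
share = harmful_share(toy_records)
```

Note that this bookkeeping needs gold labels, which is consistent with the abstract's caveat that harmful and beneficial influence cannot be distinguished without them.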

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The three-mechanism decomposition of flips in multi-agent debate is a practical step forward, but the counterfactual conditions may not separate the sources as cleanly as claimed.

The paper splits conventional flip rates in multi-agent LLM debate into spontaneous instability (37% of observations flip under self-reflection alone), stance-induced conformity (29%, mostly harmful), and reasoning-induced persuasion (20-39% error adoption even from vacuous text). It quantifies these on MMLU-Pro, tests robustness on GPQA-Diamond and across three model families, adds a predictor for harmful conformity (AUC 0.79), and shows an intervention that cuts harmful conformity by 13.6 percentage points. The finding that cutting peer adoption does not raise accuracy without correctness labels or self-reflection controls is a clear, honest caveat.

The experiments give concrete percentages and test robustness across settings, which is more than most MAD papers offer. The information-gradient condition is a reasonable way to probe persuasion separately.

The soft spot is the isolation claim. The counterfactual arms assume that self-reflection, strict conformity, and information conditions differ only in the intended mechanism, but shared prompt phrasing, agent identities, or repeated question exposure could shift all arms together. Without the exact condition texts and exclusion rules, it is hard to rule out that the measured gaps partly reflect experimental framing rather than the three named processes. Model-dependent instability also flags that results may not travel far.

This is for people who run or study multi-agent debate setups and want diagnostics beyond raw flip rates. It deserves peer review because the question is real, the controls are explicit, and the numbers are falsifiable even if the mechanism separation needs tighter validation.

Referee Report

3 major / 2 minor

Summary. The paper claims that conventional answer flip rates in multi-agent LLM debate conflate three mechanisms—spontaneous instability, stance-induced conformity, and reasoning-induced persuasion—and introduces a three-source decomposition framework using controlled counterfactual conditions. In the primary MMLU-Pro setting it reports 37% flips under self-reflection alone, 29% strict conformity (predominantly harmful at 57-77% correct-to-wrong), and 20-39% error adoption from an information-gradient experiment even with vacuous reasoning; it further shows that harmful conformity is predictable from Round-0 features (AUC 0.79) and that a risk-targeted intervention reduces it by 13.6 pp, while noting that reducing peer adoption without correctness labels does not improve accuracy.

Significance. If the decomposition is valid, the work supplies a more precise empirical vocabulary for convergence in MAD, distinguishes harmful from beneficial influence, and supplies actionable prediction and intervention results. The multi-benchmark, multi-model replication and the explicit demonstration that accuracy gains require correctness labels are concrete strengths that would inform both analysis and system design.

major comments (3)

[Abstract / counterfactual conditions] The central decomposition claim rests on the counterfactual conditions cleanly isolating the three mechanisms, yet the shared multi-agent prompt scaffolding, agent identities, and debate format are held constant across arms; any systematic effect of repeated question phrasing or model response bias would therefore be attributed to the named mechanisms rather than the experimental framing itself. (Abstract, primary MMLU-Pro setting and information-gradient experiment)
[Abstract / information-gradient experiment] The information-gradient result attributes 20-39% error adoption to 'reasoning-like presentation' even under vacuous reasoning, but the manuscript does not report how vacuous content is constructed or verified to contain no substantive reasoning; without that operationalization the persuasive-weight claim cannot be separated from residual content effects. (Abstract)
[Abstract / intervention results] The risk-targeted intervention is reported to reduce harmful conformity by 13.6 pp (p < 0.001), yet the paper does not specify whether the Round-0 predictor is applied prospectively or whether the reduction is measured in a held-out condition that avoids leakage from the same agent-question pairs used to train the predictor. (Abstract)

minor comments (2)

[Abstract] The abstract states 'robustness tests show substantial model-dependent instability across GPQA-Diamond and three model families' but supplies no model names, exact flip percentages, or table reference; adding these numbers would allow readers to assess the claimed variability.
[Abstract] The phrase 'strict conformity is 29% in the primary setting' would benefit from an explicit definition of the 'strict conformity' counterfactual condition (e.g., whether agents see only the peer answer or also the peer's reasoning trace).

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our decomposition framework. We respond to each major comment below and note planned revisions to the manuscript.

Referee: [Abstract / counterfactual conditions] The central decomposition claim rests on the counterfactual conditions cleanly isolating the three mechanisms, yet the shared multi-agent prompt scaffolding, agent identities, and debate format are held constant across arms; any systematic effect of repeated question phrasing or model response bias would therefore be attributed to the named mechanisms rather than the experimental framing itself. (Abstract, primary MMLU-Pro setting and information-gradient experiment)

Authors: We agree that holding the overall prompt scaffolding, agent identities, and debate format constant across conditions means that any residual effects from repeated question phrasing or model response biases are not fully separated from the targeted mechanisms. Our design isolates the mechanisms by varying only the presence/absence of peer stances and reasoning content while keeping other elements fixed, which is the standard approach for such decompositions; however, we acknowledge this does not eliminate all possible framing confounds. We will add an explicit discussion of this limitation, along with any related robustness checks, to the Methods and Limitations sections.
Revision: partial
Referee: [Abstract / information-gradient experiment] The information-gradient result attributes 20-39% error adoption to 'reasoning-like presentation' even under vacuous reasoning, but the manuscript does not report how vacuous content is constructed or verified to contain no substantive reasoning; without that operationalization the persuasive-weight claim cannot be separated from residual content effects. (Abstract)

Authors: The full manuscript (Section 3.2) describes vacuous reasoning as templated generic phrases (e.g., 'After careful consideration of the options...') that preserve surface format but contain no logical steps, evidence, or domain content; verification combined manual inspection by the authors with automated scans for reasoning-related keywords and sentence structures. We will revise the abstract to include a brief statement of this operationalization and expand the methods description for clarity.
Revision: yes
Referee: [Abstract / intervention results] The risk-targeted intervention is reported to reduce harmful conformity by 13.6 pp (p < 0.001), yet the paper does not specify whether the Round-0 predictor is applied prospectively or whether the reduction is measured in a held-out condition that avoids leakage from the same agent-question pairs used to train the predictor. (Abstract)

Authors: The predictor was trained on a random subset of agent-question pairs and evaluated prospectively on a disjoint held-out set of pairs to avoid leakage; the reported reduction and p-value reflect performance on this held-out condition. We will add this detail to the abstract and the corresponding experimental section in the revision.
Revision: yes
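The leakage concern in the last exchange comes down to whether the training and evaluation sets share any agent-question pair. A minimal sketch of a disjoint split at the pair level is given below; the function name, the `(agent_id, question_id)` tuple layout, the 20% test fraction, and the seed are all assumptions for illustration and are not details reported in the rebuttal.

```python
import random


def split_pairs(pairs, test_fraction=0.2, seed=0):
    """Split unique (agent_id, question_id) pairs into disjoint sets.

    Duplicated pairs are collapsed first, so no pair can land on both
    sides of the split.
    """
    unique = sorted(set(pairs))
    rng = random.Random(seed)  # local RNG keeps the split reproducible
    rng.shuffle(unique)
    n_test = max(1, round(len(unique) * test_fraction))
    test = set(unique[:n_test])
    train = set(unique[n_test:])
    return train, test


# Hypothetical ids: 3 agents x 10 questions.
all_pairs = [(agent, question) for agent in range(3) for question in range(10)]
train_pairs, test_pairs = split_pairs(all_pairs)
```

Splitting by pair still lets the same question appear on both sides under different agents; a stricter variant would split by question id, which may matter if question difficulty drives the predictor.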

Circularity Check

0 steps flagged

No circularity: empirical decomposition via controlled experiments

The paper presents an empirical framework that measures flip rates under distinct experimental conditions (self-reflection alone, strict conformity setup, information-gradient). These are direct observational contrasts, not derivations, equations, or fitted parameters renamed as predictions. No self-citation chains, ansatzes, or uniqueness theorems are invoked as load-bearing premises. The central claim rests on the experimental design itself, which is externally falsifiable through replication on different models and datasets. This is self-contained empirical work with no reduction of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the validity of the experimental controls and the assumption that the measured percentages reflect the isolated mechanisms without significant confounding from the multi-agent setup.

axioms (1)

Domain assumption: Controlled counterfactual conditions (e.g., self-reflection alone) isolate spontaneous instability, stance-induced conformity, and reasoning-induced persuasion as distinct mechanisms.
The three-source decomposition framework depends on this separation to attribute observed changes correctly.

pith-pipeline@v0.9.1-grok · 5769 in / 1350 out tokens · 29723 ms · 2026-06-28T18:44:49.419286+00:00 · methodology


work page doi:10.18653/v1/2026.vardial-1.0 2026
[60]

and Abdelmoneim, Shahd and Kantharuban, Anjali and Alsboul, Otba and Lamsiyah, Salima and Marchisio, Kelly and Murray, Kenton

Robinson, Nathaniel R. and Abdelmoneim, Shahd and Kantharuban, Anjali and Alsboul, Otba and Lamsiyah, Salima and Marchisio, Kelly and Murray, Kenton. AMIYA Shared Task: A rabic Modeling In Your Accent at V ar D ial 2026. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.1

work page doi:10.18653/v1/2026.vardial-1.1 2026
[61]

Far Out: Evaluating Language Models on Slang in A ustralian and I ndian E nglish

Dilsiz, Deniz Kaya and Srirag, Dipankar and Joshi, Aditya. Far Out: Evaluating Language Models on Slang in A ustralian and I ndian E nglish. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.2

work page doi:10.18653/v1/2026.vardial-1.2 2026
[62]

Effects of Speaker Bias in Dialect Identification and Automatic Transcription with Self-Supervised Speech Models

Kuparinen, Olli. Effects of Speaker Bias in Dialect Identification and Automatic Transcription with Self-Supervised Speech Models. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.3

work page doi:10.18653/v1/2026.vardial-1.3 2026
[63]

O c W iki D ialects: A W ikipedia Dataset With Rich Metadata for O ccitan Dialect Identification

N \'e dey, Oriane and Bawden, Rachel and Cl \'e rice, Thibault and Sagot, Beno \^i t. O c W iki D ialects: A W ikipedia Dataset With Rich Metadata for O ccitan Dialect Identification. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.4

work page doi:10.18653/v1/2026.vardial-1.4 2026
[64]

and Garcia, Marcos

Irastortza-Urbieta, Xabier and Garc \'i a-Miguel, Jos \'e M. and Garcia, Marcos. Language Mixture to Develop Accurate G alician Dependency Parsers: An Exploration of Its Effects. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.5

work page doi:10.18653/v1/2026.vardial-1.5 2026
[65]

Crowdsourcing P iedmontese to Test LLM s on Non-Standard Orthography

Vico, Gianluca and Libovick \'y , Jind r ich. Crowdsourcing P iedmontese to Test LLM s on Non-Standard Orthography. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.6

work page doi:10.18653/v1/2026.vardial-1.6 2026
[66]

G erman- E nglish Code-Switching in Large Language Models

Aks. G erman- E nglish Code-Switching in Large Language Models. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.7

work page doi:10.18653/v1/2026.vardial-1.7 2026
[67]

Perplexity as a Metric for Dialectal Distance: A Computational Study of G reek Varieties

Chatzikyriakidis, Stergios and Psaltaki, Erofili and Papadakis, Dimitrios and Henriksson, Erik and Laippala, Veronika. Perplexity as a Metric for Dialectal Distance: A Computational Study of G reek Varieties. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.8

work page doi:10.18653/v1/2026.vardial-1.8 2026
[68]

A Subword Embedding Approach for Variation Detection in L uxembourgish User Comments

Lutgen, Anne-Marie and Plum, Alistair and Purschke, Christoph. A Subword Embedding Approach for Variation Detection in L uxembourgish User Comments. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.9

work page doi:10.18653/v1/2026.vardial-1.9 2026
[69]

Onomasiological Sense Alignment Across Dialect Dictionaries

Mederake, Nathalie and Urbach, Nico and Fischer, Hanna and Lameli, Alfred. Onomasiological Sense Alignment Across Dialect Dictionaries. A Taxonomy-Constrained LLM Classification. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.10

work page doi:10.18653/v1/2026.vardial-1.10 2026
[70]

and Uban, Ana Sabina and Marchitan, Teodor-George and Iordache, Ioan-Bogdan and Georgescu, Simona

Dinu, Liviu P. and Uban, Ana Sabina and Marchitan, Teodor-George and Iordache, Ioan-Bogdan and Georgescu, Simona. On the Intelligibility of R omance Language Varieties: S panish and P ortuguese in E urope and A merica. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.11

work page doi:10.18653/v1/2026.vardial-1.11 2026
[71]

Dialect Matters: Cross-Lingual ASR Transfer for Low-Resource I ndic Language Varieties

Dhasmana, Akriti and Srivastava, Aarohi and Chiang, David. Dialect Matters: Cross-Lingual ASR Transfer for Low-Resource I ndic Language Varieties. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.12

work page doi:10.18653/v1/2026.vardial-1.12 2026
[72]

Ara- HOPE : Human-Centric Post-Editing Evaluation for Dialectal A rabic to M odern S tandard A rabic Translation

Alabdullah, Abdullah and Han, Lifeng and Lin, Chenghua. Ara- HOPE : Human-Centric Post-Editing Evaluation for Dialectal A rabic to M odern S tandard A rabic Translation. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.13

work page doi:10.18653/v1/2026.vardial-1.13 2026
[73]

I ndic- T uned L ens: Interpreting Multilingual Models in I ndian Languages

Panchal, Mihir and Varshney, Deeksha and ., Mamta and Ekbal, Asif. I ndic- T uned L ens: Interpreting Multilingual Models in I ndian Languages. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.14

work page doi:10.18653/v1/2026.vardial-1.14 2026
[74]

Building ASR Resources for the Hutsul Dialect of U krainian

Kyslyi, Roman and Orlovskyi, Artem and Khomenko, Pavlo and Onyshchenko, Bohdan and Guzii, Zakhar. Building ASR Resources for the Hutsul Dialect of U krainian. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.15

work page doi:10.18653/v1/2026.vardial-1.15 2026
[75]

From F us H a to Folk: Exploring Cross-Lingual Transfer in A rabic Language Models

Khalak, Abdulmuizz and Issam, Abderrahmane and Spanakis, Gerasimos. From F us H a to Folk: Exploring Cross-Lingual Transfer in A rabic Language Models. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.16

work page doi:10.18653/v1/2026.vardial-1.16 2026
[76]

Extending ASR Evaluation Resources for M odern G reek Dialects

Tsoukala, Chara and Bompolas, Stavros and Margariti, Antigoni and Panagiotou, Konstantina and Plaiti, Maria Elisavet and Tzanakaki, Nefeli and Karatsareas, Petros and Ralli, Angela and Anastasopoulos, Antonios and Markantonatou, Stella. Extending ASR Evaluation Resources for M odern G reek Dialects. Proceedings of the 13th Workshop on NLP for Similar Lang...

work page doi:10.18653/v1/2026.vardial-1.17 2026
[77]

How Should We Model the Probability of a Language?

Dent, Rasul and Ortiz Suarez, Pedro and Cl \'e rice, Thibault and Sagot, Beno \^i t. How Should We Model the Probability of a Language?. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.18

work page doi:10.18653/v1/2026.vardial-1.18 2026
[78]

Bridging Dialectal Variation: A Phonetic Transcription Tool for T amil

Mahaganapathy, Ahrane and Karunakaran, Sumirtha and Navakulan, Kavitha and Sarveswaran, Kengatharaiyer. Bridging Dialectal Variation: A Phonetic Transcription Tool for T amil. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.19

work page doi:10.18653/v1/2026.vardial-1.19 2026
[79]

Regional Variation in the Performance of ASR Models on C roatian and S erbian

Samard z i \'c , Tanja and Rupnik, Peter and Ljube s i \'c , Nikola. Regional Variation in the Performance of ASR Models on C roatian and S erbian. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.20

work page doi:10.18653/v1/2026.vardial-1.20 2026
[80]

Syllable Structures Across A rabic Varieties

Qaddoumi, Abdelrahim and Kodner, Jordan and Khalifa, Salam and Broselow, Ellen and Rambow, Owen. Syllable Structures Across A rabic Varieties. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.21

work page doi:10.18653/v1/2026.vardial-1.21 2026

Showing first 80 references.


[35] [35]

The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis ( WASSA 2026). 2026. doi:10.18653/v1/2026.wassa-1.0

work page doi:10.18653/v1/2026.wassa-1.0 2026

[36] [36]

Council of LLM s: Evaluating Capability of Large Language Models to Annotate Propaganda

Sharma, Vivek and Jain, Shweta and Shokri, Mohammad and Levitan, Sarah Ita and Filatova, Elena. Council of LLM s: Evaluating Capability of Large Language Models to Annotate Propaganda. The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis ( WASSA 2026). 2026. doi:10.18653/v1/2026.wassa-1.1

work page doi:10.18653/v1/2026.wassa-1.1 2026

[37] [37]

Emoji Reactions on Telegram: Unreliable Indicators of Emotional Resonance

Tardelli, Serena and Alvisi, Lorenzo and Cima, Lorenzo and Cresci, Stefano and Tesconi, Maurizio. Emoji Reactions on Telegram: Unreliable Indicators of Emotional Resonance. The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis ( WASSA 2026). 2026. doi:10.18653/v1/2026.wassa-1.2

work page doi:10.18653/v1/2026.wassa-1.2 2026

[38] [38]

Quantifying Social Sentiment in Hostels Using A Domain-Specific Transformer Pipeline

McMurry, Ian W. Quantifying Social Sentiment in Hostels Using A Domain-Specific Transformer Pipeline. The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis ( WASSA 2026). 2026. doi:10.18653/v1/2026.wassa-1.3

work page doi:10.18653/v1/2026.wassa-1.3 2026

[39] [39]

Predicting Convincingness in Political Speech: How Emotional Tone Shapes Persuasive Strength

Verma, Bhuvanesh and Marreddy, Mounika and Mehler, Alexander. Predicting Convincingness in Political Speech: How Emotional Tone Shapes Persuasive Strength. The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis ( WASSA 2026). 2026. doi:10.18653/v1/2026.wassa-1.4

work page doi:10.18653/v1/2026.wassa-1.4 2026

[40] [40]

Measuring LLM s' Sensitivity to Paraphrased Opinion Prompts

Alhetelah, Bushra and Ahmad, Irfan. Measuring LLM s' Sensitivity to Paraphrased Opinion Prompts. The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis ( WASSA 2026). 2026. doi:10.18653/v1/2026.wassa-1.5

work page doi:10.18653/v1/2026.wassa-1.5 2026

[41] [41]

The Impact of Highlighting Subjective Language on Perceived News Trustworthiness

Shokri, Mohammad and Sharma, Vivek and Klapper, Emily and Jain, Shweta and Filatova, Elena and Levitan, Sarah Ita. The Impact of Highlighting Subjective Language on Perceived News Trustworthiness. The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis ( WASSA 2026). 2026. doi:10.18653/v1/2026.wassa-1.6

work page doi:10.18653/v1/2026.wassa-1.6 2026

[42] [42]

Appraisal Trajectories in Narratives Reveal Distinct Patterns of Emotion Evocation

Sch. Appraisal Trajectories in Narratives Reveal Distinct Patterns of Emotion Evocation. The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis ( WASSA 2026). 2026. doi:10.18653/v1/2026.wassa-1.7

work page doi:10.18653/v1/2026.wassa-1.7 2026

[43] [43]

Exploring Subjective Tasks in F arsi: A Survey Analysis and Evaluation of Language Models

Rooein, Donya and Plaza-del-Arco, Flor Miriam and Nozza, Debora and Hovy, Dirk. Exploring Subjective Tasks in F arsi: A Survey Analysis and Evaluation of Language Models. The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis ( WASSA 2026). 2026. doi:10.18653/v1/2026.wassa-1.8

work page doi:10.18653/v1/2026.wassa-1.8 2026

[44] [44]

and Loukachevitch, Natalia V

Iaroshenko, Polina V. and Loukachevitch, Natalia V. Emotional Lexicons: How Large Language Models Predict Emotional Ratings of R ussian Words. The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis ( WASSA 2026). 2026. doi:10.18653/v1/2026.wassa-1.9

work page doi:10.18653/v1/2026.wassa-1.9 2026

[45] [45]

Emotion-aware text simplification of user generated content using LLM s

Bezobrazova, Anastasiia and Sokova, Daria and Orasan, Constantin. Emotion-aware text simplification of user generated content using LLM s. The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis ( WASSA 2026). 2026. doi:10.18653/v1/2026.wassa-1.10

work page doi:10.18653/v1/2026.wassa-1.10 2026

[46] [46]

Crowd-Based Evaluation of Emotion Intensity Preservation in S panish -- B asque Tweet Machine Translation

Aranberri, Nora. Crowd-Based Evaluation of Emotion Intensity Preservation in S panish -- B asque Tweet Machine Translation. The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis ( WASSA 2026). 2026. doi:10.18653/v1/2026.wassa-1.11

work page doi:10.18653/v1/2026.wassa-1.11 2026

[47] [47]

and Markov, Ilia and Vossen, Piek

Schouten, Stefan F. and Markov, Ilia and Vossen, Piek. A Position Paper on Toxic Reasoning: Grounding Categories of Toxic Language in Implications and Attitudes. The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis ( WASSA 2026). 2026. doi:10.18653/v1/2026.wassa-1.12

work page doi:10.18653/v1/2026.wassa-1.12 2026

[48] [48]

Is Sentiment Banana-Shaped? Exploring the Geometry and Portability of Sentiment Concept Vectors

Lyngbaek, Laurits and Feldkamp, Pascale and Bizzoni, Yuri and Nielbo, Kristoffer and Enevoldsen, Kenneth. Is Sentiment Banana-Shaped? Exploring the Geometry and Portability of Sentiment Concept Vectors. The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis ( WASSA 2026). 2026. doi:10.18653/v1/20...

work page doi:10.18653/v1/2026.wassa-1.13 2026

[49] [49]

Disentangling Emotion Understanding and Generation in Large Language Models

Jafari, Sadegh and Lefever, Els and Hoste, Veronique. Disentangling Emotion Understanding and Generation in Large Language Models. The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis ( WASSA 2026). 2026. doi:10.18653/v1/2026.wassa-1.14

work page doi:10.18653/v1/2026.wassa-1.14 2026

[50] [50]

News Credibility Assessment by LLM s and Humans: Implications for Political Bias

Neves, Pia Wenzel and Jakob, Charlott and Schmitt, Vera. News Credibility Assessment by LLM s and Humans: Implications for Political Bias. The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis ( WASSA 2026). 2026. doi:10.18653/v1/2026.wassa-1.15

work page doi:10.18653/v1/2026.wassa-1.15 2026

[51] [51]

Towards Simulating Social Media Users with LLM s: Evaluating the Operational Validity of Conditioned Comment Prediction

Schwager, Nils and M. Towards Simulating Social Media Users with LLM s: Evaluating the Operational Validity of Conditioned Comment Prediction. The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis ( WASSA 2026). 2026. doi:10.18653/v1/2026.wassa-1.16

work page doi:10.18653/v1/2026.wassa-1.16 2026

[52] [52]

Label-Consistent Data Generation for Aspect-Based Sentiment Analysis Using LLM Agents

Monfared, Mohammad Hossein Akbari and Flek, Lucie and Karimi, Akbar. Label-Consistent Data Generation for Aspect-Based Sentiment Analysis Using LLM Agents. The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis ( WASSA 2026). 2026. doi:10.18653/v1/2026.wassa-1.17

work page doi:10.18653/v1/2026.wassa-1.17 2026

[53] [53]

Antisocial Behavior Prediction: A Survey and Practical Guide

Ollagnier, Ana. Antisocial Behavior Prediction: A Survey and Practical Guide. The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis ( WASSA 2026). 2026. doi:10.18653/v1/2026.wassa-1.18

work page doi:10.18653/v1/2026.wassa-1.18 2026

[54] [54]

Real-Time Mitigation of Negative Emotion in Customer Care Calls

Gangopadhyay, Surupendu and Mehrabani, Mahnoosh. Real-Time Mitigation of Negative Emotion in Customer Care Calls. The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis ( WASSA 2026). 2026. doi:10.18653/v1/2026.wassa-1.19

work page doi:10.18653/v1/2026.wassa-1.19 2026

[55] [55]

Says Who? Argument Convincingness and Reader Stance Are Correlated with Perceived Author Personality

Weber, Sabine and Greschner, Lynn and Klinger, Roman. Says Who? Argument Convincingness and Reader Stance Are Correlated with Perceived Author Personality. The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis ( WASSA 2026). 2026. doi:10.18653/v1/2026.wassa-1.20

work page doi:10.18653/v1/2026.wassa-1.20 2026

[56] [56]

A Transformer and Prototype-based Interpretable Model for Contextual Sarcasm Detection

Wen, Ximing and Rezapour, Rezvaneh. A Transformer and Prototype-based Interpretable Model for Contextual Sarcasm Detection. The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis ( WASSA 2026). 2026. doi:10.18653/v1/2026.wassa-1.21

work page doi:10.18653/v1/2026.wassa-1.21 2026

[57] [57]

Multimodal Claim Extraction for Fact-Checking

Teo, Joycelyn and Cao, Rui and Deng, Zhenyun and Ding, Zifeng and Schlichtkrull, Michael Sejr and Vlachos, Andreas. Multimodal Claim Extraction for Fact-Checking. The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis ( WASSA 2026). 2026. doi:10.18653/v1/2026.wassa-1.22

work page doi:10.18653/v1/2026.wassa-1.22 2026

[58] [58]

A Multi-Aspect Evaluation Framework for Synthetic Data: Case Study on Irony and Sarcasm

Majer, Laura and Bari \'c , Ana and Sandalj, Florijan and Unkovi \'c , Ivan and Puva c a, Bojan and S najder, Jan. A Multi-Aspect Evaluation Framework for Synthetic Data: Case Study on Irony and Sarcasm. The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis ( WASSA 2026). 2026. doi:10.18653/v1/2...

work page doi:10.18653/v1/2026.wassa-1.23 2026

[59] [59]

Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.0

work page doi:10.18653/v1/2026.vardial-1.0 2026

[60] [60]

and Abdelmoneim, Shahd and Kantharuban, Anjali and Alsboul, Otba and Lamsiyah, Salima and Marchisio, Kelly and Murray, Kenton

Robinson, Nathaniel R. and Abdelmoneim, Shahd and Kantharuban, Anjali and Alsboul, Otba and Lamsiyah, Salima and Marchisio, Kelly and Murray, Kenton. AMIYA Shared Task: A rabic Modeling In Your Accent at V ar D ial 2026. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.1

work page doi:10.18653/v1/2026.vardial-1.1 2026

[61] [61]

Far Out: Evaluating Language Models on Slang in A ustralian and I ndian E nglish

Dilsiz, Deniz Kaya and Srirag, Dipankar and Joshi, Aditya. Far Out: Evaluating Language Models on Slang in A ustralian and I ndian E nglish. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.2

work page doi:10.18653/v1/2026.vardial-1.2 2026

[62] [62]

Effects of Speaker Bias in Dialect Identification and Automatic Transcription with Self-Supervised Speech Models

Kuparinen, Olli. Effects of Speaker Bias in Dialect Identification and Automatic Transcription with Self-Supervised Speech Models. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.3

work page doi:10.18653/v1/2026.vardial-1.3 2026

[63] [63]

O c W iki D ialects: A W ikipedia Dataset With Rich Metadata for O ccitan Dialect Identification

N \'e dey, Oriane and Bawden, Rachel and Cl \'e rice, Thibault and Sagot, Beno \^i t. O c W iki D ialects: A W ikipedia Dataset With Rich Metadata for O ccitan Dialect Identification. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.4

work page doi:10.18653/v1/2026.vardial-1.4 2026

[64] [64]

and Garcia, Marcos

Irastortza-Urbieta, Xabier and Garc \'i a-Miguel, Jos \'e M. and Garcia, Marcos. Language Mixture to Develop Accurate G alician Dependency Parsers: An Exploration of Its Effects. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.5

work page doi:10.18653/v1/2026.vardial-1.5 2026

[65] [65]

Crowdsourcing P iedmontese to Test LLM s on Non-Standard Orthography

Vico, Gianluca and Libovick \'y , Jind r ich. Crowdsourcing P iedmontese to Test LLM s on Non-Standard Orthography. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.6

work page doi:10.18653/v1/2026.vardial-1.6 2026

[66] Aks. German-English Code-Switching in Large Language Models. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.7

[67] Chatzikyriakidis, Stergios and Psaltaki, Erofili and Papadakis, Dimitrios and Henriksson, Erik and Laippala, Veronika. Perplexity as a Metric for Dialectal Distance: A Computational Study of Greek Varieties. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.8

[68] Lutgen, Anne-Marie and Plum, Alistair and Purschke, Christoph. A Subword Embedding Approach for Variation Detection in Luxembourgish User Comments. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.9

[69] Mederake, Nathalie and Urbach, Nico and Fischer, Hanna and Lameli, Alfred. Onomasiological Sense Alignment Across Dialect Dictionaries. A Taxonomy-Constrained LLM Classification. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.10

[70] Dinu, Liviu P. and Uban, Ana Sabina and Marchitan, Teodor-George and Iordache, Ioan-Bogdan and Georgescu, Simona. On the Intelligibility of Romance Language Varieties: Spanish and Portuguese in Europe and America. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.11

[71] Dhasmana, Akriti and Srivastava, Aarohi and Chiang, David. Dialect Matters: Cross-Lingual ASR Transfer for Low-Resource Indic Language Varieties. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.12

[72] Alabdullah, Abdullah and Han, Lifeng and Lin, Chenghua. Ara-HOPE: Human-Centric Post-Editing Evaluation for Dialectal Arabic to Modern Standard Arabic Translation. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.13

[73] Panchal, Mihir and Varshney, Deeksha and ., Mamta and Ekbal, Asif. Indic-TunedLens: Interpreting Multilingual Models in Indian Languages. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.14

[74] Kyslyi, Roman and Orlovskyi, Artem and Khomenko, Pavlo and Onyshchenko, Bohdan and Guzii, Zakhar. Building ASR Resources for the Hutsul Dialect of Ukrainian. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.15

[75] Khalak, Abdulmuizz and Issam, Abderrahmane and Spanakis, Gerasimos. From FusHa to Folk: Exploring Cross-Lingual Transfer in Arabic Language Models. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.16

[76] Tsoukala, Chara and Bompolas, Stavros and Margariti, Antigoni and Panagiotou, Konstantina and Plaiti, Maria Elisavet and Tzanakaki, Nefeli and Karatsareas, Petros and Ralli, Angela and Anastasopoulos, Antonios and Markantonatou, Stella. Extending ASR Evaluation Resources for Modern Greek Dialects. Proceedings of the 13th Workshop on NLP for Similar Lang... 2026. doi:10.18653/v1/2026.vardial-1.17

[77] Dent, Rasul and Ortiz Suarez, Pedro and Clérice, Thibault and Sagot, Benoît. How Should We Model the Probability of a Language? Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.18

[78] Mahaganapathy, Ahrane and Karunakaran, Sumirtha and Navakulan, Kavitha and Sarveswaran, Kengatharaiyer. Bridging Dialectal Variation: A Phonetic Transcription Tool for Tamil. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.19

[79] Samardžić, Tanja and Rupnik, Peter and Ljubešić, Nikola. Regional Variation in the Performance of ASR Models on Croatian and Serbian. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.20

[80] Qaddoumi, Abdelrahim and Kodner, Jordan and Khalifa, Salam and Broselow, Ellen and Rambow, Owen. Syllable Structures Across Arabic Varieties. Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects. 2026. doi:10.18653/v1/2026.vardial-1.21