Towards Multimodal Sarcasm Detection (An _Obviously_ Perfect Paper)

Santiago Castro, Devamanyu Hazarika, Verónica Pérez-Rosas, Roger Zimmermann, Rada Mihalcea, Soujanya Poria · 2019 · cs.CL · arXiv 1906.01815

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

Sarcasm is often expressed through several verbal and non-verbal cues, e.g., a change of tone, overemphasis in a word, a drawn-out syllable, or a straight looking face. Most of the recent work in sarcasm detection has been carried out on textual data. In this paper, we argue that incorporating multimodal cues can improve the automatic classification of sarcasm. As a first step towards enabling the development of multimodal approaches for sarcasm detection, we propose a new sarcasm dataset, Multimodal Sarcasm Detection Dataset (MUStARD), compiled from popular TV shows. MUStARD consists of audiovisual utterances annotated with sarcasm labels. Each utterance is accompanied by its context of historical utterances in the dialogue, which provides additional information on the scenario where the utterance occurs. Our initial results show that the use of multimodal information can reduce the relative error rate of sarcasm detection by up to 12.9% in F-score when compared to the use of individual modalities. The full dataset is publicly available for use at https://github.com/soujanyaporia/MUStARD

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

The More, the Merrier: Contrastive Fusion for Higher-Order Multimodal Alignment

cs.CV · 2025-11-26 · unverdicted · novelty 6.0

Contrastive Fusion (ConFu) adds a fused-modality contrastive term to jointly align individual modalities and their combinations, enabling capture of higher-order dependencies like XOR relations while preserving pairwise alignments.

When Jokes Cross the Line: Analyzing Regular Humor and Dark Humor in YouTube Shorts

cs.MM · 2026-04-30 · unverdicted · novelty 5.0

TwistedHumor dataset shows dark humor in YouTube Shorts clusters around critique, coping, awkwardness and identity with more mixed and toxic audience reactions than regular humor.

Mitigating Multimodal Inconsistency via Cognitive Dual-Pathway Reasoning for Intent Recognition

cs.MM · 2026-05-10 · unverdicted · novelty 5.0

CDPR uses an intuition pathway for cross-modal consensus and a reasoning pathway for quantifying and mitigating inconsistencies to improve multimodal intent recognition.

citing papers explorer

Showing 3 of 3 citing papers.

The More, the Merrier: Contrastive Fusion for Higher-Order Multimodal Alignment

cs.CV · 2025-11-26 · unverdicted · none · ref 1 · internal anchor

Contrastive Fusion (ConFu) adds a fused-modality contrastive term to jointly align individual modalities and their combinations, enabling capture of higher-order dependencies like XOR relations while preserving pairwise alignments.

When Jokes Cross the Line: Analyzing Regular Humor and Dark Humor in YouTube Shorts

cs.MM · 2026-04-30 · unverdicted · none · ref 5 · internal anchor

TwistedHumor dataset shows dark humor in YouTube Shorts clusters around critique, coping, awkwardness and identity with more mixed and toxic audience reactions than regular humor.

Mitigating Multimodal Inconsistency via Cognitive Dual-Pathway Reasoning for Intent Recognition

cs.MM · 2026-05-10 · unverdicted · none · ref 2

CDPR uses an intuition pathway for cross-modal consensus and a reasoning pathway for quantifying and mitigating inconsistencies to improve multimodal intent recognition.
