Beyond Isolated Utterances: Cue-Guided Interaction for Context-Dependent Conversational Multimodal Understanding

2026 · cs.MM · arXiv 2604.25618

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

Conversational multimodal understanding aims to infer the meaning or label of the current utterance from its preceding dialogue context together with textual, acoustic, and visual signals. Existing methods mainly strengthen contextual modeling through enhanced encoding, fusion, or propagation, but rarely abstract the context-utterance dependency into an explicit cue and incorporate it into later multimodal reasoning. To address this issue, we propose CUCI-Net for conversational multimodal understanding. CUCI-Net fully preserves the structural distinction between context and utterance during encoding, effectively abstracts their dependency into an interpretation cue by combining local modality evidence with global contextual evidence, and seamlessly integrates the resulting cue into the final multimodal interaction stage for context-conditioned prediction. Extensive experiments on mainstream benchmark datasets fully demonstrate the effectiveness of the proposed method.
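
The abstract describes three stages: encoding context and utterance separately, abstracting their dependency into an interpretation cue from local modality evidence and global contextual evidence, and feeding that cue into the final multimodal interaction. The following is a minimal sketch of that data flow only, not CUCI-Net's actual architecture, which the abstract does not specify. Everything in it is a hypothetical stand-in: the function names (`attend`, `interpretation_cue`, `cue_guided_predict`), the use of dot-product attention for local evidence, mean pooling for global evidence, averaging to combine them, and a sigmoid gate for cue-conditioned fusion. Features are random vectors rather than real encoder outputs.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def attend(query, keys):
    """Scaled dot-product attention of one query vector over a list of key vectors."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(len(query))]

def interpretation_cue(context, utterance):
    """context: {modality: [turn vectors]}, utterance: {modality: vector}.
    Context and utterance stay separate inputs (assumed pre-encoded)."""
    # Local modality evidence (assumption): per modality, the utterance
    # attends over the context turns of the same modality.
    local = mean([attend(utterance[m], context[m]) for m in utterance])
    # Global contextual evidence (assumption): mean-pooled context summary.
    global_ = mean([mean(context[m]) for m in context])
    # Hypothetical combination: element-wise average of the two.
    return [(l + g) / 2 for l, g in zip(local, global_)]

def cue_guided_predict(context, utterance, class_weights):
    cue = interpretation_cue(context, utterance)
    # Cue-conditioned interaction (assumption): a sigmoid gate derived from
    # the cue rescales each modality's utterance vector before fusion.
    gate = [1 / (1 + math.exp(-c)) for c in cue]
    fused = mean([[g * x for g, x in zip(gate, utterance[m])] for m in utterance])
    # Linear classifier over the concatenation of fused features and cue.
    logits = [dot(w, fused + cue) for w in class_weights]
    return softmax(logits)

# Toy inputs: 3 context turns, 4-dimensional features, 2 classes.
random.seed(0)
dim, turns, classes = 4, 3, 2
modalities = ("text", "audio", "vision")
context = {m: [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(turns)]
           for m in modalities}
utterance = {m: [random.uniform(-1, 1) for _ in range(dim)] for m in modalities}
class_weights = [[random.uniform(-1, 1) for _ in range(2 * dim)] for _ in range(classes)]
probs = cue_guided_predict(context, utterance, class_weights)
```

The point of the sketch is the ordering: the cue is computed before the final fusion and then conditions it, rather than context being merged into the utterance representation at encoding time.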

representative citing papers

IDO: Incongruity-aware Distribution Optimization for Multimodal Fake News Detection

cs.CV · 2026-06-02 · unverdicted · novelty 5.0

IDO uses channel-wise reweighting, Gaussian modeling of factual uncertainty, and incongruity contrastive learning to achieve SOTA multimodal fake news detection.

State-Anchored Complete-View Distillation for Robust Conversational Multimodal Emotion Recognition

cs.MM · 2026-05-28 · unverdicted · novelty 4.0

CoRe-KD improves conversational multimodal emotion recognition under missing modalities via complete-view state anchoring and nonverbal conflict exposure on IEMOCAP and MELD.

citing papers explorer

IDO: Incongruity-aware Distribution Optimization for Multimodal Fake News Detection

cs.CV · 2026-06-02 · unverdicted · none · ref 61 · internal anchor

IDO uses channel-wise reweighting, Gaussian modeling of factual uncertainty, and incongruity contrastive learning to achieve SOTA multimodal fake news detection.

State-Anchored Complete-View Distillation for Robust Conversational Multimodal Emotion Recognition

cs.MM · 2026-05-28 · unverdicted · none · ref 3 · internal anchor

CoRe-KD improves conversational multimodal emotion recognition under missing modalities via complete-view state anchoring and nonverbal conflict exposure on IEMOCAP and MELD.
