In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP).
4 Pith papers cite this work.
4 representative citing papers (illustrative code sketches of each method, based only on these one-line summaries, follow the list):
-
Toward Structural Multimodal Representations: Specialization, Selection, and Sparsification via Mixture-of-Experts
S3 decomposes multimodal data into selectable semantic experts, routes them adaptively, and sparsifies to achieve higher accuracy on MultiBench benchmarks with peak performance at intermediate sparsity levels.
-
Mitigating Multimodal Inconsistency via Cognitive Dual-Pathway Reasoning for Intent Recognition
CDPR uses an intuition pathway for cross-modal consensus and a reasoning pathway for quantifying and mitigating inconsistencies to improve multimodal intent recognition.
-
Self-Captioning Multimodal Interaction Tuning: Amplifying Exploitable Redundancies for Robust Vision Language Models
A self-captioning method using a Multimodal Interaction Gate amplifies redundant cross-modal interactions, reducing vision-induced errors by 38.3% and improving consistency by 16.8% in vision-language models.
-
Beyond Isolated Utterances: Cue-Guided Interaction for Context-Dependent Conversational Multimodal Understanding
CUCI-Net abstracts context-utterance dependency into an interpretation cue that combines local modality signals with global context and feeds it into the final multimodal interaction for context-conditioned predictions.
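
Sketch for S3. The summary describes a sparse mixture-of-experts: decompose the representation into semantic experts, route adaptively, and keep only a few experts active. A minimal top-k routing sketch follows; every name (sparse_moe, router_weights, the choice k=2) is an invented stand-in, not the paper's actual design.

import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sparse_moe(features, expert_weights, router_weights, k=2):
    """Route a fused multimodal feature vector to the top-k of E experts.

    features:       (d,) fused multimodal input
    expert_weights: (E, d, d) one linear "semantic expert" per slice
    router_weights: (E, d) linear router scoring each expert
    k:              number of experts kept; smaller k = sparser model
    """
    scores = router_weights @ features          # (E,) routing logits
    top = np.argsort(scores)[-k:]               # adaptive selection of experts
    gates = softmax(scores[top])                # renormalize over kept experts
    # Sparsification: non-selected experts contribute nothing.
    out = np.zeros_like(features)
    for g, e in zip(gates, top):
        out += g * (expert_weights[e] @ features)
    return out

d, E = 16, 8
x = rng.normal(size=d)
experts = rng.normal(size=(E, d, d)) / np.sqrt(d)
router = rng.normal(size=(E, d)) / np.sqrt(d)
y = sparse_moe(x, experts, router, k=2)         # intermediate sparsity: 2 of 8 experts
print(y.shape)                                  # (16,)

The k=2-of-8 setting mirrors the summary's claim that accuracy peaks at intermediate sparsity rather than at the fully dense or maximally sparse extremes.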
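Sketch for CDPR. One way to read "intuition pathway for consensus, reasoning pathway for quantifying and mitigating inconsistencies": take the cross-modal average as the fast consensus, measure each modality's deviation from it, and downweight the outliers. The distance measure and the softmax reweighting here are assumptions; only the two-pathway split comes from the summary.

import numpy as np

def dual_pathway(modal_logits):
    """modal_logits: (M, C) per-modality class logits for one utterance."""
    # Intuition pathway: fast consensus = plain average over modalities.
    consensus = modal_logits.mean(axis=0)                             # (C,)
    # Reasoning pathway: quantify each modality's inconsistency...
    inconsistency = np.linalg.norm(modal_logits - consensus, axis=1)  # (M,)
    # ...and mitigate it: weights decay with disagreement, so modalities
    # that conflict with the group contribute less to the final logits.
    w = np.exp(-inconsistency)
    w /= w.sum()
    return w @ modal_logits                                           # (C,)

logits = np.array([[2.0, 0.1, -1.0],   # text
                   [1.8, 0.3, -0.8],   # audio
                   [-1.5, 0.2, 2.2]])  # vision disagrees -> downweighted
print(dual_pathway(logits))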
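Sketch for the self-captioning method. A hypothetical reading of a "Multimodal Interaction Gate": estimate how redundant (mutually confirming) the vision and text features are, then use that estimate to amplify their interaction term. The cosine-similarity proxy and the sigmoid gate are invented; the self-captioning loop itself and the 38.3%/16.8% figures come from the paper, not from this code.

import numpy as np

def interaction_gate(v, t):
    """v, t: (d,) vision and text features for the same input."""
    # Redundancy proxy: cosine similarity between the two modalities.
    cos = v @ t / (np.linalg.norm(v) * np.linalg.norm(t) + 1e-8)
    gate = 1.0 / (1.0 + np.exp(-4.0 * cos))   # sigmoid gate in (0, 1)
    interaction = v * t                       # elementwise cross-modal interaction
    # Amplify exploitable redundancy; starve contradictory interactions.
    return np.concatenate([v, t, gate * interaction])

rng = np.random.default_rng(1)
v = rng.normal(size=8)
t = v + 0.1 * rng.normal(size=8)              # near-redundant pair -> gate ~ 1
print(interaction_gate(v, t).shape)           # (24,)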
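Sketch for CUCI-Net. The summary suggests compressing (local per-modality signals, global conversation context) into a single interpretation cue and conditioning the final fusion on it. The pooling, the tanh cue projection, and the attention-style fusion below are all assumptions about that shape.

import numpy as np

rng = np.random.default_rng(2)

def interpretation_cue(local_feats, context, W_cue):
    """local_feats: (M, d) per-modality utterance features;
    context: (d,) global conversation summary; W_cue: (d, 2*d)."""
    local = local_feats.mean(axis=0)          # pool local modality signals
    return np.tanh(W_cue @ np.concatenate([local, context]))   # (d,) cue

def cue_guided_fusion(local_feats, cue):
    # Final multimodal interaction: attend over modalities using the cue,
    # so the prediction is conditioned on context, not the utterance alone.
    att = np.exp(local_feats @ cue)
    att /= att.sum()
    return att @ local_feats                  # (d,) context-conditioned feature

d, M = 12, 3
locals_ = rng.normal(size=(M, d))
context = rng.normal(size=d)
W = rng.normal(size=(d, 2 * d)) / np.sqrt(2 * d)
fused = cue_guided_fusion(locals_, interpretation_cue(locals_, context, W))
print(fused.shape)                            # (12,)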