Multimodal unified attention networks for vision-and-language interactions

2019 · arXiv 1908.04107

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Alzheimer's Disease Diagnosis using a Multimodal Approach with 3D MRI and PET

cs.LG · 2026-06-18 · unverdicted · novelty 5.0

Multimodal 3D CNN model with GMU, gated self-attention, and sparsely gated MoE achieves up to 95.47% accuracy on NC vs AD using MRI and PET, with ablations showing MoE benefit.

ViASNet: A Video Ad Saliency Network for Predicting Dynamic Saliency and Viewer Engagement

cs.CV · 2026-05-28 · unverdicted · novelty 4.0

ViASNet applies a 3D U-Net architecture augmented with audio and semantic inputs to predict dynamic saliency in video ads and uses frame-wise entropy to diagnose low-engagement scenes on eye-tracked data from 151 ads.

citing papers explorer

Alzheimer's Disease Diagnosis using a Multimodal Approach with 3D MRI and PET

cs.LG · 2026-06-18 · unverdicted · none · ref 11

ViASNet: A Video Ad Saliency Network for Predicting Dynamic Saliency and Viewer Engagement

cs.CV · 2026-05-28 · unverdicted · none · ref 14
