Audio-visual multimedia quality assessment: A comprehensive survey
2 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (2)
verdicts: UNVERDICTED (2)
roles: dataset (1)
polarities: use dataset (1)
representative citing papers
A hierarchical cross-modal fusion architecture with LoRA adapters predicts human perception of AI-dubbed clips at PCC > 0.75 after training on 12k Hindi-English examples and human MOS fine-tuning.
citing papers explorer
- Multimodal Confidence Modeling in Audio-Visual Quality Assessment
  MCM-AVQA improves correlation with human quality scores by using modality-specific confidence to suppress unreliable signals during audio-visual fusion under asymmetric distortions.
- Can Hierarchical Cross-Modal Fusion Predict Human Perception of AI Dubbed Content?
  A hierarchical cross-modal fusion architecture with LoRA adapters predicts human perception of AI-dubbed clips at PCC > 0.75 after training on 12k Hindi-English examples and human MOS fine-tuning.