AV-Master introduces dynamic adaptive focus sampling, modality preference modeling, and dual-path contrastive loss to outperform prior methods on audio-visual question answering benchmarks.
Question-aware global-local video understanding network for audio-visual question answering,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2025 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
AV-Master: Dual-Path Comprehensive Perception Makes Better Audio-Visual Question Answering
AV-Master introduces dynamic adaptive focus sampling, modality preference modeling, and dual-path contrastive loss to outperform prior methods on audio-visual question answering benchmarks.