Speaker retrieval in the wild: Challenges, effectiveness and robustness
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
To Be Multimodal or Not to Be: Query-Adaptive Audio-Visual Person Retrieval via Active Modality Detection
Query-adaptive audio-visual person retrieval detects active modalities via cross-modal score consistency, achieving 94.2% P@1 on the BBC Rewind corpus and outperforming unimodal and fixed-fusion baselines.