CoDAAR aligns modality-specific codebooks at the index level using Discrete Temporal Alignment and Cascading Semantic Alignment to achieve cross-modal generalization while preserving each modality's unique structure, reporting state-of-the-art results on event classification, localization, video segmentation, and cross-…
Pretraining Setup. Backbone features: Following [32], for every 1 s video segment, we sample 16 RGB frames and extract pool5 activations from a VGG-19 model [30]
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Cross-Modal-Domain Generalization Through Semantically Aligned Discrete Representations