MMTM improves topic coherence and temporal stability in long-form video by tri-modal similarity-gated fusion of speech, audio, and visual embeddings with BERTopic, shown on German and English news datasets with released code and corpus.
Lokmanoglu and Dror Walter
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
MMTM: Tri-Modal Topic Modeling for Long-Form Video via Similarity-Gated Fusion