(2.5+1)D Spatio-Temporal Scene Graphs for Video Question Answering
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
MOSA: Motion-Guided Semantic Alignment for Dynamic Scene Graph Generation
MoSA improves dynamic scene graph generation in three ways: it fuses motion attributes with spatial features, aligns the fused features cross-modally with relationship text embeddings, and applies a weighted loss to rare relationship classes. It reports top results on Action Genome.