A systematic analysis of evaluation practices in multimedia event extraction reveals that minor methodological choices cause large performance swings and overestimation of cross-modal grounding ability.
Proceedings of the 30th ACM International Conference on Multimedia , pages =
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
years
2026 3verdicts
UNVERDICTED 3representative citing papers
NEST is a new benchmark dataset for narrative event structures in long videos, with baselines reporting ETD below 8%, EL under 6%, EAE below 11%, and ERE at 35-44% F1.
ECHO reframes multimedia event extraction as multi-agent iterative refinement over an explicit Multimedia Event Hypergraph with a decoupled Link-then-Bind strategy, delivering 7.3 and 15.5 F1 gains on event mention and argument role.
citing papers explorer
-
Evaluation Pitfalls and Challenges in Multimedia Event Extraction
A systematic analysis of evaluation practices in multimedia event extraction reveals that minor methodological choices cause large performance swings and overestimation of cross-modal grounding ability.