Papalexakis, and Amit K
2 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 2
representative citing papers
citing papers explorer
- Understanding, Categorizing and Predicting Semantic Image-Text Relations
  Introduces an eight-class taxonomy for semantic image-text relations based on three metrics and a multimodal embedding model for predicting the classes from collected data.
- TimeProVe: Propose, then Verify for Efficient Long Video Temporal Reasoning in Activities of Daily Living
  TimeProVe proposes a propose-then-verify framework using lightweight action-based candidate evidence generation followed by targeted VLM verification for efficient long video temporal reasoning, achieving 7.3% improvement on OTB with 75% fewer VLM calls.