PLUME uses latent-state autoregressive rollouts and a progressive training curriculum to deliver efficient reasoning for universal multimodal embeddings without generating explicit rationales.
Megapairs: Massive data synthesis for universal mul- timodal retrieval
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
HIVE raises multimodal retrieval nDCG@10 to 41.7 on the MM-BRIGHT benchmark by inserting LLM-driven hypothesis generation and verification between retrieval passes, delivering +9.5 over the best text-only baseline and +14.1 over the best multimodal baseline.
SSA-ME uses saliency-aware modeling to reduce visual neglect and semantic drift, achieving SOTA results on the MMEB benchmark for multimodal retrieval.
citing papers explorer
-
PLUME: Latent Reasoning Based Universal Multimodal Embedding
PLUME uses latent-state autoregressive rollouts and a progressive training curriculum to deliver efficient reasoning for universal multimodal embeddings without generating explicit rationales.
-
HIVE: Query, Hypothesize, Verify An LLM Framework for Multimodal Reasoning-Intensive Retrieval
HIVE raises multimodal retrieval nDCG@10 to 41.7 on the MM-BRIGHT benchmark by inserting LLM-driven hypothesis generation and verification between retrieval passes, delivering +9.5 over the best text-only baseline and +14.1 over the best multimodal baseline.
-
Combating Visual Neglect and Semantic Drift in Large Multimodal Models for Enhanced Cross-Modal Retrieval
SSA-ME uses saliency-aware modeling to reduce visual neglect and semantic drift, achieving SOTA results on the MMEB benchmark for multimodal retrieval.