LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning
5 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV
years: 2026
verdicts: UNVERDICTED
5 representative citing papers
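As a rough illustration of the technique the title names, here is a minimal PyTorch sketch of hardness-weighted contrastive learning, upweighting in-batch negatives by their similarity to the query. The function name, the softmax-based weighting, and the alpha knob are illustrative assumptions, not LLaVE's exact loss.

```python
import torch
import torch.nn.functional as F

def hardness_weighted_info_nce(q, t, temperature=0.05, alpha=1.0):
    """InfoNCE where harder (more similar) in-batch negatives get larger weight.

    q, t: (B, D) query/target embeddings; row i of q pairs with row i of t.
    alpha controls how sharply weight concentrates on the hardest negatives.
    """
    q, t = F.normalize(q, dim=-1), F.normalize(t, dim=-1)
    sim = q @ t.T / temperature                             # (B, B) logits
    eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)

    # Per-row hardness weights over negatives, detached so they act as
    # fixed importance weights rather than a gradient path.
    neg_logits = sim.masked_fill(eye, float("-inf")).detach()
    w = torch.softmax(alpha * neg_logits, dim=-1) * (sim.size(0) - 1)

    exp_sim = torch.exp(sim)
    pos = exp_sim.diagonal()
    neg = (w * exp_sim.masked_fill(eye, 0.0)).sum(dim=-1)
    return -torch.log(pos / (pos + neg)).mean()

loss = hardness_weighted_info_nce(torch.randn(8, 32), torch.randn(8, 32))
```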
citing papers explorer
- mEOL: Training-Free Instruction-Guided Multimodal Embedder for Vector Graphics and Image Retrieval
  mEOL aligns embeddings for text, images, and SVGs by prompting an instruction-guided MLLM for one-word summaries and semantically rewriting SVGs, outperforming baselines on a new text-to-SVG retrieval benchmark. (Sketch below.)
- PLUME: Latent Reasoning Based Universal Multimodal Embedding
  PLUME uses latent-state autoregressive rollouts and a progressive training curriculum to deliver efficient reasoning for universal multimodal embeddings without generating explicit rationales. (Sketch below.)
- Beyond Chain-of-Thought: Rewrite as a Universal Interface for Generative Multimodal Embeddings
  Rewrite-driven generation, trained with alignment and reinforcement learning, produces shorter outputs and more effective generative multimodal embeddings than chain-of-thought methods on retrieval benchmarks. (Sketch below.)
- Enhancing Multimodal In-Context Learning via Inductive-Deductive Reasoning
  A framework with similarity-based visual token compression, dynamic attention rebalancing, and explicit inductive-deductive chain-of-thought improves multimodal ICL performance across eight benchmarks for open-source VLMs. (Sketch below.)
- Combating Visual Neglect and Semantic Drift in Large Multimodal Models for Enhanced Cross-Modal Retrieval
  SSA-ME uses saliency-aware modeling to reduce visual neglect and semantic drift, achieving SOTA results on the MMEB benchmark for multimodal retrieval. (Sketch below.)
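mEOL's training-free recipe builds on prompting a model for a one-word summary and reading off the final hidden state as the embedding. A minimal text-only sketch of that trick follows, assuming a small off-the-shelf decoder LM (Qwen/Qwen2-0.5B is an arbitrary stand-in); mEOL itself prompts an MLLM over images and semantically rewritten SVGs.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

name = "Qwen/Qwen2-0.5B"  # arbitrary small stand-in model
tok = AutoTokenizer.from_pretrained(name)
lm = AutoModelForCausalLM.from_pretrained(name)

@torch.no_grad()
def eol_embed(text: str) -> torch.Tensor:
    # Steer the model toward a one-word summary, then read the hidden state
    # at the final position as the embedding; no training involved.
    prompt = f'This sentence: "{text}" means in one word: "'
    ids = tok(prompt, return_tensors="pt")
    out = lm(**ids, output_hidden_states=True)
    return out.hidden_states[-1][0, -1]  # last layer, last token

e1, e2 = eol_embed("a red circle icon"), eol_embed("crimson disc glyph")
score = F.cosine_similarity(e1, e2, dim=0)  # retrieval score
```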
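PLUME's "latent-state autoregressive rollouts" suggest rolling the model forward in hidden-state space instead of decoding rationale tokens. The sketch below is one plausible, heavily simplified reading: feed the last hidden state back as the next input embedding for a few steps. The model choice and step count are assumptions, and PLUME's actual rollout and training curriculum differ.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

name = "Qwen/Qwen2-0.5B"  # arbitrary stand-in; PLUME's backbone differs
tok = AutoTokenizer.from_pretrained(name)
lm = AutoModelForCausalLM.from_pretrained(name)

@torch.no_grad()
def latent_rollout_embed(text: str, steps: int = 4) -> torch.Tensor:
    """Roll the model forward in latent space: append the last hidden state as
    the next input embedding instead of decoding a token, then read the final
    state as the embedding."""
    emb = lm.get_input_embeddings()(tok(text, return_tensors="pt").input_ids)
    h = emb
    for _ in range(steps):
        h = lm(inputs_embeds=emb, output_hidden_states=True).hidden_states[-1]
        emb = torch.cat([emb, h[:, -1:, :]], dim=1)  # latent "token", no text emitted
    return h[0, -1]

vec = latent_rollout_embed("a photo of a cat wearing sunglasses")
```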
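For the rewrite-as-interface idea, here is a hedged sketch of the inference path: generate a short rewrite, then embed the rewrite rather than a long chain-of-thought trace. The prompt wording and the stand-in model are assumptions, and the paper's alignment and RL training stages are omitted.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

name = "Qwen/Qwen2-0.5B-Instruct"  # arbitrary stand-in model
tok = AutoTokenizer.from_pretrained(name)
lm = AutoModelForCausalLM.from_pretrained(name)

@torch.no_grad()
def rewrite_then_embed(text: str) -> torch.Tensor:
    # Step 1: generate a short rewrite (the "interface") instead of a long
    # chain-of-thought trace.
    prompt = f"Rewrite the following query as one short descriptive sentence: {text}\nRewrite:"
    ids = tok(prompt, return_tensors="pt")
    out = lm.generate(**ids, max_new_tokens=24, do_sample=False)
    rewrite = tok.decode(out[0, ids.input_ids.size(1):], skip_special_tokens=True)

    # Step 2: embed the rewrite via the final hidden state.
    rid = tok(rewrite, return_tensors="pt")
    h = lm(**rid, output_hidden_states=True).hidden_states[-1]
    return F.normalize(h[0, -1], dim=0)

vec = rewrite_then_embed("cat sunglasses photo")
```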
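Similarity-based visual token compression can be illustrated by greedily averaging the most similar pair of tokens until a budget is met; this generic merge is in the spirit of, not identical to, the paper's compression module.

```python
import torch
import torch.nn.functional as F

def compress_tokens(v: torch.Tensor, keep: int) -> torch.Tensor:
    """Reduce N visual tokens to `keep` by repeatedly averaging the most
    similar pair. v: (N, D) token embeddings."""
    v = v.clone()
    while v.size(0) > keep:
        n = F.normalize(v, dim=-1)
        sim = n @ n.T
        sim.fill_diagonal_(float("-inf"))
        i, j = divmod(torch.argmax(sim).item(), v.size(0))
        merged = (v[i] + v[j]) / 2
        mask = torch.ones(v.size(0), dtype=torch.bool)
        mask[[i, j]] = False
        v = torch.cat([v[mask], merged.unsqueeze(0)], dim=0)
    return v

compressed = compress_tokens(torch.randn(196, 64), keep=32)  # 14x14 patches -> 32
```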
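For saliency-aware modeling, one simple form is saliency-weighted pooling: score visual tokens against a query embedding and pool with softmax weights so salient regions are not neglected. SSA-ME's actual mechanism is richer; every name and the temperature here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def saliency_weighted_pool(vis: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
    """Pool visual tokens with weights from their saliency w.r.t. a query
    embedding, so salient regions are not neglected in the fused vector.

    vis: (N, D) visual tokens; query: (D,) text/query embedding.
    """
    scores = F.normalize(vis, dim=-1) @ F.normalize(query, dim=0)  # (N,) saliency
    w = torch.softmax(scores / 0.1, dim=0)                         # sharpen; sums to 1
    return (w.unsqueeze(-1) * vis).sum(dim=0)                      # (D,) pooled

fused = saliency_weighted_pool(torch.randn(196, 64), torch.randn(64))
```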