Modality-agnostic attention fusion for visual search with text feedback
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
citing papers explorer
- Learning to Compose: Revisiting Proxy Task Design for Zero-Shot Composed Image Retrieval
  FoCo learns composition for zero-shot CIR via text-anchored visual aggregation and context-conditioned semantic completion trained jointly with cross-instance contrastive loss, reporting SOTA on four benchmarks.
- Never Seen Before: Benchmarking Genuine Zero-Shot Composed Image Retrieval with Consistent Video-Sourced Datasets
  ZeroSight supplies a video-derived dataset and evaluation protocol for genuine zero-shot composed image retrieval plus the SC4CIR consistency method, demonstrating that prior benchmarks inflate reported performance across 27 tested approaches.
- Resolving Ambiguity in Composed Image Retrieval via Calibrated Interaction
  Composed image retrieval is reframed as calibrated intent resolution under uncertainty via conformal prediction sets and expected-information-gain clarification, with new AmbiCIR benchmark showing matched single-turn SOTA and faster multi-turn resolution with valid coverage.