arxiv: 1511.06361 · v6 · pith:X2HS26AZnew · submitted 2015-11-19 · 💻 cs.LG · cs.CL · cs.CV

Order-Embeddings of Images and Language

Ivan Vendrov, Ryan Kiros, Sanja Fidler, Raquel Urtasun

classification 💻 cs.LG · cs.CL · cs.CV

keywords images · hierarchy · language · representations · advocate · applied · approaches · captioning


Hypernymy, textual entailment, and image captioning can be seen as special cases of a single visual-semantic hierarchy over words, sentences, and images. In this paper we advocate for explicitly modeling the partial order structure of this hierarchy. Towards this goal, we introduce a general method for learning ordered representations, and show how it can be applied to a variety of tasks involving images and language. We show that the resulting representations improve performance over current approaches for hypernym prediction and image-caption retrieval.



Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Differentiable Bayesian Relaxation for Latent Partial-Order Inference
stat.ML 2026-05 unverdicted novelty 7.0

The authors replace discontinuous precedence and frontier constraints in a partial-order model with smooth surrogates, producing a continuous posterior that supports gradient MCMC and variational inference while recov...

HSG: Hyperbolic Scene Graph
cs.CV 2026-04 unverdicted novelty 6.0

Hyperbolic Scene Graph (HSG) learns embeddings in hyperbolic space for better hierarchical structure in scene graphs, achieving graph IoU of 33.51 versus 25.37 for the best Euclidean baseline.

Lorentz Framework for Semantic Segmentation
cs.CV 2026-04 unverdicted novelty 6.0

A Lorentz-model hyperbolic framework for semantic segmentation that integrates with Euclidean networks, provides free uncertainty maps, and is validated on ADE20K, COCO-Stuff, Pascal-VOC and Cityscapes using DeepLabV3...

HyNeuralMap: Hyperbolic Mapping of Visual Semantics to Neural Hierarchies
cs.CV 2026-05 unverdicted novelty 5.0

HyNeuralMap applies the hyperbolic Lorentz model to embed visual semantics and neural responses into a shared hierarchical space, outperforming Euclidean baselines on semantic prediction and cross-modal retrieval.

Root Mean Square Layer Normalization
cs.LG 2019-10 conditional novelty 5.0

RMSNorm delivers re-scaling invariance and comparable accuracy to LayerNorm while cutting computation by skipping mean subtraction, yielding 7-64% runtime reductions across tested models.