Order-Embeddings of Images and Language
Hypernymy, textual entailment, and image captioning can be seen as special cases of a single visual-semantic hierarchy over words, sentences, and images. In this paper we advocate for explicitly modeling the partial order structure of this hierarchy. Towards this goal, we introduce a general method for learning ordered representations, and show how it can be applied to a variety of tasks involving images and language. We show that the resulting representations improve performance over current approaches for hypernym prediction and image-caption retrieval.
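The abstract does not say how an ordered representation is scored. As a minimal sketch, assume a reversed coordinate-wise order on non-negative vectors, where a more specific item has coordinates at least as large as those of the more general item it falls under. A penalty can then measure how badly a pair violates that order. The function name `order_violation` and the example vectors are hypothetical and chosen only for illustration.

```python
def order_violation(x, y):
    """Squared penalty that is zero exactly when y[i] <= x[i] for every i.

    Assumption: x is the more specific item, y the more general one,
    and "x is below y" means x dominates y coordinate-wise.
    """
    return sum(max(0.0, yi - xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical 2-d embeddings, for illustration only.
dog = [3.0, 2.0]     # specific concept: larger coordinates
animal = [1.0, 0.5]  # general concept: closer to the origin

print(order_violation(dog, animal))  # pair respects the order
print(order_violation(animal, dog))  # pair violates the order
```

Under this assumption the penalty is asymmetric: swapping the arguments turns a satisfied pair into a violated one, which is what lets the embedding represent a partial order rather than a symmetric similarity.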
Forward citations
Cited by 5 Pith papers
- A Differentiable Bayesian Relaxation for Latent Partial-Order Inference
  The authors replace discontinuous precedence and frontier constraints in a partial-order model with smooth surrogates, producing a continuous posterior that supports gradient MCMC and variational inference while recov...
- HSG: Hyperbolic Scene Graph
  Hyperbolic Scene Graph (HSG) learns embeddings in hyperbolic space for better hierarchical structure in scene graphs, achieving graph IoU of 33.51 versus 25.37 for the best Euclidean baseline.
- Lorentz Framework for Semantic Segmentation
  A Lorentz-model hyperbolic framework for semantic segmentation that integrates with Euclidean networks, provides free uncertainty maps, and is validated on ADE20K, COCO-Stuff, Pascal-VOC and Cityscapes using DeepLabV3...
- HyNeuralMap: Hyperbolic Mapping of Visual Semantics to Neural Hierarchies
  HyNeuralMap applies the hyperbolic Lorentz model to embed visual semantics and neural responses into a shared hierarchical space, outperforming Euclidean baselines on semantic prediction and cross-modal retrieval.
- Root Mean Square Layer Normalization
  RMSNorm delivers re-scaling invariance and comparable accuracy to LayerNorm while cutting computation by skipping mean subtraction, yielding 7-64% runtime reductions across tested models.