Cross-lingual Entity Alignment via Joint Attribute-Preserving Embedding

Chengkai Li; Wei Hu; Zequn Sun

arXiv: 1708.05045 · v2 · pith:LSM5JMRVnew · submitted 2017-08-16 · 💻 cs.CL · cs.AI · cs.DB

Zequn Sun, Wei Hu, Chengkai Li

classification 💻 cs.CL · cs.AI · cs.DB

keywords alignment, entity, cross-lingual, embedding, machine, translation, approaches, attribute-preserving

Entity alignment is the task of finding entities in two knowledge bases (KBs) that represent the same real-world object. When facing KBs in different natural languages, conventional cross-lingual entity alignment methods rely on machine translation to eliminate the language barriers. These approaches often suffer from the uneven quality of translations between languages. While recent embedding-based techniques encode entities and relationships in KBs and do not need machine translation for cross-lingual entity alignment, a significant number of attributes remain largely unexplored. In this paper, we propose a joint attribute-preserving embedding model for cross-lingual entity alignment. It jointly embeds the structures of two KBs into a unified vector space and further refines it by leveraging attribute correlations in the KBs. Our experimental results on real-world datasets show that this approach significantly outperforms the state-of-the-art embedding approaches for cross-lingual entity alignment and could be complemented with methods based on machine translation.
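
The abstract describes embedding two KBs into one unified vector space. As a minimal sketch of what alignment could look like once such embeddings exist, the snippet below matches each source entity to its most similar target entity by cosine similarity. The entity names, the two-dimensional vectors, and the helper functions `cosine` and `align` are all hypothetical and chosen for illustration; this is not the paper's training procedure or its attribute-based refinement, only a nearest-neighbor lookup under the assumption that the embeddings have already been learned.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def align(source, target):
    """Map each source entity to the target entity with the highest cosine similarity."""
    return {
        s_name: max(target, key=lambda t_name: cosine(s_vec, target[t_name]))
        for s_name, s_vec in source.items()
    }


# Toy, made-up embeddings that are assumed to live in one shared vector space.
fr = {"fr:Londres": [0.9, 0.1], "fr:Allemagne": [0.1, 0.95]}
en = {"en:London": [1.0, 0.2], "en:Germany": [0.2, 1.0]}

matches = align(fr, en)
print(matches)
```

A greedy lookup like this can assign the same target entity to several source entities; whether and how to enforce one-to-one matches is a separate design choice that the abstract does not specify.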

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

HELEA: Hard-Negative Benchmark and LLM-based Reranking for Robust Entity Alignment
cs.CL · 2026-05 · unverdicted · novelty 6.0

HELEA creates hard-negative benchmarks (DW-HN29K, DY-HN27K) where name-overlap baselines fail and reports F1 0.967 on the new sets while preserving strong standard-benchmark scores via encoder retrieval plus untrained...