In Data or Invisible: Toward a Better Digital Representation of Low-Resource Languages with Knowledge Graphs
Pith reviewed 2026-05-08 10:54 UTC · model grok-4.3
The pith
Linguistic proximity and analogical reasoning can improve multilingual knowledge graph completion for low-resource languages.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Analyzing language distribution across major Linked Open Data (LOD) knowledge graphs reveals underrepresentation of low-resource languages, and cross-lingual transfer strategies leveraging linguistic proximity and analogical reasoning can be used to complete these graphs and increase language coverage.
What carries the argument
Cross-lingual transfer candidate selection using linguistic proximity, curated alignments, and analogical reasoning to identify correspondences between languages for knowledge graph completion.
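To make candidate selection concrete, here is a minimal sketch, under assumptions of our own, of how source languages might be ranked for a target language. Everything in it is hypothetical: the language names (`lang_a`, `lang_b`, `lang_c`), the binary feature vectors standing in for typological proximity, the counts standing in for curated annotated alignments, and the weighting parameter `alpha`. The proposal does not specify a scoring function. This one simply mixes cosine similarity with log-scaled alignment availability.

```python
import math

# Hypothetical binary typological features; names and values are placeholders.
FEATURES = {
    "target": [1, 0, 1, 1, 0],
    "lang_a": [1, 0, 1, 0, 0],
    "lang_b": [0, 1, 0, 1, 1],
    "lang_c": [1, 0, 1, 1, 1],
}

# Hypothetical counts of curated alignments (e.g. aligned entity pairs)
# between each candidate source language and the target language.
ALIGNMENTS = {"lang_a": 1200, "lang_b": 5000, "lang_c": 300}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_candidates(target, features, alignments, alpha=0.7):
    """Rank source languages by alpha * proximity + (1 - alpha) * availability.

    Availability is the log-scaled alignment count, normalised by the
    largest log-count so that it lies in [0, 1].
    """
    max_log = max((math.log1p(n) for n in alignments.values()), default=0.0) or 1.0
    scores = {}
    for lang, vec in features.items():
        if lang == target:
            continue
        proximity = cosine(features[target], vec)
        availability = math.log1p(alignments.get(lang, 0)) / max_log
        scores[lang] = alpha * proximity + (1 - alpha) * availability
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for lang, score in rank_candidates("target", FEATURES, ALIGNMENTS):
        print(f"{lang}: {score:.3f}")
```

With the default `alpha=0.7`, `lang_a` ranks first because it balances proximity and alignment availability, while `lang_b` stays last despite having the most alignments. Setting `alpha=1.0` (proximity only) promotes `lang_c`, the closest language on these toy features. Log-scaling the counts is a design choice that keeps a single resource-rich language from dominating the ranking.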
Load-bearing premise
Linguistic proximity and analogical reasoning will lead to better cross-lingual transfers that improve knowledge graph completion and language coverage in LOD.
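One common way to operationalize analogical reasoning over embeddings is the vector-offset (parallelogram) rule for proportional analogies a : b :: c : d, where d is chosen so that d - c ≈ b - a, i.e. the (dis)similarity between c and d mirrors the one between a and b. The sketch below applies it across languages: given a known alignment between entity `e1` in a source-language KG and `e1_b` in a target-language KG, it proposes a target-language counterpart for `e2`. This is an illustrative assumption, not the proposal's method, which is not yet specified. It presumes both KGs are embedded in a shared vector space, and the 3-d embeddings and entity names are hypothetical.

```python
import math

# Hypothetical 3-d embeddings in a shared space. "e1"/"e2" come from a
# source-language KG; "e1_b"/"e2_b"/"e3_b" come from a target-language KG.
SRC = {"e1": [1.0, 0.0, 0.0], "e2": [0.0, 1.0, 0.0]}
TGT = {"e1_b": [1.0, 0.0, 1.0], "e2_b": [0.0, 1.0, 1.0], "e3_b": [1.0, 1.0, 0.0]}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def solve_analogy(a, b, c, candidates, exclude=()):
    """Solve a : b :: c : ? with the vector-offset (parallelogram) rule.

    Moving from c to the answer d should be the same shift as moving from
    a to b, so we return the candidate closest to c + (b - a).
    """
    query = [ci + (bi - ai) for ai, bi, ci in zip(a, b, c)]
    best, best_score = None, -math.inf
    for name, vec in candidates.items():
        if name in exclude:
            continue  # terms of the analogy itself are conventionally skipped
        score = cosine(vec, query)
        if score > best_score:
            best, best_score = name, score
    return best, best_score


if __name__ == "__main__":
    print(solve_analogy(SRC["e1"], TGT["e1_b"], SRC["e2"], TGT, exclude={"e1_b"}))
```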
What would settle it
If the planned experiments demonstrate that proximity-based and analogy-based candidate selection does not yield higher completion accuracy or broader language coverage compared to standard methods, the proposed benefits would not hold.
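The proposal does not say how "completion accuracy" will be measured. Assuming the customary link-prediction metrics for KG completion, mean reciprocal rank (MRR) and Hits@k, a comparison between a baseline and a proximity-based candidate-selection strategy could be scored as in this sketch. Each rank is the 1-based position of the correct entity among the scored candidates (usually after filtering out other known true triples). The rank lists are invented for illustration and are not results.

```python
def mrr(ranks):
    """Mean reciprocal rank over 1-based ranks of the correct entities."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of queries whose correct entity appears in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def summarize(ranks, ks=(1, 3, 10)):
    """Collect MRR and Hits@k for one candidate-selection strategy."""
    report = {"MRR": mrr(ranks)}
    for k in ks:
        report[f"Hits@{k}"] = hits_at_k(ranks, k)
    return report


if __name__ == "__main__":
    # Invented ranks for illustration only; not experimental results.
    strategies = {
        "baseline": [1, 4, 12, 2, 30],
        "proximity": [1, 2, 5, 1, 15],
    }
    for name, ranks in strategies.items():
        print(name, {m: round(v, 3) for m, v in summarize(ranks).items()})
```

Evaluating both strategies on the same test triples and comparing the two reports is the kind of check the criterion above describes. Reporting several cut-offs helps separate strategies that lift correct answers to the very top from those that only move them out of the tail.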
Abstract
Emerging digital technologies are exacerbating the existing divide in Open Access Data (OAD) between high- and low-resource languages, excluding many communities from participating in the global digital transformation. In this PhD proposal, we aim to address this gap, focusing on the language coverage of Linked Open Data knowledge graphs (LOD KGs). First, we identify key variables that characterize language distribution in LOD, including the number of Wikipedia articles per language edition and the number of language-tagged entities in LOD KGs. These variables are analyzed across three major multilingual LOD KGs, DBpedia, BabelNet, and Wikidata, providing insights into the representation and distribution of languages within LOD. Building on this analysis, we intend to study the impact of cross-lingual transfer candidate selection on the task of multilingual KG completion. In particular, we plan to investigate strategies based on linguistic proximity and on the availability of curated annotated alignments between languages. Language proximity also motivates us to explore analogical reasoning, which relies on (dis)similarities and has not yet been investigated as a means of identifying correspondences across languages, to improve KG completion performance and enhance language coverage in LOD.
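One of the variables named in the abstract, the number of language-tagged entities, can be illustrated with a minimal sketch that counts, for each language tag, how many distinct entities carry at least one label in that language, and what share of all entities that represents. The triples, entity identifiers, and the choice of `rdfs:label` as the only label predicate are hypothetical. Against DBpedia or Wikidata, the same counts would more plausibly come from a SPARQL query that groups labels by `LANG(?label)`.

```python
from collections import defaultdict

# Hypothetical triples: (entity, predicate, object). Literal objects are
# (lexical form, language tag) tuples; IRI objects are plain strings.
TRIPLES = [
    ("ent:1", "rdfs:label", ("label-1-en", "en")),
    ("ent:1", "rdfs:label", ("label-1-fr", "fr")),
    ("ent:1", "rdfs:label", ("label-1-wo", "wo")),
    ("ent:2", "rdfs:label", ("label-2-en", "EN")),
    ("ent:2", "rdfs:label", ("label-2-fr", "fr")),
    ("ent:2", "rdf:type", "cls:Place"),
    ("ent:3", "rdfs:label", ("label-3-en", "en")),
    ("ent:3", "skos:altLabel", ("alt-3-fr", "fr")),
]


def language_coverage(triples, label_predicates=("rdfs:label",)):
    """Map each language tag to (entity count, share of all entities)."""
    entities = set()
    by_lang = defaultdict(set)
    for subj, pred, obj in triples:
        entities.add(subj)
        if pred in label_predicates and isinstance(obj, tuple):
            _lexical, lang = obj
            if lang:
                # BCP 47 language tags are case-insensitive, so normalise them.
                by_lang[lang.lower()].add(subj)
    total = len(entities) or 1
    return {lang: (len(ents), len(ents) / total)
            for lang, ents in sorted(by_lang.items())}


if __name__ == "__main__":
    for lang, (count, share) in language_coverage(TRIPLES).items():
        print(f"{lang}: {count} entities ({share:.0%})")
```

Passing `("rdfs:label", "skos:altLabel")` as `label_predicates` widens what counts as a label; which predicates to include is itself a modelling decision that changes the variable.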
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This manuscript is a PhD proposal that seeks to improve the representation of low-resource languages in Linked Open Data (LOD) knowledge graphs. It begins by identifying key variables characterizing language distribution, such as the number of Wikipedia articles per language and language-tagged entities, and states that these variables are analyzed across DBpedia, BabelNet, and Wikidata. The proposal then outlines plans to investigate the impact of cross-lingual transfer candidate selection on multilingual KG completion, with strategies based on linguistic proximity, curated annotated alignments, and analogical reasoning relying on (dis)similarities to identify correspondences across languages and enhance language coverage in LOD.
Significance. If the planned experiments are executed and confirm measurable improvements, the work could advance methods for multilingual KG completion and help mitigate the digital divide for low-resource languages in LOD. The emphasis on analogical reasoning based on linguistic (dis)similarities represents a novel angle not yet investigated in this context, and the structured identification of variables for language distribution provides a clear foundation. The proposal's value is prospective, as it identifies research gaps and a logical sequence of steps without presenting any completed analysis or results.
major comments (1)
- Abstract: The statement that the identified variables 'are analyzed across three major multilingual LOD KGs, DBpedia, BabelNet, and Wikidata, providing insights into the representation and distribution of languages within LOD' is not accompanied by any data, tables, quantitative findings, or even preliminary results. Since the subsequent plans for cross-lingual transfer explicitly build on this analysis, the absence of these insights is load-bearing for assessing the proposal's foundation and feasibility.
minor comments (2)
- The manuscript would benefit from an initial definition or expansion of the acronym 'LOD' on first use for readers outside the immediate subfield.
- Adding a small number of key references to existing work on cross-lingual KG completion or analogical reasoning in knowledge graphs would better situate the proposed novelty.
Simulated Author's Rebuttal
We thank the referee for the detailed review of our PhD proposal. The single major comment is addressed point-by-point below. We agree that the abstract wording requires clarification to accurately reflect the prospective nature of the work.
- Referee (Abstract): The statement that the identified variables 'are analyzed across three major multilingual LOD KGs, DBpedia, BabelNet, and Wikidata, providing insights into the representation and distribution of languages within LOD' is not accompanied by any data, tables, quantitative findings, or even preliminary results. Since the subsequent plans for cross-lingual transfer explicitly build on this analysis, the absence of these insights is load-bearing for assessing the proposal's foundation and feasibility.
Authors: We acknowledge that the abstract's use of present tense ('are analyzed... providing insights') can be read as implying completed analysis and results, which is not the case. This is a PhD proposal that outlines a research plan; the identification and analysis of language-distribution variables across DBpedia, BabelNet, and Wikidata constitute the first stage of the proposed work, with no empirical findings yet available. The subsequent cross-lingual transfer experiments are explicitly conditioned on completing that analysis. To resolve the ambiguity, we will revise the abstract to future tense (e.g., 'will be analyzed... to provide insights') and add a brief sentence clarifying the sequential structure of the proposal. This change directly addresses the concern that the foundation and feasibility cannot be assessed without the insights, while preserving the logical flow from distribution analysis to cross-lingual methods. revision: yes
Circularity Check
No significant circularity in this PhD proposal
This document is a PhD proposal that identifies variables characterizing language distribution in LOD KGs (Wikipedia articles per language, language-tagged entities) and analyzes them across DBpedia, BabelNet, and Wikidata. It then outlines intended future investigations into cross-lingual transfer candidate selection, linguistic proximity, curated alignments, and analogical reasoning for multilingual KG completion. No equations, derivations, fitted parameters, predictions, completed experiments, or quantitative results are present. No self-citations, ansatzes, or uniqueness claims are invoked in a load-bearing manner for any result, as the text consists entirely of planned work without any claimed outputs or reductions to inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Low-resource languages are underrepresented in LOD KGs such as DBpedia, BabelNet, and Wikidata.
Reference graph
Works this paper leans on
-
[1]
ODDA: An OODA-driven diverse data augmentation framework for low-resource relation extraction. ACL Anthology (2025), https://aclanthology.org/2025.findings-acl.15/
-
[2]
Pivoted low resource multilingual translation with NER optimization. ACM Transactions on Asian and Low-Resource Language Information Processing, https://dl.acm.org/doi/10.1145/3727876
-
[3]
Aharoni, R., Johnson, M., Firat, O.: Massively multilingual neural machine translation. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). pp. 3874–3884. Association for Computatio...
-
[4]
Akhtar, M.U., Liu, J., Xie, Z., Cui, X., Liu, X., Huang, B.: Multilingual entity alignment by abductive knowledge reasoning on multiple knowledge graphs. Engineering Applications of Artificial Intelligence 139, 109660 (2025). https://doi.org/10.1016/j.engappai.2024.109660, https://www.sciencedirect.com/science/article/pii/S0952197624018189
-
[5]
Ali, M., Berrendorf, M., Hoyt, C.T., Vermue, L., Galkin, M., Sharifzadeh, S., Fischer, A., Tresp, V., Lehmann, J.: Bringing light into the dark: A large-scale evaluation of knowledge graph embedding models under a unified framework. IEEE Trans. Pattern Anal. Mach. Intell. 44(12), 8825–8845 (2022). https://doi.org/10.1109/TPAMI.2021.3124805
-
[6]
Alsaidi, S., Decker, A., Marquer, E., Murena, P.A., Couceiro, M.: Tackling morphological analogies using deep learning - extended version. ArXiv abs/2111.05147 (2021), https://api.semanticscholar.org/CorpusID:243860875
-
[7]
Anil, A., Gutierrez-Basulto, V., Ibanez-Garcia, Y., Schockaert, S.: Inductive knowledge graph completion with GNNs and rules: An analysis. In: Calzolari, N., Kan, M.Y., Hoste, V., Lenci, A., Sakti, S., Xue, N. (eds.) Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). pp...
-
[9]
Chen, X., Chen, M., Fan, C., Uppunda, A., Sun, Y., Zaniolo, C.: Multilingual knowledge graph completion via ensemble knowledge transfer. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 3227–3238. Association for Computationa...
-
[11]
Dong, B., Bu, C., Wang, Y., Zhu, Y., Wu, X.: Disentangled multi-view graph neural network for multilingual knowledge graph completion. Appl. Soft Comput. 183, 113605 (2025). https://doi.org/10.1016/J.ASOC.2025.113605
-
[12]
Haleem, A., Javaid, M., Qadri, M.A., Suman, R.: Understanding the role of digital technologies in education: A review 3, 275–285 (2022). https://doi.org/10.1016/j.susoc.2022.05.004
-
[13]
Han, J., Kamber, M., Pei, J.: 10 - cluster analysis: Basic concepts and methods. In: Han, J., Kamber, M., Pei, J. (eds.) Data Mining: Concepts and Techniques (Third Edition), pp. 443–495. The Morgan Kaufmann Series in Data Management Systems, Morgan Kaufmann, Boston, third edition edn. (2012). https://doi.org/10.1016/B978-0-12-381479-1.00010-1
-
[14]
Helm, P., Bella, G., Koch, G., Giunchiglia, F.: Diversity and language technology: How techno-linguistic bias can cause epistemic injustice. https://doi.org/10.48550/arXiv.2307.13714
-
[15]
Hogan, A., Blomqvist, E., Cochez, M., d’Amato, C., de Melo, G., Gutierrez, C., Kirrane, S., Gayo, J.E.L., Navigli, R., Neumaier, S., Ngomo, A.N., Polleres, A., Rashid, S.M., Rula, A., Schmelzeisen, L., Sequeda, J., Staab, S., Zimmermann, A.: Knowledge Graphs. Synthesis Lectures on Data, Semantics, and Knowledge, Morgan & Claypool Publishers (2021). http...
-
[16]
Huang, Z., Li, Z., Jiang, H., Cao, T., Lu, H., Yin, B., Subbian, K., Sun, Y., Wang, W.: Multilingual knowledge graph completion with self-supervised adaptive graph alignment. CoRR abs/2203.14987 (2022). https://doi.org/10.48550/ARXIV.2203.14987
-
[17]
Jarnac, L., Couceiro, M., Monnin, P.: Relevant entity selection: Knowledge graph bootstrapping via zero-shot analogical pruning. In: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. p. 934–944. CIKM ’23, Association for Computing Machinery, New York, NY, USA (2023). https://doi.org/10.1145/3583780.3615030
-
[18]
Jin, Z., Zhang, C., Hu, Z., Yu, J., Ma, R., Chen, Q., Liao, X., Zhang, Y.: Cycleoie: A low-resource training framework for open information extraction. In: Rambow, O., Wanner, L., Apidianaki, M., Al-Khalifa, H., Eugenio, B.D., Schockaert, S. (eds.) Proceedings of the 31st International Conference on Computational Linguistics, COLING 2025, Abu Dhabi, UAE, January 19-24, 20...
-
[19]
Joshi, P., Santy, S., Budhiraja, A., Bali, K., Choudhury, M.: The state and fate of linguistic diversity and inclusion in the NLP world. In: Jurafsky, D., Chai, J., Schluter, N., Tetreault, J.R. (eds.) Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020. pp. 6282–6293. Association for ...
-
[20]
Kolluru, K., Mohammed, M., Mittal, S., Chakrabarti, S., ., M.: Alignment-augmented consistent translation for multilingual open information extraction. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 2502–2517. Association for Computational Linguistics. https://doi.org/10.18653/v1...
-
[21]
Li, H., Li, Q., Xu, Z., Ye, X.: Digital technologies. Journal of Digital Economy 3, 240–248 (2024). https://doi.org/10.1016/j.jdec.2025.02.001, https://www.sciencedirect.com/science/article/pii/S2773067025000032
-
[22]
Lim, S., Prade, H., Richard, G.: Solving word analogies: A machine learning perspective. In: European Conference on Symbolic and Quantitative Approaches to Reasoning and Uncertainty (2019), https://api.semanticscholar.org/CorpusID:202763532
-
[23]
Liu, X., Hu, C., Zhang, R., Chen, J., Xu, B.: Improving data annotation for low-resource relation extraction with logical rule-augmented collaborative language models. In: Chiruzzo, L., Ritter, A., Wang, L. (eds.) Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language T...
-
[24]
Loo, R.T.J., Nasta, F., Macchi, M., Baudot, A., Burstein, F., Bove, R., Greve, M., Fröhlich, H., Khalid, S., Küderle, A., Moore, S.L., Storms, V., Torous, J., Glaab, E.: Recommendations for successful development and implementation of digital health technology tools 27, e56747. https://doi.org/10.2196/56747
-
[25]
Luo, G., Li, J., Peng, H., Yang, C., Sun, L., Yu, P.S., He, L.: Graph entropy guided node embedding dimension selection for graph neural networks. ArXiv abs/2105.03178 (2021), https://api.semanticscholar.org/CorpusID:234093064
-
[26]
Lythreatis, S., Singh, S.K., El-Kassar, A.N.: The digital divide: A review and future research agenda 175, 121359. https://doi.org/10.1016/j.techfore.2021.121359
-
[27]
Mao, C., Gao, X., Song, R., He, S., Gao, S., Liu, K., Yu, Z.: Multilingual knowledge graph completion via efficient multilingual knowledge sharing. CoRR abs/2510.07736 (2025). https://doi.org/10.48550/ARXIV.2510.07736
-
[28]
Marquer, E., Couceiro, M.: Solving morphological analogies: from retrieval to generation. Annals of Mathematics and Artificial Intelligence 93(2), 263–298 (Jun 2024). https://doi.org/10.1007/s10472-024-09945-7
-
[29]
Mohammadshahi, A., Vamvas, J., Sennrich, R.: Investigating multi-pivot ensembling with massively multilingual machine translation models. https://doi.org/10.48550/arXiv.2311.07439
-
[30]
Naous, T., Ryan, M.J., Xu, W.: Having beer after prayer? measuring cultural bias in large language models. In: Annual Meeting of the Association for Computational Linguistics (2023), https://api.semanticscholar.org/CorpusID:258865272
-
[31]
Nigatu, H.H., Tonja, A.L., Rosman, B., Solorio, T., Choudhury, M.: The zeno’s paradox of ‘low-resource’ languages. In: Al-Onaizan, Y., Bansal, M., Chen, Y. (eds.) Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024, Miami, FL, USA, November 12-16, 2024. pp. 17753–17774. Association for Computational Linguistics (2024)....
-
[33]
Paulheim, H.: Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web 8(3), 489–508 (2017). https://doi.org/10.3233/SW-160218
14 N.-E. Mbengue
-
[34]
Peng, B., Zhu, Y., Liu, Y., Bo, X., Shi, H., Hong, C., Zhang, Y., Tang, S.: Graph retrieval-augmented generation: A survey. ACM Trans. Inf. Syst. 44(2) (Dec 2025). https://doi.org/10.1145/3777378
-
[35]
Peng, C., Xia, F., Naseriparsa, M., Osborne, F.: Knowledge graphs: Opportunities and challenges. Artificial Intelligence Review pp. 1–32 (2023), https://api.semanticscholar.org/CorpusID:257757244
-
[36]
Ringwald, C., Gandon, F.L., Faron-Zucker, C., Michel, F., Akl, H.A.: A systematic review of relation extraction task since the emergence of transformers. ArXiv abs/2511.03610 (2025), https://api.semanticscholar.org/CorpusID:282758083
-
[37]
Singh, B., Kandru, P., Sharma, A., Varma, V.: Massively multilingual language models for cross lingual fact extraction from low resource indian languages. CoRR abs/2302.04790 (2023). https://doi.org/10.48550/ARXIV.2302.04790
-
[38]
Song, R., He, S., Gao, S., Cai, L., Liu, K., Yu, Z., Zhao, J.: Multilingual knowledge graph completion from pretrained language models with knowledge constraints. In: Findings of the Association for Computational Linguistics: ACL 2023. pp. 7709–
-
[41]
Soto Martinez, W., Parmentier, Y., Gardent, C.: Phylogeny-inspired soft prompts for data-to-text generation in low-resource languages. In: Park, J.C., Arase, Y., Hu, B., Lu, W., Wijaya, D., Purwarianti, A., Krisnadhi, A.A. (eds.) Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacifi...
-
[42]
Tang, R., Zhao, Y., Zong, C., Zhou, Y.: Multilingual knowledge graph completion with language-sensitive multi-graph attention. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 10508–10519. Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.acl-long.586
-
[43]
Tong, V., Nguyen, D.Q., Huynh, T.T., Nguyen, T.T., Nguyen, Q.V.H., Niepert, M.: Joint multilingual knowledge graph completion and alignment. In: Findings of the Association for Computational Linguistics: EMNLP 2022. pp. 4646–4658. Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.findings-emnlp.341
-
[44]
Vijayan, K., Anand, O.: Language-agnostic text processing for information extraction. In: Artificial Intelligence, Soft Computing and Applications. pp. 129–
-
[45]
Academy and Industry Research Collaboration Center (AIRCC). https://doi.org/10.5121/csit.2022.122310
-
[46]
Viksna, R., Skadina, I., Skadins, R., Vasiljevs, A., Rozis, R.: Assessing multilinguality of publicly accessible websites. In: International Conference on Language Resources and Evaluation (2022), https://api.semanticscholar.org/CorpusID:250164111
-
[47]
Wang, D., Anwar, M., Tang, R.: Open access data policies and technologies: Introduction to special issue 9(1), 100093. https://doi.org/10.1016/j.dim.2025.100093
-
[48]
Wang, D., Richards, D., Bilgin, A., Chen, C.: The development of open government data: connecting supply and demand through portals. Emerald Publishing Limited (2022). https://doi.org/10.1108/9781802623154
-
[49]
Wolk, R.M.: The effects of english language dominance of the internet and the digital divide. 2004 International Symposium on Technology and Society (IEEE Cat. No.04CH37548) pp. 174–178 (2004), https://api.semanticscholar.org/CorpusID:6241691
-
[50]
Zhang, W., Şerban, O., Sun, J., Guo, Y.: Conflict-aware multilingual knowledge graph completion 281, 111070. https://doi.org/10.1016/j.knosys.2023.111070
-
[51]
Zhong, T., Yang, Z., Liu, Z., Zhang, R., Liu, Y., Sun, H., Pan, Y., Li, Y., Zhou, Y., Jiang, H., Chen, J., Liu, T.: Opportunities and challenges of large language models for low-resource languages in humanities research. https://doi.org/10.48550/arXiv.2412.04497