Using Large Language Models and Knowledge Graphs to Improve the Interpretability of Machine Learning Models in Manufacturing
Pith reviewed 2026-05-10 08:43 UTC · model grok-4.3
The pith
A knowledge graph linked to machine learning outputs lets large language models generate accurate explanations for manufacturing decisions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that structuring domain data and ML outputs in a knowledge graph, and selectively retrieving relevant facts from that graph to inform an LLM, yields explanations of ML results that are factually accurate and practically useful in manufacturing contexts.
What carries the argument
The selective retrieval of relevant facts from the knowledge graph that conditions the large language model's generation of explanations.
Load-bearing premise
That the combination of selective facts from the knowledge graph and language model output will consistently yield explanations that are correct and valuable for real manufacturing decisions.
What would settle it
A test case in the manufacturing environment where the generated explanation contradicts known domain facts or leads operators to a suboptimal production decision.
Original abstract
Explaining Machine Learning (ML) results in a transparent and user-friendly manner remains a challenging task of Explainable Artificial Intelligence (XAI). In this paper, we present a method to enhance the interpretability of ML models by using a Knowledge Graph (KG). We store domain-specific data along with ML results and their corresponding explanations, establishing a structured connection between domain knowledge and ML insights. To make these insights accessible to users, we designed a selective retrieval method in which relevant triplets are extracted from the KG and processed by a Large Language Model (LLM) to generate user-friendly explanations of ML results. We evaluated our method in a manufacturing environment using the XAI Question Bank. Beyond standard questions, we introduce more complex, tailored questions that highlight the strengths of our approach. We evaluated 33 questions, analyzing responses using quantitative metrics such as accuracy and consistency, as well as qualitative ones such as clarity and usefulness. Our contribution is both theoretical and practical: from a theoretical perspective, we present a novel approach for effectively enabling LLMs to dynamically access a KG in order to improve the explainability of ML results. From a practical perspective, we provide empirical evidence showing that such explanations can be successfully applied in real-world manufacturing environments, supporting better decision-making in manufacturing processes.
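The abstract does not say how triplets are selected, so the pipeline it describes can only be sketched under assumptions. The minimal Python sketch below uses plain token overlap between the question and each triplet as a stand-in for the paper's selective retrieval, then formats the selected facts into a prompt. The toy graph, the entity and predicate names, and the functions `retrieve` and `build_prompt` are all hypothetical; the call to an actual LLM is left out.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy graph; entity and predicate names are invented for illustration.
KG: List[Triple] = [
    ("Machine_M1", "hasSensor", "Spindle_Temperature"),
    ("Prediction_P7", "predictsFor", "Machine_M1"),
    ("Prediction_P7", "hasOutcome", "Tool_Wear_High"),
    ("Prediction_P7", "hasTopFeature", "Spindle_Temperature"),
    ("Machine_M2", "hasSensor", "Vibration"),
]


def tokens(text: str) -> Set[str]:
    """Lowercase word set; punctuation and underscores act as separators."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return set(cleaned.split())


def retrieve(question: str, kg: List[Triple], k: int = 3) -> List[Triple]:
    """Return up to k triples sharing at least one token with the question.

    Token overlap is an assumed scoring rule, not the paper's algorithm.
    Ties are broken by the triple's position in the graph.
    """
    q = tokens(question)
    scored = [(len(q & tokens(" ".join(t))), i, t) for i, t in enumerate(kg)]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [t for _, _, t in scored[:k]]


def build_prompt(question: str, facts: List[Triple]) -> str:
    """Format the retrieved triples as the only context the LLM may use."""
    lines = [f"- {s} {p} {o}" for s, p, o in facts]
    return (
        "Answer using only these facts:\n"
        + "\n".join(lines)
        + f"\nQuestion: {question}"
    )


question = "Why did prediction P7 report high tool wear?"
facts = retrieve(question, KG)
prompt = build_prompt(question, facts)
```

Restricting the prompt to the retrieved facts is what makes an explanation checkable against the graph; a real system would likely replace token overlap with graph queries or embedding similarity.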
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes integrating a Knowledge Graph (KG) storing domain-specific manufacturing data, ML model results, and explanations with a Large Language Model (LLM) via selective triplet retrieval to generate user-friendly explanations of ML outputs. It claims this dynamic KG access improves interpretability over standard approaches, with evaluation on 33 questions (standard and tailored complex ones) from the XAI Question Bank using quantitative metrics (accuracy, consistency) and qualitative metrics (clarity, usefulness), supported by empirical evidence from a real-world manufacturing environment.
Significance. If the central claim holds under controlled evaluation, the work offers a practical bridge between structured domain knowledge and generative explanations, with potential to support better decision-making in manufacturing XAI applications. The real-world deployment and use of an external XAI Question Bank provide a concrete strength in applicability, though the lack of comparative baselines leaves the improvement attributable to KG+LLM integration under-supported.
Major comments (2)
- [Evaluation] Evaluation section (as described): The reported results on 33 questions provide absolute scores for accuracy, consistency, clarity, and usefulness but include no baseline arms (e.g., LLM without retrieval, standard XAI methods such as SHAP/LIME, or rule-based KG queries). This prevents isolating the contribution of selective triplet retrieval to any observed gains and weakens support for the claim that the approach improves explainability.
- [Evaluation] Evaluation section: The introduction of 'more complex, tailored questions' beyond the standard XAI Question Bank raises the possibility of post-hoc selection or tailoring; without pre-specification, inter-rater agreement details, or statistical significance testing, it is unclear whether these questions fairly test the method or introduce selection bias into the usefulness and clarity assessments.
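The baseline arms requested in the first comment amount to running every condition on the same questions with the same scorer. The sketch below shows that structure only. The arm names, the stub answer functions, and the keyword scorer are placeholders invented for illustration; their outputs say nothing about how the real systems perform.

```python
from statistics import mean
from typing import Callable, Dict, List

Answerer = Callable[[str], str]
Scorer = Callable[[str, str], float]


def compare_arms(
    arms: Dict[str, Answerer], questions: List[str], score: Scorer
) -> Dict[str, float]:
    """Mean score per arm, with every arm answering the same questions."""
    return {
        name: mean(score(q, answer(q)) for q in questions)
        for name, answer in arms.items()
    }


# Placeholder arms: stubs standing in for real systems, with invented outputs.
arms: Dict[str, Answerer] = {
    "llm_only": lambda q: "no grounded answer",
    "kg_raw_triplets": lambda q: "Prediction_P7 hasOutcome Tool_Wear_High",
    "kg_plus_llm": lambda q: "The model flagged high tool wear (Tool_Wear_High).",
}


def mentions_expected(question: str, answer: str) -> float:
    """Placeholder scorer: 1.0 if the expected entity is mentioned, else 0.0."""
    return 1.0 if "Tool_Wear_High" in answer else 0.0


results = compare_arms(
    arms, ["What did the model predict for machine M1?"], mentions_expected
)
```

Keeping the question set and scorer fixed across arms is the point of the design: any difference in the resulting means can then be attributed to the arm rather than to the evaluation setup.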
Minor comments (2)
- [Method] The abstract and method description would benefit from explicit details on KG construction (e.g., how ML results and explanations are encoded as triplets), the exact retrieval algorithm, and the specific LLM employed to enable reproducibility.
- [Evaluation] Clarify whether the 33 questions were evaluated by multiple raters and report any inter-rater reliability metrics to strengthen the qualitative assessments of clarity and usefulness.
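One common inter-rater reliability metric for two raters assigning categorical labels is Cohen's kappa, defined as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each rater's label frequencies. The report does not say which metric the authors would use, so the sketch below is one option under that assumption; the rating labels in the example are invented.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and equal in length")
    n = len(a)
    # Observed agreement: share of items with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented clarity ratings for four explanations.
rater_1 = ["clear", "clear", "unclear", "clear"]
rater_2 = ["clear", "unclear", "unclear", "clear"]
kappa = cohens_kappa(rater_1, rater_2)
```

Here the raters agree on three of four items (p_o = 0.75) and chance agreement is 0.5, so kappa is 0.5. For ordinal scales or more than two raters, a weighted kappa or Krippendorff's alpha would be the more usual choice.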
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and describe the revisions we will incorporate to strengthen the evaluation section.
- Referee: [Evaluation] Evaluation section (as described): The reported results on 33 questions provide absolute scores for accuracy, consistency, clarity, and usefulness but include no baseline arms (e.g., LLM without retrieval, standard XAI methods such as SHAP/LIME, or rule-based KG queries). This prevents isolating the contribution of selective triplet retrieval to any observed gains and weakens support for the claim that the approach improves explainability.
  Authors: We agree that the lack of baseline comparisons limits the strength of claims about the specific benefits of selective triplet retrieval. In the revised manuscript we will add new experiments that include (1) an LLM-only condition without KG retrieval and (2) a rule-based KG query baseline that returns raw triplets without LLM generation. We will report the same quantitative and qualitative metrics for these conditions. Direct comparison with SHAP or LIME is not straightforward because those methods produce feature-importance scores rather than natural-language explanations grounded in manufacturing domain knowledge; we will add a short discussion clarifying this distinction and why a head-to-head numerical comparison would be misleading.
  Revision: yes
- Referee: [Evaluation] Evaluation section: The introduction of 'more complex, tailored questions' beyond the standard XAI Question Bank raises the possibility of post-hoc selection or tailoring; without pre-specification, inter-rater agreement details, or statistical significance testing, it is unclear whether these questions fairly test the method or introduce selection bias into the usefulness and clarity assessments.
  Authors: We acknowledge that the current description does not provide sufficient detail on how the tailored questions were generated or evaluated. In the revision we will (1) list all 33 questions explicitly, (2) describe the criteria used to create the additional manufacturing-specific questions (including that they were formulated before running the evaluation), (3) report inter-rater agreement statistics for the qualitative ratings of clarity and usefulness, and (4) include statistical significance tests comparing the standard and tailored question sets where appropriate. These additions will allow readers to judge whether selection bias is present.
  Revision: yes
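The rebuttal does not name the significance test for comparing the standard and tailored question sets. A permutation test on the difference of mean scores is one assumption-light option, sketched here in Python with exact enumeration of all group splits. The function name and the example scores are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def permutation_p_value(x: Sequence[float], y: Sequence[float]) -> float:
    """Exact two-sided permutation test on the difference of group means."""
    pooled = list(x) + list(y)
    n, total = len(x), len(pooled)
    observed = abs(mean(x) - mean(y))
    extreme = 0
    splits = 0
    # Enumerate every way to reassign the pooled scores to the two groups.
    for idx in combinations(range(total), n):
        chosen = set(idx)
        group_x = [pooled[i] for i in chosen]
        group_y = [pooled[i] for i in range(total) if i not in chosen]
        splits += 1
        # Small tolerance guards against floating-point rounding in ties.
        if abs(mean(group_x) - mean(group_y)) >= observed - 1e-12:
            extreme += 1
    return extreme / splits


# Invented scores for two tiny question sets.
standard_scores = [1, 2, 3]
tailored_scores = [4, 5, 6]
p_value = permutation_p_value(standard_scores, tailored_scores)
```

With three scores per group there are 20 possible splits, and only the two fully separated ones reach the observed difference, giving p = 0.1. Exact enumeration grows combinatorially, so for the full set of 33 questions one would sample random reassignments instead of listing them all.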
Circularity Check
No significant circularity; method and evaluation are independently defined against external benchmarks
The paper presents a method for selective triplet retrieval from a KG combined with LLM generation to produce explanations of ML results, then evaluates the outputs on 33 questions, comprising standard questions from the external XAI Question Bank and additional tailored questions introduced by the authors, using accuracy, consistency, clarity, and usefulness metrics. No equations, fitted parameters, or first-principles derivations appear in the described chain. The method is specified independently of its own evaluation outcomes, and the core benchmark is external rather than self-generated. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The central claim therefore does not reduce to its inputs by construction and remains self-contained.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: The XAI Question Bank supplies a representative and sufficient set of questions for assessing explanation quality in a manufacturing context.