Automated Semantic Fault Localization in SysML v2: A Human-in-the-Loop Framework Using Knowledge-Graph Augmented LLMs
Pith reviewed 2026-06-26 07:32 UTC · model grok-4.3
The pith
A knowledge graph and fine-tuned small language model localize semantic faults in SysML v2 models and suggest repairs as unified diff patches at over 91 percent success.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The framework combines a knowledge graph encoding physical compatibility rules with fine-tuned small language models to automatically localize semantic faults in SysML v2 models and suggest repairs as unified diff patches. The graph generates synthetic training data by introducing plausible violations and augments inference to ensure suggestions respect domain constraints. Evaluation shows fine-tuning boosts repair success from less than 3% to more than 91% on 1,184 samples in the vehicle systems domain.
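The dual role of the graph described above (checking compatibility and seeding synthetic faults) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the rule table `COMPATIBLE`, the port names, and the helpers `find_violations` and `inject_violation` are hypothetical and are not taken from the paper, whose actual graph schema is not described in this summary.

```python
import random

# Hypothetical rule table: pairs of interface kinds that may be connected.
# The paper's real knowledge graph is richer; this is only a stand-in.
COMPATIBLE = {
    ("electrical", "electrical"),
    ("fluid", "fluid"),
    ("mechanical", "mechanical"),
    ("signal", "signal"),
}


def is_compatible(src_kind, dst_kind):
    """True if the rule table allows connecting the two interface kinds."""
    return (src_kind, dst_kind) in COMPATIBLE


def find_violations(ports, connections):
    """Return the connections whose endpoint kinds are incompatible."""
    return [(a, b) for a, b in connections if not is_compatible(ports[a], ports[b])]


def inject_violation(ports, connections, rng):
    """Copy the connection list and rewire one target to an incompatible port."""
    idx = rng.randrange(len(connections))
    src, _ = connections[idx]
    candidates = sorted(p for p, kind in ports.items()
                        if not is_compatible(ports[src], kind))
    if not candidates:
        raise ValueError("no incompatible port available for " + src)
    faulty = list(connections)
    faulty[idx] = (src, rng.choice(candidates))
    return faulty


# Hypothetical ports and connections for a toy vehicle model.
ports = {
    "battery.out": "electrical",
    "motor.in": "electrical",
    "pump.out": "fluid",
    "radiator.in": "fluid",
}
connections = [("battery.out", "motor.in"), ("pump.out", "radiator.in")]
faulty = inject_violation(ports, connections, random.Random(0))
print(find_violations(ports, faulty))
```

In this toy setting the faulty variant stays structurally well-formed (every connection still joins two existing ports), which is the kind of error a syntax-level check would not flag.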
What carries the argument
Knowledge-graph-augmented fine-tuned small language model that outputs unified diff patches for semantic fault localization and repair.
If this is right
- Semantic violations that survive compiler checks can be caught and presented as candidate patches before they reach integration testing.
- Patch-based output reduces the length of model suggestions by more than 60 percent compared with full rewritten models.
- The human engineer retains final judgment because the system produces reviewable diffs rather than autonomous edits.
- The same knowledge-graph approach can in principle be rebuilt for other SysML v2 domains once their interface rules are encoded.
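To make the patch-based output concrete, the sketch below builds a unified diff between a faulty and a repaired model using Python's standard `difflib`. The SysML v2 snippet, the part and port names, and the file name `vehicle.sysml` are illustrative assumptions, not excerpts from the paper's dataset, and the diff is computed here rather than generated by a model.

```python
import difflib

# Illustrative model text only; names and structure are assumptions.
faulty_model = """\
part def Vehicle {
    part battery : Battery;
    part motor : Motor;
    part pump : CoolantPump;
    part radiator : Radiator;
    connect battery.powerOut to radiator.coolantIn;
    connect pump.coolantOut to radiator.coolantIn;
}
"""

# The repair reconnects the electrical output to an electrical input.
repaired_model = faulty_model.replace(
    "connect battery.powerOut to radiator.coolantIn;",
    "connect battery.powerOut to motor.powerIn;",
)

# n=1 keeps one line of context around the change.
patch = "".join(
    difflib.unified_diff(
        faulty_model.splitlines(keepends=True),
        repaired_model.splitlines(keepends=True),
        fromfile="vehicle.sysml",
        tofile="vehicle.sysml",
        n=1,
    )
)
print(patch)
```

The hunk both points at the faulty line and proposes its replacement, so an engineer can accept or reject it without rereading the whole model. The size of such a patch depends on the change rather than on the model, which is consistent with the reported reduction in output length, although this toy model is too small to show it.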
Where Pith is reading between the lines
- If the graph can be maintained as designs evolve, the method could serve as a continuously updated guardrail inside existing MBSE toolchains.
- Early localization of interface mismatches might shorten the feedback loop between modeling and physical prototyping in complex systems.
- The synthetic-data generation step could be reused to stress-test other verification tools that currently rely only on syntactic rules.
Load-bearing premise
The knowledge graph fully and accurately encodes the physical compatibility rules and the synthetic violations it generates match the semantic errors engineers actually make.
What would settle it
Apply the trained model to a collection of real SysML v2 vehicle models that contain documented semantic faults introduced by practicing engineers and measure whether repair success stays above 50 percent.
Original abstract
SysML v2's textual syntax enables compiler-based validation of model structure and language conformance. However, semantic mistakes that preserve syntactic validity but violate domain rules cannot be detected through compilers. These errors can propagate through the design process and surface late as costly integration failures. This paper presents a human-in-the-loop framework for identifying and repairing such errors automatically. It combines a fine-tuned Small Language Model (SLM) with a domain knowledge graph encoding physical compatibility rules between system elements. The knowledge graph also guides the generation of synthetic training data by systematically introducing plausible domain violations, and augments the model at inference time to ground repair suggestions in valid engineering constraints. We demonstrate the framework using the vehicle systems domain, where the knowledge graph captures the relationships between the mechanical, electrical, fluid, and signal interfaces. Two SLMs, Qwen2.5-Coder-1.5B and DeepSeek-Coder-6.7B, are fine-tuned to output unified diff patches that localize faults and present candidate repairs for engineer review, preserving human judgment in the design process. Evaluation of 1,184 test samples shows that fine-tuning improves semantic fault repair from less than 3% to more than 91%, with patch-based output reducing token length by over 60%. The framework offers a practical path toward AI-assisted model verification that complements existing MBSE tools.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a human-in-the-loop framework for semantic fault localization and repair in SysML v2 models. It uses a domain knowledge graph encoding physical compatibility rules for vehicle systems to generate synthetic training data with injected violations and to augment a fine-tuned SLM (Qwen2.5-Coder-1.5B or DeepSeek-Coder-6.7B) at inference. The model outputs unified diff patches for engineer review. On 1,184 test samples, fine-tuning raises repair success from <3% to >91% while cutting token length by >60%.
Significance. If the evaluation generalizes, the work would offer a practical complement to syntactic compiler checks in MBSE by addressing domain-rule violations early in design. Credit is due for the concrete before/after metrics on a sizable test set, the patch-based output format that preserves human oversight, and the dual use of the KG for data generation and grounding. However, the synthetic-only evaluation limits immediate claims about real-world utility.
major comments (3)
- [Abstract] The headline result (repair rate rising from <3% to >91% on 1,184 samples) rests entirely on test cases created by the same KG-driven violation injection process used to generate training data. No description is given of how the test split was constructed to avoid leakage, nor of any held-out set of real engineer-introduced faults or expert validation that the synthetic distribution matches actual SysML v2 modeling errors.
- [Abstract] The <3% baseline is not defined (zero-shot LLM? rule-based checker? other MBSE tool?). Without an explicit comparison to existing semantic-analysis or fault-localization methods for SysML or MBSE, the magnitude of the reported improvement cannot be assessed.
- [Abstract] The framework's claim that the KG 'fully and accurately encodes the physical compatibility rules' and that repairs are 'grounded in valid engineering constraints' is load-bearing, yet no completeness, consistency, or expert-validation study of the KG is reported.
minor comments (1)
- [Abstract] The abstract would benefit from a one-sentence definition of the 'semantic fault repair' success metric (exact match to ground-truth patch? semantic equivalence? engineer acceptance rate?).
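The choice of metric matters because different definitions can score the same prediction differently. The sketch below contrasts two candidate definitions: strict exact match on the patch text, and a looser match on the set of added and removed lines. Both functions, and the example patches, are hypothetical illustrations; the summary does not say which definition the paper uses.

```python
def normalize(patch: str) -> str:
    """Strip trailing whitespace per line and drop trailing blank lines."""
    return "\n".join(line.rstrip() for line in patch.splitlines()).rstrip("\n")


def exact_match(predicted: str, reference: str) -> bool:
    """Strict metric: patches must be textually identical after normalization."""
    return normalize(predicted) == normalize(reference)


def changed_lines(patch: str) -> set[str]:
    """Collect added/removed lines, skipping file headers, hunk headers, context.

    Simplification: a changed line whose own content starts with '--' or '++'
    would be mistaken for a file header and skipped.
    """
    changes = set()
    for line in patch.splitlines():
        if line.startswith(("+++", "---", "@@")):
            continue
        if line.startswith(("+", "-")):
            changes.add(line.rstrip())
    return changes


def same_edit(predicted: str, reference: str) -> bool:
    """Looser metric: same additions and removals, context may differ."""
    return changed_lines(predicted) == changed_lines(reference)


def success_rate(pairs, metric) -> float:
    """Fraction of (predicted, reference) pairs accepted by the metric."""
    return sum(metric(p, r) for p, r in pairs) / len(pairs)


# Hypothetical patches: same edit, different amount of context.
reference = (
    "--- vehicle.sysml\n"
    "+++ vehicle.sysml\n"
    "@@ -5,3 +5,3 @@\n"
    "     part radiator : Radiator;\n"
    "-    connect battery.powerOut to radiator.coolantIn;\n"
    "+    connect battery.powerOut to motor.powerIn;\n"
    "     connect pump.coolantOut to radiator.coolantIn;\n"
)
predicted = (
    "--- vehicle.sysml\n"
    "+++ vehicle.sysml\n"
    "@@ -6 +6 @@\n"
    "-    connect battery.powerOut to radiator.coolantIn;\n"
    "+    connect battery.powerOut to motor.powerIn;\n"
)
pairs = [(predicted, reference), (reference, reference)]
print(success_rate(pairs, exact_match), success_rate(pairs, same_edit))
```

Neither definition captures semantic equivalence (a different but valid repair) or engineer acceptance, which would need a compatibility check on the patched model or a human study.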
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major comment point by point below and indicate where revisions will be made to improve clarity and completeness.
- Referee: [Abstract] The headline result (repair rate rising from <3% to >91% on 1,184 samples) rests entirely on test cases created by the same KG-driven violation injection process used to generate training data. No description is given of how the test split was constructed to avoid leakage, nor of any held-out set of real engineer-introduced faults or expert validation that the synthetic distribution matches actual SysML v2 modeling errors.
  Authors: The evaluation uses synthetic data generated via the KG violation injection process for both training and testing. The 1,184 test samples were produced with distinct random seeds and no shared violation instances from the training set; the split was performed at the sample level post-generation to reduce leakage risk. We agree that the lack of real engineer-introduced faults and expert validation of the synthetic distribution is a limitation of the current study. We will revise the manuscript to explicitly detail the data generation and splitting procedure in the Evaluation section and add a limitations discussion on the synthetic-only nature of the dataset.
  Revision: yes
- Referee: [Abstract] The <3% baseline is not defined (zero-shot LLM? rule-based checker? other MBSE tool?). Without an explicit comparison to existing semantic-analysis or fault-localization methods for SysML or MBSE, the magnitude of the reported improvement cannot be assessed.
  Authors: The <3% figure represents the zero-shot performance of the base SLMs (Qwen2.5-Coder-1.5B and DeepSeek-Coder-6.7B) without fine-tuning or KG augmentation. We will revise the abstract and Evaluation section to define this baseline explicitly. Regarding comparisons to other MBSE semantic analysis tools, the manuscript focuses on the novel KG-augmented fine-tuning approach for SysML v2; we will expand the related work section to discuss why direct empirical comparisons were not feasible at this stage due to the absence of comparable public implementations for semantic fault localization in this domain.
  Revision: yes
- Referee: [Abstract] The framework's claim that the KG 'fully and accurately encodes the physical compatibility rules' and that repairs are 'grounded in valid engineering constraints' is load-bearing, yet no completeness, consistency, or expert-validation study of the KG is reported.
  Authors: The KG was constructed based on domain knowledge of vehicle systems interfaces (mechanical, electrical, fluid, signal) drawn from engineering standards and expert input. We acknowledge that the current manuscript does not include a formal completeness, consistency, or external expert validation study of the KG. We will revise the manuscript to add a dedicated subsection on KG construction, including the rule sources and any internal checks performed, and note the need for broader validation as future work.
  Revision: partial
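The first response describes a split performed at the sample level after generation. One stricter alternative, sketched here purely as an illustration and not as the authors' procedure, assigns splits by the base model a sample was derived from, so that variants of one base model never appear on both sides. The field names `base_model` and `violation` and the helper `assign_split` are hypothetical.

```python
import hashlib


def assign_split(base_model_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically map a base model id to 'train' or 'test'.

    Hashing the id, not the sample, keeps all faulty variants of one base
    model in the same split.
    """
    digest = hashlib.sha256(base_model_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return "test" if bucket < test_fraction else "train"


# Hypothetical samples: several injected violations per base model.
samples = [
    {"base_model": "vehicle_001", "violation": "electrical->fluid"},
    {"base_model": "vehicle_001", "violation": "signal->mechanical"},
    {"base_model": "vehicle_002", "violation": "fluid->electrical"},
    {"base_model": "vehicle_003", "violation": "mechanical->signal"},
    {"base_model": "vehicle_003", "violation": "electrical->signal"},
]

train = [s for s in samples if assign_split(s["base_model"]) == "train"]
test = [s for s in samples if assign_split(s["base_model"]) == "test"]

train_ids = {s["base_model"] for s in train}
test_ids = {s["base_model"] for s in test}
print(len(train), len(test), train_ids.isdisjoint(test_ids))
```

A group-level split of this kind addresses leakage between near-duplicate samples, but it does not address the referee's other concern, since both splits would still come from the same synthetic distribution.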
Circularity Check
No significant circularity; empirical evaluation on held-out synthetic samples is self-contained.
The paper's central claim is an empirical result: fine-tuning lifts semantic fault repair from <3% to >91% on 1,184 test samples. No derivation chain, equations, or self-referential definitions are present. Training and test data are both generated from the same knowledge graph, but this is standard supervised learning on held-out synthetic data rather than a reduction by construction: no fitted parameter is renamed as a prediction, no self-citation carries the result, and no ansatz is smuggled in. The framework is evaluated against its own generated distribution, and the result remains externally falsifiable by testing on real engineer-introduced faults, which satisfies the independence criteria. No load-bearing step reduces to its inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The domain knowledge graph correctly and comprehensively represents physical compatibility rules between system elements.