KG-First, LLM-Fallback: A Hybrid Microservice for Grounded Skill Search and Explanation
Pith reviewed 2026-05-09 17:39 UTC · model grok-4.3
The pith
A hybrid microservice unifies heterogeneous skill frameworks into a provenance-preserving knowledge graph and uses LLMs only for constrained ranking and explanation to deliver high-accuracy retrieval at low latency.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a KG-first, LLM-fallback architecture, built on a hybrid retrieval engine that fuses SQLite FTS5 full-text search with HNSW vector search, can unify heterogeneous competency frameworks into a provenance-preserving knowledge graph, resolve vocabulary mismatch in educator queries, and deliver nDCG@5 above 0.94 at sub-200 ms latency, making expensive cross-encoder re-ranking unnecessary while still supporting auditable, audience-aware explanations.
What carries the argument
The KG-first, LLM-fallback architecture with a lightweight hybrid retrieval engine that fuses SQLite FTS5 full-text search and HNSW vector search to handle educator queries against the unified provenance-preserving knowledge graph.
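The fusion step at the heart of that engine can be illustrated with reciprocal rank fusion, the method in the paper's cited Cormack et al. reference. This is a sketch under the assumption that FTS5 and HNSW each return an ordered list of document ids, not the paper's actual implementation:

```python
def rrf_fuse(lexical_ranking, vector_ranking, k=60):
    """Fuse two ranked id lists with Reciprocal Rank Fusion.

    lexical_ranking: ids ordered by FTS5 (BM25) score, best first.
    vector_ranking: ids ordered by HNSW similarity, best first.
    k dampens the influence of top ranks (60 is the value used in the RRF paper).
    """
    scores = {}
    for ranking in (lexical_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

Items surfaced by both retrievers rise to the top, which is how such a hybrid can bridge vocabulary mismatch: a query that misses lexically can still reach the right skill via the vector list.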
If this is right
- Educators gain a single fast interface to query and receive traceable explanations from multiple skill frameworks without needing to master their individual technical structures.
- Computationally expensive cross-encoder re-ranking can be avoided while still reaching high retrieval quality in skill-search tasks.
- JSON-constrained LLMs provide high citation precision in explanations, though deterministic templates maximize evidence coverage when faithfulness is prioritized.
- The resulting microservice supports scalable, auditable integration of complex skill data into digital learning ecosystems.
- The architecture separates symbolic rigor in the graph layer from sub-symbolic flexibility in the LLM layer, allowing each to be updated independently.
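The citation-precision point above can be made concrete with a minimal validator for a JSON-constrained explanation, checking that every cited node actually exists in the knowledge graph. The schema fields (`answer`, `citations`) are illustrative assumptions, not the paper's actual output format:

```python
import json

def validate_explanation(raw, kg_node_ids):
    """Accept an LLM explanation only if it is valid JSON, carries the expected
    fields, and cites only nodes present in the knowledge graph."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(obj.get("answer"), str) or not isinstance(obj.get("citations"), list):
        return False, "missing 'answer' or 'citations'"
    unknown = [c for c in obj["citations"] if c not in kg_node_ids]
    if unknown:
        return False, "citations not in KG: " + ", ".join(map(str, unknown))
    return True, "ok"
```

Rejecting un-grounded citations is what buys precision on the LLM path; the deterministic template path instead trades fluency for guaranteed evidence coverage.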
Where Pith is reading between the lines
- The same KG-first pattern could be tested in other domains where non-experts query structured databases, such as medical guidelines or regulatory documents, to see whether the latency and accuracy benefits hold.
- Limiting LLM use to constrained ranking and explanation steps may reduce both inference cost and the risk of hallucinated content compared with end-to-end LLM retrieval systems.
- The observed fluency-faithfulness trade-off suggests that production deployments might combine both template and constrained-LLM explanation paths and let users or downstream systems choose based on context.
- Extending the evaluation to query logs from actual learning platforms would test whether the reported nDCG and latency figures generalize beyond the paper's multilingual dataset.
Load-bearing premise
The load-bearing premise is that heterogeneous sources can be merged into one provenance-preserving knowledge graph without loss of fidelity and that educator queries reliably exhibit vocabulary mismatch the hybrid engine can resolve.
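A minimal sketch of what "provenance-preserving" could mean at the data level, assuming a simple edge representation (the field names are hypothetical, not the paper's schema): each merged edge keeps its source framework and record id, so nothing collapses into an untraceable union.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    subject: str
    predicate: str
    obj: str
    source: str      # originating framework, e.g. "ESCO", "ROME", "O*NET"
    record_id: str   # id of the source record, for audit trails

def merge(*edge_sets):
    """Union edges across frameworks. Edges asserting the same triple from
    different sources stay distinct, preserving per-source lineage."""
    return set().union(*edge_sets)
```

Under this scheme, the same skill relation asserted by ESCO and O*NET yields two edges rather than one, which is exactly what makes retrieval hits traceable back to an authoritative source.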
What would settle it
A concrete falsifier would be an evaluation on real educator queries where the hybrid method yields nDCG@5 below 0.9, average latency above 200 ms, or where manual inspection reveals frequent mapping errors or missing provenance links in the constructed knowledge graph.
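The nDCG@5 threshold in that falsifier can be checked in a few lines. This is the standard formula, with the simplifying assumption that the ideal ranking is computed from the retrieved list's own relevance grades:

```python
import math

def ndcg_at_k(relevances, k=5):
    """nDCG@k for one query. `relevances` are graded relevance labels of the
    returned results, in the order the system ranked them."""
    def dcg(rels):
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A perfectly ordered list scores 1.0; the paper's claim is that the hybrid engine averages above 0.94 across the evaluation queries.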
Figures
Original abstract
Authoritative competency frameworks such as ESCO, ROME, and O*NET are essential for aligning education with labor market needs, yet their technical complexity and structural heterogeneity hinder practical adoption by educators. This paper introduces SkillGraph-Service, an interoperable microservice designed to bridge this gap by unifying these resources into a provenance-preserving Knowledge Graph (KG). Adopting a KG-first, LLM-fallback architecture, the system combines symbolic rigor with sub-symbolic flexibility. It implements a lightweight hybrid retrieval engine (fusing SQLite FTS5 and HNSW vector search) to handle the vocabulary mismatch in educator queries, and utilizes Large Language Models (LLMs) strictly for constrained ranking and audience-aware explanation. Empirical evaluation on a multilingual dataset reveals that the proposed hybrid strategy achieves superior retrieval effectiveness (nDCG@5>0.94) with sub-200 ms latency, rendering computationally expensive cross-encoder re-ranking may be unnecessary for this domain. Furthermore, an analysis of generated explanations highlights a trade-off between fluency and faithfulness: while JSON-constrained LLMs ensure high citation precision, deterministic templates remain the most reliable method for maximizing evidence coverage. The resulting architecture offers a practical, scalable, and auditable solution for integrating complex skill data into digital learning ecosystems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents SkillGraph-Service, a hybrid microservice that unifies authoritative competency frameworks (ESCO, ROME, O*NET) into a provenance-preserving knowledge graph. It adopts a KG-first, LLM-fallback approach, employing a lightweight hybrid retrieval engine combining SQLite FTS5 and HNSW vector search to address vocabulary mismatch in educator queries, with LLMs used only for constrained ranking and audience-aware explanations. On a multilingual dataset, the system achieves nDCG@5 greater than 0.94 with latency under 200 ms, and the authors analyze trade-offs in explanation generation, concluding that deterministic templates maximize evidence coverage while JSON-constrained LLMs ensure citation precision. The architecture is positioned as a practical, scalable, and auditable solution for integrating skill data into digital learning ecosystems.
Significance. If the empirical results prove robust upon detailed scrutiny, this work demonstrates a viable hybrid architecture that combines symbolic knowledge representation with sub-symbolic flexibility for skill search and explanation. It could reduce the need for expensive cross-encoder re-ranking in this domain while providing grounded, auditable outputs, offering significant practical value for aligning education with labor market needs through interoperable competency data.
Major comments (2)
- [Abstract] The central performance claims of nDCG@5 > 0.94 and sub-200 ms latency lack supporting details on dataset size, baseline comparisons, statistical tests, or the construction of the multilingual evaluation, making it difficult to evaluate the superiority of the hybrid strategy or the assertion that cross-encoder re-ranking may be unnecessary.
- [Abstract] The provenance-preserving unification of heterogeneous sources (ESCO, ROME, O*NET) into the KG is a load-bearing precondition for the retrieval effectiveness claims, but no quantitative fidelity metrics (e.g., relation coverage, conflict resolution accuracy, or inter-source consistency) are reported, leaving open the possibility that high nDCG scores arise from construction artifacts rather than the hybrid engine.
Minor comments (1)
- [Abstract] The phrasing 'rendering computationally expensive cross-encoder re-ranking may be unnecessary' is grammatically awkward and should be revised for clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will incorporate revisions to improve clarity and completeness.
Point-by-point responses
- Referee: [Abstract] The central performance claims of nDCG@5 > 0.94 and sub-200 ms latency lack supporting details on dataset size, baseline comparisons, statistical tests, or the construction of the multilingual evaluation, making it difficult to evaluate the superiority of the hybrid strategy or the assertion that cross-encoder re-ranking may be unnecessary.
  Authors: We acknowledge that the abstract is brief and does not include these specifics. The full manuscript details the evaluation in Section 4, including the multilingual dataset construction, baseline systems (lexical, vector, and cross-encoder), and statistical tests. To make the claims more self-contained, we will revise the abstract to summarize key elements such as dataset scale, primary baselines, and latency results. This will better support the evaluation of the hybrid approach and the efficiency argument relative to cross-encoders. Revision: yes.
- Referee: [Abstract] The provenance-preserving unification of heterogeneous sources (ESCO, ROME, O*NET) into the KG is a load-bearing precondition for the retrieval effectiveness claims, but no quantitative fidelity metrics (e.g., relation coverage, conflict resolution accuracy, or inter-source consistency) are reported, leaving open the possibility that high nDCG scores arise from construction artifacts rather than the hybrid engine.
  Authors: This observation is correct; while Section 3 describes the unification process, provenance tracking, and conflict resolution, no quantitative fidelity metrics are provided. We will revise the manuscript to add these metrics (e.g., coverage and consistency statistics from the integration) in a dedicated subsection or table. This addition will help confirm that performance derives from the retrieval engine rather than construction artifacts. Revision: yes.
Circularity Check
No circularity: empirical system evaluation without derivation chain
Full rationale
The paper presents an implemented microservice (SkillGraph-Service) that unifies competency frameworks into a provenance-preserving KG and evaluates a hybrid retrieval engine via direct measurements of nDCG@5 and latency on a held-out multilingual dataset. No equations, first-principles derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the reported results. The performance figures are obtained from running the constructed system rather than reducing to any input by definition or construction. The KG unification step is a preprocessing construction whose fidelity is asserted but not derived; the downstream metrics are independent empirical observations.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Competency frameworks from different sources can be unified into a single provenance-preserving knowledge graph without semantic loss.
Reference graph
Works this paper leans on
- [1] A. Miles and S. Bechhofer, "SKOS Simple Knowledge Organization System Reference," 2009.
- [2] S. Harris and A. Seaborne, "SPARQL 1.1 Query Language," https://www.w3.org/TR/sparql11-query/, accessed November 2025.
- [3] S. Robertson, H. Zaragoza, et al., "The Probabilistic Relevance Framework: BM25 and Beyond," Foundations and Trends in Information Retrieval, vol. 3, no. 4, pp. 333–389, 2009.
- [4] N. Reimers and I. Gurevych, "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks," arXiv:1908.10084, 2019.
- [5] L. N. Luyen and M.-H. Abel, "Automated Skill Decomposition Meets Expert Ontologies: Bridging the Granularity Gap with LLMs," arXiv:2510.11313, 2025.
- [6] N. L. Le and M.-H. Abel, "How Well Do LLMs Predict Prerequisite Skills? Zero-Shot Comparison to Expert-Defined Concepts," arXiv:2507.18479, 2025.
- [7] L. M. V. Da Silva, A. Köche, N. König, F. Gehlhoff, and A. Fay, "Capability-Driven Skill Generation with LLMs: A RAG-Based Approach for Reusing Existing Libraries and Interfaces," in 2025 IEEE 30th International Conference on Emerging Technologies and Factory Automation (ETFA), IEEE, 2025, pp. 1–8.
- [8] "IEEE Standard for Learning Technology – Data Model for Shareable Competency Definitions," IEEE Std 1484.20.3-2022, pp. 1–31, 2023.
- [9] Y. A. Malkov and D. A. Yashunin, "Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 4, pp. 824–836, 2018.
- [10] E. A. Fox and J. A. Shaw, "Combination of Multiple Searches," NIST Special Publication SP, vol. 243, 1994.
- [11] G. V. Cormack, C. L. Clarke, and S. Buettcher, "Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods," in Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2009, pp. 758–759.
- [12] R. Nogueira and K. Cho, "Passage Re-ranking with BERT," arXiv:1901.04085, 2019.
- [13] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks," Advances in Neural Information Processing Systems, vol. 33, pp. 9459–9474, 2020.
- [14] W. Liu, P. Zhou, Z. Zhao, Z. Wang, Q. Ju, H. Deng, and P. Wang, "K-BERT: Enabling Language Representation with Knowledge Graph," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 03, 2020, pp. 2901–2908.
- [15] T. Formal, B. Piwowarski, and S. Clinchant, "SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking," in Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, pp. 2288–2292.
- [16] O. Khattab and M. Zaharia, "ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT," in Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 39–48.
- [17] N. L. Le, M.-H. Abel, and B. Laforge, "Vers un cadre ontologique pour la gestion des compétences : à des fins de formation, de recrutement, de métier, ou de recherches associées" (Towards an ontological framework for competency management: for training, recruitment, occupation, or related research), arXiv:2507.05767, 2025.
- [18] M.-H. Abel, "MEMORAe Project: An Approach and a Platform for Learning Innovation," Multimedia Tools and Applications, vol. 81, no. 25, pp. 35555–35569, 2022.
- [19] SQLite Consortium, "SQLite FTS5 Extension," https://sqlite.org/fts5.html, accessed November 2025.
- [20] R. Winastwan, "Understanding hnswlib: A Graph-Based Library for Fast Approximate Nearest Neighbor Search," https://zilliz.com/learn/learn-hnswlib-graph-based-library-for-fast-ann, accessed November 2025.
- [21] S. Agarwal, L. Ahmad, et al., "gpt-oss-120b & gpt-oss-20b Model Card," arXiv:2508.10925, 2025.