Evaluation of Pipelines for Data Integration into Knowledge Graphs
1 Pith paper cites this work. Polarity classification is still indexing.
abstract
Integrating new data into knowledge graphs (KGs) typically involves different tasks that are executed within workflows or pipelines. There are many possible pipelines for a specific integration problem, but there is not yet a general approach for evaluating the overall quality and performance of such pipelines in order to determine the best choices. We therefore propose a new benchmark, KGI-Bench, to evaluate integration pipelines that ingest different kinds of input data into an existing KG. We evaluate pipelines by analyzing their output, i.e., the updated KG, with three complementary quality metrics: coverage, correctness, and consistency. We also provide benchmark datasets (seed KG, overlapping input data in three formats, reference KG as ground truth) for the movie domain. To demonstrate the applicability and usefulness of the proposed benchmark, we comparatively evaluate 12 pipelines and analyze their behavior across different input data formats and design choices.
fields
cs.DB 1
years
2026 1
verdicts
UNVERDICTED 1
representative citing papers
MaDI-Bench: An End-to-End Data Integration Benchmark
MaDI-Bench supplies the first end-to-end benchmark tasks for full relational data integration pipelines across domains plus a variant-generation method to slow saturation.