pith. machine review for the scientific record.

arxiv: 2602.03417 · v2 · submitted 2026-02-03 · 💻 cs.CL

Recognition: 2 theorem links · Lean Theorem

FactNet: A Billion-Scale Knowledge Graph for Multilingual Factual Grounding

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 08:02 UTC · model grok-4.3

classification 💻 cs.CL
keywords FactNet · knowledge graph · multilingual grounding · Wikidata · Wikipedia · knowledge graph completion · fact checking · cross-lingual transfer

The pith

FactNet couples 1.7 billion Wikidata assertions with traceable evidence spans from 316 Wikipedia editions to support multilingual factual grounding.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents FactNet as a resource that pairs structured assertions from Wikidata with specific textual pointers in Wikipedia pages across hundreds of languages. This construction uses a deterministic pipeline that records exact byte locations for every link, making every evidence unit directly verifiable. The work also releases FactNet-Bench, a set of tasks for knowledge graph completion, question answering, and fact checking that includes controls against data leakage. Experiments on the benchmark show that methods using both structure and text outperform purely structural or purely language-model approaches, and that patterns learned in high-resource languages transfer to lower-resource ones.
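To make the three-layer design concrete, here is a minimal sketch of how such records could be represented. The layer names (FactStatement, FactSense, FactSynset) come from the paper's Figure 1; every field name and type below is an assumption, not the released schema.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class FactStatement:
    """Atomic unit: one Wikidata claim (layer name from Fig. 1)."""
    statement_id: str  # e.g. a Wikidata statement GUID (hypothetical field)
    subject: str       # Wikidata QID
    predicate: str     # Wikidata property (PID)
    value: str         # QID or literal value

@dataclass(frozen=True)
class FactSense:
    """Grounded span: one evidence pointer into a Wikipedia page."""
    statement_id: str
    language: str      # Wikipedia edition code, e.g. "de"
    page_title: str
    byte_start: int    # offsets into the page's UTF-8 byte stream
    byte_end: int
    surface_text: str  # matched span, kept for auditing

@dataclass
class FactSynset:
    """Cross-lingual normalization: all senses grounding one statement."""
    statement_id: str
    senses: list[FactSense] = field(default_factory=list)
```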

Core claim

FactNet is built by aligning 1.7 billion Wikidata assertions to 3.01 billion evidence pointers extracted from 316 native Wikipedia language editions through a fully deterministic pipeline that guarantees byte-level traceability back to source text. The resource is released together with FactNet-Bench, an evaluation suite for knowledge graph completion, question answering, and fact checking that incorporates systematic leakage controls. Tests on this suite demonstrate that structural methods, text-aware methods, and LLM-integrated methods produce measurably different performance profiles, and that cross-lingual structure in the graph supports knowledge transfer across language-resource tiers.
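The leakage controls themselves are not specified here (the referee flags this below). For orientation, one standard control in knowledge graph completion, offered purely as an illustrative assumption rather than the paper's method, removes test triples whose entity pair already appears in training under any relation, the inverse-relation leakage that motivated FB15k-237:

```python
Triple = tuple[str, str, str]  # (head, relation, tail)

def filter_pair_leakage(train: set[Triple], test: list[Triple]) -> list[Triple]:
    """Drop test triples whose (head, tail) pair occurs in training under
    any relation, in either direction. Such pairs let a model score the
    test triple from a memorized inverse or duplicate relation instead of
    inferring a genuinely unseen fact. Illustrative only, not the paper's
    documented procedure."""
    seen = {(h, t) for h, _, t in train}
    return [(h, r, t) for (h, r, t) in test
            if (h, t) not in seen and (t, h) not in seen]
```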

What carries the argument

The deterministic construction pipeline that maps each Wikidata assertion to precise byte spans of supporting text in Wikipedia pages while preserving language-native editions.
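The pipeline itself is not reproduced here; what follows is a minimal sketch of the kind of deterministic surface-form matching the claim implies. The function name and the exhaustive-scan rule are assumptions; only the byte-offset convention is taken from the paper.

```python
def find_byte_spans(page_text: str, surface_form: str) -> list[tuple[int, int]]:
    """Return (byte_start, byte_end) for every occurrence of surface_form
    in page_text, measured in the page's UTF-8 byte stream.

    Deterministic by construction: no models, no randomness, so re-running
    on the same snapshot reproduces identical spans."""
    page = page_text.encode("utf-8")
    needle = surface_form.encode("utf-8")
    spans, start = [], 0
    while (i := page.find(needle, start)) != -1:
        spans.append((i, i + len(needle)))
        start = i + 1  # step by one byte so overlapping mentions are kept
    return spans
```

Indexing bytes rather than code points matters for the 316-edition claim: offsets stay stable across multi-byte scripts and can be checked against the raw dump without decoding the whole page.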

If this is right

  • Models trained on FactNet can ground outputs in retrievable Wikipedia text for both high- and low-resource languages.
  • The benchmark separates the contributions of graph structure, textual context, and LLM integration on the same data.
  • Cross-lingual transfer experiments become possible because the same Wikidata assertions appear with evidence in multiple languages.
  • Any downstream system can trace a generated fact back to an exact Wikipedia byte range for verification.
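The last point is cheap to operationalize against the same snapshot the pipeline indexed. A minimal sketch of the round-trip check, reusing the assumed FactSense record from above:

```python
def verify_sense(page_text: str, sense: FactSense) -> bool:
    """Re-derive the evidence span from the recorded byte offsets and
    compare it with the surface text captured at construction time."""
    page = page_text.encode("utf-8")
    try:
        span = page[sense.byte_start:sense.byte_end].decode("utf-8")
    except UnicodeDecodeError:
        return False  # offsets split a multi-byte character: span is invalid
    return span == sense.surface_text
```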

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Large-scale grounded resources of this form could reduce hallucination rates in multilingual LLMs by supplying explicit evidence links rather than relying on parametric memory.
  • The same alignment technique might be applied to other structured sources such as domain-specific databases to create additional grounded graphs.
  • Systematic differences in alignment quality across language editions could surface coverage or bias issues in Wikipedia itself.

Load-bearing premise

The pipeline produces accurate links between assertions and evidence without introducing systematic errors or language-specific artifacts.

What would settle it

A manual audit of a random sample of the linked evidence spans. Error rates above a few percent, or consistent language-dependent biases in alignment quality, would undermine the premise; rates below that threshold across language tiers would support it.
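A sketch of what that audit could report, assuming a human annotate(sense) judgment (not something the paper provides) and a Wilson score interval on the per-language error rate:

```python
import math
import random

def wilson_interval(errors: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for an observed error rate errors/n."""
    if n == 0:
        return (0.0, 1.0)
    p = errors / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (max(0.0, centre - half), min(1.0, centre + half))

def audit(senses_by_lang: dict[str, list], annotate, per_lang: int = 200,
          seed: int = 0) -> dict[str, tuple[float, float]]:
    """Stratified audit: sample per_lang evidence spans per edition and
    return a Wilson interval on the error rate. `annotate` is an assumed
    human judgment returning True when a span fails to support its fact."""
    rng = random.Random(seed)
    report = {}
    for lang, senses in senses_by_lang.items():
        sample = rng.sample(senses, min(per_lang, len(senses)))
        errors = sum(bool(annotate(s)) for s in sample)
        report[lang] = wilson_interval(errors, len(sample))
    return report
```

At 200 spans per edition, even a zero-error sample leaves a 95% upper bound near 1.9%, roughly the resolution needed to test the few-percent threshold above.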

Figures

Figures reproduced from arXiv: 2602.03417 by Alexander Fraser, Ge Gao, Jie Zhou, Kangyang Luo, Maosong Sun, Shuo Wang, Wen Lai, Xueren Zhang, Yingli Shen, Yudong Wang.

Figure 1
Figure 1: FactNet architecture. The graph couples Wikidata claims with native evidence from Wikipedia via three layers: FactStatement (atomic unit), FactSense (grounded span with byte offsets), and FactSynset (cross-lingual normalization). RelationEdges facilitate structural reasoning. view at source ↗
Figure 2
Figure 2: FactNet construction workflow. The pipeline processes dumps (Wikidata JSON, Wikipedia XML, and SQL link tables) through three deterministic stages: (1) view extraction, (2) statement canonicalization, and (3) evidence matching. By avoiding stochastic models, we ensure every generated FactSynset and FactSense retains a stable, auditable trace back to the source snapshot. view at source ↗
Figure 3
Figure 3: Results on FactNet-Bench: (a) KGC under leakage control, (b) MKQA semantic parsing (18 langs), (c) MFC verification and evidence quality. Error bars show std. over 3 seeds. view at source ↗
Figure 4
Figure 4: Language-rank distribution diagnostics. view at source ↗
Figure 5
Figure 5: Evidence-gap funnel visualization. We recommend plotting both macro- and micro-averaged funnels, and stacking the dominant loss reasons per tier. This figure operationalizes the “evidence gap” and helps users choose between improving extraction coverage versus restricting to the strong-evidence subset. view at source ↗
read the original abstract

Large language models hallucinate factual claims and struggle to ground their outputs in retrievable evidence, particularly in non-English languages. Existing resources impose a trade-off: structured knowledge bases lack textual grounding, whereas grounded datasets remain small and monolingual. We introduce FactNet, a billion-scale open resource that couples 1.7B Wikidata assertions with 3.01B evidence pointers drawn from 316 native Wikipedia editions. FactNet employs a deterministic construction pipeline, ensuring that every evidence unit is traceable to its source with byte-level precision. We further establish FactNet-Bench, an evaluation suite for Knowledge Graph Completion, Question Answering, and Fact Checking, equipped with systematic leakage controls. Experiments demonstrate that FactNet-Bench differentiates among structural, text-aware, and LLM-integrated methods, and that cross-lingual structure enables knowledge transfer across language tiers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces FactNet, a billion-scale open knowledge graph that couples 1.7B Wikidata assertions with 3.01B evidence pointers drawn from 316 native Wikipedia editions via a deterministic construction pipeline ensuring byte-level traceability. It further presents FactNet-Bench, an evaluation suite for Knowledge Graph Completion, Question Answering, and Fact Checking equipped with leakage controls, and reports experiments showing that the resource differentiates structural, text-aware, and LLM-integrated methods while enabling cross-lingual knowledge transfer.

Significance. If the linking accuracy holds, FactNet would be a significant contribution as the first billion-scale multilingual resource bridging structured knowledge bases and grounded textual evidence, directly addressing LLM hallucination and non-English grounding gaps. The deterministic pipeline, scale, openness, and inclusion of leakage-controlled benchmarks are notable strengths that could support reproducible research across language tiers.

major comments (2)
  1. [Methods] Methods section (construction pipeline): The rule-based alignment using entity surface forms, sentence boundaries, and byte offsets across 316 editions is presented as deterministic and accurate, but the manuscript supplies no human-annotated precision/recall figures, no ablation on the heuristics, and no language-specific error rates for lower-resource Wikipedias where mention detection is noisier; this directly undermines the central claim of reliable, unbiased evidence pointers.
  2. [Experiments] Experiments section (FactNet-Bench): The claim that the benchmark differentiates methods and enables cross-lingual transfer is stated, yet no quantitative results, tables of performance metrics, or details on leakage control implementation are provided to support these assertions or allow verification of the evaluation suite's validity.
minor comments (1)
  1. [Abstract] Abstract: The phrase 'systematic leakage controls' is used without any elaboration on their design or effectiveness, which should be clarified for readers evaluating the benchmark's reliability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below and will revise the manuscript to incorporate the suggested improvements.

read point-by-point responses
  1. Referee: [Methods] Methods section (construction pipeline): The rule-based alignment using entity surface forms, sentence boundaries, and byte offsets across 316 editions is presented as deterministic and accurate, but the manuscript supplies no human-annotated precision/recall figures, no ablation on the heuristics, and no language-specific error rates for lower-resource Wikipedias where mention detection is noisier; this directly undermines the central claim of reliable, unbiased evidence pointers.

    Authors: We agree that the current manuscript lacks sufficient quantitative validation of the alignment pipeline. In the revised version, we will add human-annotated precision and recall figures on a stratified sample of languages (including lower-resource editions), ablations on the core heuristics, and language-specific error rates to substantiate the reliability of the evidence pointers. revision: yes

  2. Referee: [Experiments] Experiments section (FactNet-Bench): The claim that the benchmark differentiates methods and enables cross-lingual transfer is stated, yet no quantitative results, tables of performance metrics, or details on leakage control implementation are provided to support these assertions or allow verification of the evaluation suite's validity.

    Authors: We acknowledge that the Experiments section would benefit from expanded quantitative support. The revision will include detailed performance tables comparing structural, text-aware, and LLM-integrated methods, along with explicit implementation details for the leakage controls, to allow full verification of the benchmark's validity and the cross-lingual transfer results. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

This is a data resource paper that describes the deterministic construction of FactNet by aligning existing Wikidata assertions with Wikipedia evidence spans across 316 editions. No mathematical derivations, predictive equations, fitted parameters, or self-referential claims are present in the provided abstract or description. The central contribution is the resource and benchmark suite itself; the pipeline is presented as rule-based and traceable without any reduction of outputs to inputs by construction or load-bearing self-citation. This matches the expected non-circular outcome for resource papers.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The resource rests on the assumption that Wikidata and Wikipedia are sufficiently accurate sources; no free parameters or invented entities are introduced beyond the linking rules themselves.

axioms (2)
  • domain assumption: Wikidata assertions constitute reliable factual ground truth
    Construction begins by taking 1.7B Wikidata assertions as given.
  • domain assumption: Wikipedia text provides valid supporting evidence for those assertions
    Evidence pointers are extracted from Wikipedia editions.

pith-pipeline@v0.9.0 · 5467 in / 1174 out tokens · 37757 ms · 2026-05-16T08:02:40.282142+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: the paper's claim is directly supported by a theorem in the formal canon.
  • supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: the paper appears to rely on the theorem as machinery.
  • contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
