
arxiv: 1907.10528 · v1 · pith:4LFWT37Fnew · submitted 2019-07-24 · 💻 cs.DB · cs.AI

The sameAs Problem: A Survey on Identity Management in the Web of Data

Pith reviewed 2026-05-24 16:22 UTC · model grok-4.3

classification 💻 cs.DB cs.AI
keywords: sameAs problem, identity management, Web of Data, linked data, owl:sameAs, knowledge graphs, data integration, survey

The pith

Incorrect sameAs links disrupt data reuse across the Web of Data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey investigates the sameAs problem in decentralized knowledge graphs, where multiple names for one entity require owl:sameAs statements to connect overlapping data sources. These statements enable useful deductions and data reuse, yet incorrect use has wide-ranging effects in a global knowledge space. The authors review prior work arguing that identity on the Web is broken, examine weaknesses in existing solutions, and identify open challenges. A sympathetic reader would care because reliable identity links are a precondition for effective data integration and reasoning over the Web of Data.

Core claim

The paper claims that identity management in the Web of Data is broken. Several earlier studies have established problems with sameAs statements. This survey maps the current state of solutions, draws out their main weaknesses, and lists the open challenges that remain to be solved.

What carries the argument

The owl:sameAs statement, which asserts that two names denote the same entity and thereby links data for reuse.
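The deduction this statement licenses can be sketched in a few lines. This is a minimal illustration, assuming a toy in-memory set of triples; the identifiers (`a:Paris`, `b:Paris`, `ex:population`) and the helper `apply_same_as` are hypothetical and do not come from the paper. It performs a single substitution pass, not the full transitive closure a reasoner would compute.

```python
SAME_AS = "owl:sameAs"

def apply_same_as(triples):
    """Return the input triples plus those deduced by one substitution pass.

    If (a, owl:sameAs, b) is asserted, every other statement that mentions a
    as subject or object is also emitted with b in its place, and vice versa.
    """
    triples = set(triples)
    links = {(s, o) for s, p, o in triples if p == SAME_AS}
    links |= {(o, s) for s, o in links}  # identity is symmetric

    deduced = set(triples)
    for a, b in links:
        for s, p, o in triples:
            if p == SAME_AS:
                continue  # only copy ordinary statements in this sketch
            if s == a:
                deduced.add((b, p, o))
            if o == a:
                deduced.add((s, p, b))
    return deduced

# Hypothetical data: two sources name the same city differently.
example = {
    ("a:Paris", "ex:population", "2100000"),
    ("b:Paris", "ex:country", "ex:France"),
    ("a:Paris", SAME_AS, "b:Paris"),
}
result = apply_same_as(example)
```

After the call, `result` holds the population statement for `b:Paris` and the country statement for `a:Paris` as well, which is the reuse the link is meant to enable.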

If this is right

  • Knowledge graphs from different sources cannot be merged reliably without correct identity links.
  • Deductive systems built on the Web of Data will propagate errors from faulty sameAs statements.
  • Data reuse across the Web of Data remains limited until identity weaknesses are resolved.
  • Future identity solutions must address the specific weaknesses identified in current approaches.
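How a single faulty link propagates can be illustrated with a small union-find sketch. This is an assumption-laden toy, not the method of any surveyed system: the function `same_as_clusters` and the identifiers are hypothetical, and real closures run over hundreds of millions of statements.

```python
def same_as_clusters(links):
    """Group identifiers into equivalence classes implied by sameAs links."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two classes

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical links: two distinct cities, each named by two sources.
correct_links = [
    ("a:Paris_France", "b:Paris_FR"),
    ("a:Paris_Texas", "b:Paris_TX"),
]
# One erroneous statement equating the two cities.
faulty_link = ("b:Paris_FR", "a:Paris_Texas")

clean = same_as_clusters(correct_links)
polluted = same_as_clusters(correct_links + [faulty_link])
```

With only the correct links there are two clusters. Adding the one faulty link collapses them into a single class of four identifiers, so every statement about either city would be deduced for both.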

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Applications such as federated query engines or linked-data browsers may need explicit handling of uncertain identity rather than assuming sameAs is reliable.
  • Similar identity management difficulties could arise in other decentralized data environments outside the Web of Data.
  • Verification or qualification mechanisms for identity statements may be required in addition to the existing binary sameAs relation.
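One way to picture explicit handling of uncertain identity is to attach a score to each link and let the consumer choose a threshold. The sketch below is purely illustrative: owl:sameAs carries no confidence value, and the `IdentityLink` type, its `confidence` field, the `trusted_links` helper, and the scores are all assumptions introduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IdentityLink:
    source: str
    target: str
    confidence: float  # hypothetical score in [0, 1], not part of OWL

def trusted_links(links, threshold):
    """Keep only the links whose confidence meets the consumer's threshold."""
    return [link for link in links if link.confidence >= threshold]

# Hypothetical scored links.
candidates = [
    IdentityLink("a:Paris_France", "b:Paris_FR", 0.98),
    IdentityLink("b:Paris_FR", "a:Paris_Texas", 0.12),
]
accepted = trusted_links(candidates, threshold=0.9)
```

A strict consumer such as a reasoner might use a high threshold, while an exploratory browser could accept lower-scored links and display them as tentative.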

Load-bearing premise

The body of prior work reviewed accurately represents the prevalence and nature of the sameAs problem today.

What would settle it

A large-scale audit of public datasets that finds most sameAs statements to be correct and shows no measurable negative effects on downstream data applications.

Original abstract

In a decentralised knowledge representation system such as the Web of Data, it is common and indeed desirable for different knowledge graphs to overlap. Whenever multiple names are used to denote the same thing, owl:sameAs statements are needed in order to link the data and foster reuse. Whilst the deductive value of such identity statements can be extremely useful in enhancing various knowledge-based systems, incorrect use of identity can have wide-ranging effects in a global knowledge space like the Web of Data. With several works already proven that identity in the Web is broken, this survey investigates the current state of this "sameAs problem". An open discussion highlights the main weaknesses suffered by solutions in the literature, and draws open challenges to be faced in the future.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. This survey investigates the 'sameAs problem' in the Web of Data: the use of owl:sameAs to link overlapping knowledge graphs, the deductive utility of correct identity statements, and the wide-ranging effects of incorrect identity assertions. It reviews prior works demonstrating that identity management on the Web is broken, discusses main weaknesses of existing solutions, and identifies open challenges for future work.

Significance. If the survey faithfully represents the cited literature, it consolidates knowledge on a practically important issue for decentralized Semantic Web systems and can usefully direct attention to open challenges in identity management.

major comments (1)
  1. [Introduction / Survey methodology] The manuscript does not describe the search strategy, inclusion/exclusion criteria, or temporal scope used to select the reviewed works. This information is required to evaluate whether the cited body of literature accurately represents the current state of the sameAs problem (as asserted in the abstract and introduction).
minor comments (1)
  1. [Abstract] The abstract states that 'several works already proven that identity in the Web is broken' but does not cite those works; the introduction should provide explicit references at this point.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for this constructive comment on our survey. We agree that a clear description of the literature selection process is necessary for a survey paper and will incorporate it in the revision.

Point-by-point responses
  1. Referee: [Introduction / Survey methodology] The manuscript does not describe the search strategy, inclusion/exclusion criteria, or temporal scope used to select the reviewed works. This information is required to evaluate whether the cited body of literature accurately represents the current state of the sameAs problem (as asserted in the abstract and introduction).

    Authors: We acknowledge the omission. The original manuscript focused on synthesizing known issues and challenges rather than on the systematic review protocol. In the revised version we will add a new subsection (likely in Section 1 or as a dedicated 'Survey Methodology' section) that explicitly states the search strategy (e.g., Google Scholar, DBLP, Semantic Web venues), the keywords and Boolean queries employed, the temporal scope (papers up to mid-2019), and the inclusion/exclusion criteria applied to ensure the cited works represent the state of the sameAs problem.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity: survey of external literature

Full rationale

This is a literature survey whose central claim (incorrect identity statements can have wide-ranging effects) is supported by citations to prior external demonstrations that 'identity in the Web is broken.' No new equations, fitted parameters, predictions, uniqueness theorems, or ansatzes are introduced by the authors. The load-bearing condition is accurate representation of the cited body of work, which is external to the present paper and therefore does not reduce to self-definition or self-citation chains within this manuscript. No circular steps are present.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Survey paper; no free parameters, axioms, or invented entities are introduced by the authors.

pith-pipeline@v0.9.0 · 5661 in / 842 out tokens · 17389 ms · 2026-05-24T16:22:41.138559+00:00 · methodology

