pith. machine review for the scientific record.

arXiv: 2605.01337 · v1 · submitted 2026-05-02 · 💻 cs.DL


Comparison of OpenAlex and Scopus coverage of German institutions' publications in top-tier journals

Andrey Lovakov, Ivan Sterligov


Pith reviewed 2026-05-10 16:14 UTC · model grok-4.3

classification 💻 cs.DL
keywords OpenAlex, Scopus, affiliation metadata, institutional publications, German research institutions, top-tier journals, bibliometric coverage, publication counts

The pith

OpenAlex undercounts German institutions' publications in top journals compared to Scopus, though relative rankings stay stable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares how many publications from German research institutions appear in OpenAlex versus Scopus when limited to articles in top-tier journals. OpenAlex includes more papers overall at the journal level, but it attributes fewer of them to specific German institutions because of missing or incorrect affiliation data. Even with these absolute differences, the two databases produce highly correlated lists of institutional outputs, so the order of institutions by publication volume stays nearly the same. This matters for anyone who wants a free alternative to paid bibliometric tools: relative comparisons look reliable, while absolute counts do not yet.

Core claim

Institutional publication counts in OpenAlex are consistently lower than those in Scopus for the same set of top-tier journal articles, which the authors attribute to missing or incorrectly assigned affiliations in OpenAlex. At the same time, the correlations between institutional outputs in the two databases are very high, indicating that relative institutional rankings remain stable.

What carries the argument

The side-by-side count of publications per German institution, obtained by matching affiliation metadata from OpenAlex and Scopus on the identical set of top-tier journal articles.
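
The side-by-side comparison can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the institution names and counts are invented, and the relative difference is assumed to be (OpenAlex − Scopus) / Scopus, which the summary above does not confirm. The rank correlation is a plain Spearman coefficient computed with the standard library only.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def relative_difference(openalex_count, scopus_count):
    """Assumed definition: negative values mean OpenAlex counts fewer papers."""
    return (openalex_count - scopus_count) / scopus_count


# Hypothetical counts, invented for illustration only.
counts = {
    "Institution A": {"openalex": 90, "scopus": 100},
    "Institution B": {"openalex": 40, "scopus": 50},
    "Institution C": {"openalex": 18, "scopus": 20},
    "Institution D": {"openalex": 7, "scopus": 10},
}

openalex = [c["openalex"] for c in counts.values()]
scopus = [c["scopus"] for c in counts.values()]
rel_diffs = {
    name: relative_difference(c["openalex"], c["scopus"])
    for name, c in counts.items()
}
rho = spearman(openalex, scopus)
```

In this toy data every relative difference is negative while the two rankings coincide, which mirrors the pattern the paper reports: lower absolute counts alongside a stable ordering.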

If this is right

  • Relative rankings of German institutions by publication output can be produced with OpenAlex without large distortions.
  • Absolute publication counts from OpenAlex cannot yet be used for evaluation or funding decisions that depend on precise numbers.
  • OpenAlex's wider journal coverage does not automatically improve institutional attribution for German organizations.
  • Targeted fixes to affiliation metadata would allow OpenAlex to support both comparative and absolute analyses.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar affiliation gaps may exist for institutions outside Germany, pointing to a broader metadata limitation in OpenAlex.
  • If affiliation matching improves, OpenAlex could become a practical substitute for paid databases in many ranking exercises.
  • Users could test the findings by running the same comparison on a different country or journal set to check stability.

Load-bearing premise

Scopus affiliation records are treated as the correct reference point against which OpenAlex shortfalls are measured.

What would settle it

A manual review of a random sample of the top-tier journal articles to determine, for each one, which database correctly lists the German institutional affiliations.
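
One way to set up such a review is to draw a reproducible random sample of article identifiers and record a verdict for each after manual inspection. The sketch below is illustrative only: the identifiers, the sample size, the seed, and the verdict labels are all hypothetical, and the placeholder verdicts are not findings.

```python
import random
from collections import Counter


def draw_audit_sample(article_ids, sample_size, seed=0):
    """Draw a reproducible simple random sample of article IDs."""
    rng = random.Random(seed)
    pool = sorted(article_ids)  # sort so the result does not depend on input order
    return rng.sample(pool, min(sample_size, len(pool)))


def tally_verdicts(verdicts):
    """Count how often each database matched the manually checked affiliation.

    `verdicts` maps article ID to one of the hypothetical labels
    'openalex', 'scopus', 'both', or 'neither'.
    """
    return Counter(verdicts.values())


# Hypothetical identifiers standing in for the top-tier journal articles.
article_ids = [f"article-{n:03d}" for n in range(1, 51)]
audit_sample = draw_audit_sample(article_ids, sample_size=10, seed=42)

# Placeholder verdicts showing the data shape only; these are not results.
placeholder_verdicts = {
    audit_sample[0]: "scopus",
    audit_sample[1]: "openalex",
    audit_sample[2]: "both",
}
tally = tally_verdicts(placeholder_verdicts)
```

Fixing the seed lets a second reviewer regenerate the same sample and check the verdicts independently.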

Figures

Figures reproduced from arXiv: 2605.01337 by Andrey Lovakov, Ivan Sterligov.

Figure 1: An example of an OpenAlex profile for a large university as seen in web UI.
Figure 2: An example of a Scopus profile for a large university as seen in web UI.
Figure 3: Relative difference in total number of publications for 2074 journals in Scopus
Figure 5: Mean relative difference and correlation coefficients between total number of
Figure 6: Mean relative difference and correlation coefficients between total number of
Figure 7: Mean relative difference and correlation coefficients between total number of
Original abstract

OpenAlex has recently emerged as a leading alternative to proprietary bibliometric sources. However, concerns remain regarding the quality of its metadata, especially the institutional profiles which are crucial for evaluating organizations. This study assesses the quality of affiliation data in OpenAlex using German research institutions. Publications from top-tier journals were analyzed and institutional publication counts in OpenAlex were systematically compared with counts in Scopus. The results show that OpenAlex generally contains more publications at the journal level, reflecting its broader coverage. However, institutional publication counts in OpenAlex are consistently lower, indicating missing or incorrectly assigned affiliations. Nevertheless, the correlations between institutional outputs in both databases are very high, suggesting that relative institutional rankings remain stable. These findings suggest that OpenAlex is suitable for comparative institutional analyses in academic research but requires further improvement in affiliation metadata before it can be used for evaluation contexts that rely on absolute publication counts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The manuscript compares publication counts for German institutions in a set of top-tier journals between OpenAlex and Scopus. It finds that OpenAlex shows broader coverage at the journal level but consistently lower institutional counts, which the authors interpret as evidence of missing or incorrectly assigned affiliations in OpenAlex. High correlations between the two sources are reported, supporting the claim that relative institutional rankings remain stable and that OpenAlex is suitable for comparative analyses but not for absolute counts in evaluation contexts.

Significance. If the direction of the observed discrepancies can be confirmed, the work supplies a timely empirical benchmark for users of OpenAlex in institutional-level bibliometrics, especially for German research organizations. The reported high rank correlations constitute a concrete, falsifiable strength that directly informs practical decisions about when open databases can substitute for proprietary ones. The study also highlights metadata quality issues that remain relevant as open alternatives gain adoption.

major comments (1)
  1. [Methods] Methods section: Scopus affiliation metadata is treated as the reference baseline for attributing lower OpenAlex institutional counts to missing or incorrect affiliations, yet no manual audit or sample validation of discrepant records against original papers or institutional CVs is described. This assumption is load-bearing for the central inference; without it, the discrepancies could stem from Scopus over-attribution, divergent disambiguation rules, or multi-affiliation handling instead.
minor comments (3)
  1. [Abstract] Abstract and Methods: the criteria used to define the set of 'top-tier journals' are not stated, leaving open the possibility of selection effects that could affect the generalizability of the coverage comparison.
  2. [Results] Results: exact procedures for matching institution names and handling multi-affiliation strings across the two databases are not detailed, which would help readers assess the robustness of the reported count differences and correlations.
  3. The manuscript would benefit from reporting the time window of the publication data and any confidence intervals or sensitivity checks on the correlation coefficients.
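
The confidence intervals requested in minor comment 3 could be obtained, for example, with a percentile bootstrap over institutions. The sketch below is one possible approach under stated assumptions, not something the paper reports: the count pairs are invented, and the resample count, level, and seed are arbitrary illustrative choices.

```python
import random


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation, or None if either side has no variation."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return None
    return cov / (var_x * var_y) ** 0.5


def bootstrap_interval(pairs, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the Spearman correlation."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Resample institutions with replacement, keeping each pair together.
        resample = [rng.choice(pairs) for _ in pairs]
        rho = spearman([p[0] for p in resample], [p[1] for p in resample])
        if rho is not None:  # skip degenerate resamples with constant ranks
            estimates.append(rho)
    estimates.sort()
    lower_index = int((1 - level) / 2 * len(estimates))
    upper_index = int((1 + level) / 2 * len(estimates)) - 1
    return estimates[lower_index], estimates[upper_index]


# Hypothetical (OpenAlex, Scopus) counts per institution, for illustration only.
pairs = [(90, 100), (40, 50), (18, 20), (7, 10),
         (55, 52), (30, 60), (12, 11), (75, 80)]
point_estimate = spearman([p[0] for p in pairs], [p[1] for p in pairs])
low, high = bootstrap_interval(pairs)
```

With only a handful of institutions the interval will be wide and sensitive to ties, so it complements rather than replaces a sensitivity check on the journal set or time window.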

Simulated Author's Rebuttal

1 response · 0 unresolved

We are grateful to the referee for their detailed review and valuable suggestion. Below we respond to the major comment and indicate the changes we will implement in the revised manuscript.

point-by-point responses
  1. Referee: [Methods] Methods section: Scopus affiliation metadata is treated as the reference baseline for attributing lower OpenAlex institutional counts to missing or incorrect affiliations in OpenAlex, yet no manual audit or sample validation of discrepant records against original papers or institutional CVs is described. This assumption is load-bearing for the central inference; without it, the discrepancies could stem from Scopus over-attribution, divergent disambiguation rules, or multi-affiliation handling instead.

    Authors: We thank the referee for pointing out this important caveat. We agree that without direct validation, other factors could contribute to the discrepancies. In the revised manuscript, we will modify the Methods section to clarify that Scopus serves as a comparative benchmark, not a verified gold standard, and add a new subsection in the Discussion addressing potential alternative causes such as divergent disambiguation rules or multi-affiliation handling. We will also explicitly state the limitation regarding the lack of manual sample validation. That said, the fact that OpenAlex includes a larger number of publications in the selected journals overall, yet shows lower counts when restricted to German institutions, provides supporting evidence for our primary interpretation of missing or incorrect affiliations. We believe these revisions will address the concern while preserving the core findings on relative rankings.

    revision: partial

Circularity Check

0 steps flagged

No circularity: direct empirical counts and correlations on external databases

full rationale

The paper conducts a straightforward comparison of publication counts and rank correlations between OpenAlex and Scopus for German institutions in selected top-tier journals. All reported results (higher journal-level coverage in OpenAlex, lower institutional counts, high correlations) are observational outputs from direct data extraction and simple statistical summaries. No equations, fitted parameters, predictions, ansatzes, or self-citations appear in the derivation chain. The analysis is fully self-contained against the two external bibliographic sources and does not reduce any claim to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The comparison treats Scopus as the reference standard and attributes all discrepancies to OpenAlex metadata quality.

axioms (1)
  • domain assumption: Scopus affiliation data is sufficiently accurate and complete to serve as ground truth for detecting OpenAlex errors.
    The paper uses Scopus counts to identify undercounting in OpenAlex without independent validation of Scopus itself.

pith-pipeline@v0.9.0 · 5446 in / 1141 out tokens · 54549 ms · 2026-05-10T16:14:34.944160+00:00 · methodology

