pith. machine review for the scientific record.

arxiv: 2604.06229 · v1 · submitted 2026-03-30 · 💻 cs.DL

Discoverability matters: Open access models and the translation of science into patents

Pith reviewed 2026-05-14 01:12 UTC · model grok-4.3

classification 💻 cs.DL
keywords open access · patents · semantic similarity · non-patent references · scientific citations · innovation · discoverability · open science

The pith

Fully open access publications show equal or higher semantic alignment with patented technologies than hybrid or bronze models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines how open access publishing models affect which scientific papers get cited in patents and how closely those papers match the patented technologies. Patents draw citations disproportionately from hybrid and bronze OA publications, which benefit from established visibility. Yet fully OA journals in gold and diamond models deliver equal or stronger semantic similarity, measured from abstracts, with the advantage clearest for citations embedded in patent bodies rather than front matter. The core insight is that translation of science into innovation depends on discoverability within information systems more than on access rights alone.

Core claim

Patent citations disproportionately draw on publications disseminated through highly visible and institutionally established publishing channels, particularly hybrid and bronze OA models, indicating strong selection effects. However, this dominance in citation counts does not translate into stronger cognitive alignment with patented technologies. On the contrary, publications in fully OA journals (gold and diamond OA) exhibit equal or higher semantic proximity, especially when cited in the body of patents.
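
The selection effect described above can be read as an over-representation ratio: the share of an OA type among patent-cited papers divided by its share in the overall literature. The following is a minimal sketch of that idea, not the paper's procedure. The function names and the toy labels are assumptions made up for illustration and do not reproduce any figure from the paper.

```python
from collections import Counter

def oa_shares(oa_labels):
    """Return the share of each OA type in a list of labels."""
    counts = Counter(oa_labels)
    total = sum(counts.values())
    return {oa: n / total for oa, n in counts.items()}

def over_representation(cited_labels, literature_labels):
    """Share among patent-cited papers divided by share in the literature, per OA type."""
    cited = oa_shares(cited_labels)
    baseline = oa_shares(literature_labels)
    return {oa: cited.get(oa, 0.0) / share for oa, share in baseline.items()}

# Toy labels, invented for illustration only (not the paper's data).
literature = (["closed"] * 50 + ["hybrid"] * 10 + ["bronze"] * 10
              + ["gold"] * 25 + ["diamond"] * 5)
cited = (["closed"] * 50 + ["hybrid"] * 20 + ["bronze"] * 15
         + ["gold"] * 14 + ["diamond"] * 1)

ratios = over_representation(cited, literature)
for oa, ratio in sorted(ratios.items()):
    print(f"{oa:8s} {ratio:.2f}")
```

A ratio above 1 means the OA type is cited by patents more often than its share of the literature would suggest; a ratio below 1 means the opposite.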

What carries the argument

Semantic similarity scores between patent abstracts and cited publication abstracts, separated by whether the citation appears in the front section or body text of the patent.
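
A rough sketch of how such scores could be aggregated, assuming each abstract has already been turned into a numeric vector. The review does not say which embedding model the paper uses, so the three-dimensional vectors below are hypothetical stand-ins, and cosine similarity is an assumed (though common) choice of metric.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_similarity_by_group(records):
    """Average similarity per (OA type, citation location) group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for oa_type, location, patent_vec, paper_vec in records:
        key = (oa_type, location)
        sums[key] += cosine(patent_vec, paper_vec)
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Hypothetical patent-paper pairs: (OA type, location, patent vector, paper vector).
pairs = [
    ("gold", "body", [1.0, 0.0, 1.0], [1.0, 0.0, 1.0]),
    ("gold", "front", [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]),
    ("hybrid", "body", [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]),
    ("hybrid", "front", [1.0, 1.0, 0.0], [1.0, 1.0, 0.0]),
]

means = mean_similarity_by_group(pairs)
for (oa_type, location), value in sorted(means.items()):
    print(f"{oa_type:7s} {location:6s} {value:.3f}")
```

Keeping the citation location in the grouping key is what allows front-section and body citations to be compared separately for each OA type.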

If this is right

  • Selection effects in patent citations favor publications from hybrid and bronze OA models due to their visibility.
  • The contribution of fully OA publications to innovation shows up in cognitive alignment rather than in raw citation volume.
  • The translation of science into patents hinges on how publishing models are embedded in discoverability infrastructures.
  • Body citations within patents reveal stronger alignment for fully OA work than front-section citations do.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Indexing and search systems may matter as much as open-access policies for moving knowledge into inventions.
  • Lower overall citation counts for full OA could mask higher per-citation relevance in actual technology development.
  • Extending the analysis to citation networks or field-specific patent classes could test whether the pattern holds beyond abstracts.

Load-bearing premise

Semantic similarity computed from abstracts accurately captures the cognitive alignment between scientific publications and the technologies described in patents.

What would settle it

A follow-up analysis using full-text similarity measures or expert technical review. The claim would be undermined if that analysis found lower alignment for gold and diamond OA papers than for hybrid and bronze OA papers.

Original abstract

Scientific research is a key input into technological innovation, yet not all scientific knowledge is equally mobilized in patents. This paper examines how different scientific publishing models shape both the selection of scientific publications cited in patents and their cognitive alignment with patented technologies. Using large-scale data on non-patent references linking patents to scientific publications, combined with metadata from OpenAlex, we compare the Open Access (OA) structure of patent-cited science to that of the scientific literature. We then assess cognitive alignment using semantic similarity between patent abstracts and the abstracts of cited publications, distinguishing between citations appearing in the front section of patents and those embedded in the body of patent texts. We find that patent citations disproportionately draw on publications disseminated through highly visible and institutionally established publishing channels, particularly hybrid and bronze OA models, indicating strong selection effects. However, this dominance in citation counts does not translate into stronger cognitive alignment with patented technologies. On the contrary, publications in fully OA journals (gold and diamond OA) exhibit equal or higher semantic proximity, especially when cited in the body of patents. These results suggest that the contribution of OA to innovation depends less on access alone than on how different publishing models are embedded in information infrastructures that shape the visibility, discoverability, and use of scientific knowledge.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper examines how open access (OA) publishing models influence both the selection of scientific publications cited in patents and their cognitive alignment with patented technologies. Using large-scale non-patent references from patents linked to OpenAlex metadata, it compares the OA structure of cited science versus the broader literature and measures semantic similarity between patent abstracts and publication abstracts, distinguishing front-page versus body citations. The central claims are that patent citations disproportionately favor hybrid and bronze OA publications (indicating selection effects) but that fully OA (gold and diamond) publications exhibit equal or higher semantic proximity to the citing patents, especially in body citations, implying that discoverability and information infrastructures matter more than access alone for science-to-patent translation.

Significance. If the results hold after addressing methodological gaps, the paper provides valuable large-scale evidence that OA models affect not only citation volume but the substantive linkage between science and innovation. The distinction between front and body citations and the use of external patent and OpenAlex datasets are strengths that allow falsifiable claims about discoverability effects. This has direct implications for innovation policy, research evaluation, and OA mandates, moving beyond simple access arguments to infrastructure and visibility mechanisms.

major comments (3)
  1. [Methods (semantic similarity)] Methods section on semantic similarity: The headline result that gold and diamond OA publications show equal or higher proximity rests on abstract-to-abstract semantic similarity. Abstracts are short, audience-specific summaries (legal claims vs. scientific contributions), so measured proximity may reflect lexical overlap within fields rather than substantive technological alignment. The manuscript does not report full-text validation, inter-annotator checks, or robustness to alternative embeddings.
  2. [Results (citation shares)] Results section on citation shares: The analysis shows patent citations draw disproportionately from hybrid/bronze OA but does not include field-year fixed effects or disciplinary composition controls. Without these, OA-type differences may reflect the distribution of fields rather than discoverability effects, undermining the contrast with the overall literature.
  3. [Data and methods] Data and robustness: The abstract and methods description omit sample sizes, statistical controls, error handling, and robustness checks (e.g., alternative similarity thresholds or citation-type subsamples). These details are load-bearing for the claim that fully OA models exhibit higher body-citation proximity.
minor comments (2)
  1. [Introduction] Notation for OA categories (gold, diamond, hybrid, bronze) should be defined explicitly on first use with reference to the OpenAlex classification scheme.
  2. [Results figures] Figure legends for semantic similarity distributions lack axis labels and sample sizes per OA category.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. These have highlighted important areas for strengthening the methodological transparency and robustness of our analysis. We address each major comment below and indicate the revisions we will implement.

Point-by-point responses
  1. Referee: [Methods (semantic similarity)] Methods section on semantic similarity: The headline result that gold and diamond OA publications show equal or higher proximity rests on abstract-to-abstract semantic similarity. Abstracts are short, audience-specific summaries (legal claims vs. scientific contributions), so measured proximity may reflect lexical overlap within fields rather than substantive technological alignment. The manuscript does not report full-text validation, inter-annotator checks, or robustness to alternative embeddings.

    Authors: We acknowledge that abstracts are concise summaries and that abstract-to-abstract similarity may partly reflect lexical patterns rather than deep technological alignment. At the scale of our dataset, full-text analysis across all patent-publication pairs is not feasible due to computational costs and access restrictions for many documents. Abstract embeddings remain a standard proxy in large-scale scientometric studies of science-technology linkages. In the revision we will (i) add an explicit discussion of this limitation, (ii) report robustness results using alternative embedding models, and (iii) include a manual validation exercise on a random subsample of pairs with inter-annotator agreement metrics.
    revision: partial

  2. Referee: [Results (citation shares)] Results section on citation shares: The analysis shows patent citations draw disproportionately from hybrid/bronze OA but does not include field-year fixed effects or disciplinary composition controls. Without these, OA-type differences may reflect the distribution of fields rather than discoverability effects, undermining the contrast with the overall literature.

    Authors: We agree that field and year composition must be controlled to isolate discoverability effects. In the revised manuscript we will re-estimate the citation-share models with field-year fixed effects and additional disciplinary controls, reporting both the original and the controlled specifications side by side.
    revision: yes

  3. Referee: [Data and methods] Data and robustness: The abstract and methods description omit sample sizes, statistical controls, error handling, and robustness checks (e.g., alternative similarity thresholds or citation-type subsamples). These details are load-bearing for the claim that fully OA models exhibit higher body-citation proximity.

    Authors: We will substantially expand the Data and Methods section to report exact sample sizes for every analysis, list all statistical controls and clustering procedures, describe error handling, and add the requested robustness checks (alternative similarity thresholds and separate front-page versus body-citation subsamples).
    revision: yes
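
The field-composition concern in the second exchange can be illustrated with a simple standardization. This is a sketch under stated assumptions, not the fixed-effects regression the rebuttal promises: it compares the observed number of cited papers of one OA type with the number expected if patents cited each field-year cell at that cell's own OA share. The function names, the `make` helper and the toy records are all hypothetical.

```python
from collections import Counter

def crude_ratio(cited, literature, oa_type):
    """Share of oa_type among cited papers over its share in the literature."""
    cited_share = sum(oa == oa_type for _, _, oa in cited) / len(cited)
    lit_share = sum(oa == oa_type for _, _, oa in literature) / len(literature)
    return cited_share / lit_share

def standardized_ratio(cited, literature, oa_type):
    """Observed over expected count, with expectation taken per (field, year) cell."""
    lit_total, lit_oa = Counter(), Counter()
    for field, year, oa in literature:
        lit_total[(field, year)] += 1
        lit_oa[(field, year)] += oa == oa_type
    observed, expected = 0, 0.0
    for field, year, oa in cited:
        cell = (field, year)
        if lit_total[cell] == 0:
            continue  # no baseline papers in this cell, so it cannot be standardized
        observed += oa == oa_type
        expected += lit_oa[cell] / lit_total[cell]
    return observed / expected if expected > 0 else float("nan")

def make(field, year, n_hybrid, n_gold):
    """Build toy (field, year, OA type) records; hypothetical helper."""
    return [(field, year, "hybrid")] * n_hybrid + [(field, year, "gold")] * n_gold

# Toy data: patents cite the hybrid-heavy field more often than the other field.
literature = make("bio", 2020, 6, 4) + make("cs", 2020, 2, 8)
cited = make("bio", 2020, 6, 4) + make("cs", 2020, 1, 4)

crude = crude_ratio(cited, literature, "hybrid")
adjusted = standardized_ratio(cited, literature, "hybrid")
print(f"crude: {crude:.2f}  field-year adjusted: {adjusted:.2f}")
```

In this toy case the crude ratio exceeds 1 only because patents cite the hybrid-heavy field more often; within each field-year cell hybrid papers are cited exactly in proportion to their share, so the adjusted ratio is about 1. That is the confound the referee asks the authors to rule out.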

Circularity Check

0 steps flagged

No circularity: purely empirical observational analysis

Full rationale

The paper conducts an empirical comparison of OA publishing models against patent citation patterns and semantic similarity scores derived from external datasets (OpenAlex and patent references). No mathematical derivations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the methodology or results. All reported findings (e.g., higher semantic proximity for gold/diamond OA in body citations) are direct outputs of data processing steps that remain independent of the target claims. The analysis is self-contained against external benchmarks and does not reduce any result to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that abstract-based semantic similarity proxies cognitive alignment and that the non-patent reference data is representative of science-to-patent flows.

axioms (1)
  • domain assumption: Semantic similarity between abstracts measures cognitive alignment with patented technologies.
    Invoked to interpret higher proximity for gold/diamond OA as a meaningful contribution rather than an artifact.
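
One way to probe this assumption is the manual validation proposed in the simulated rebuttal, where two annotators judge whether a patent-paper pair is genuinely aligned and their agreement is reported. The sketch below computes Cohen's kappa, a standard chance-corrected agreement statistic. The choice of kappa and the labels are assumptions for illustration; the review does not say which agreement metric the authors would use.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if each annotator labeled at random with their own label rates.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical judgments on eight patent-paper pairs.
annotator_1 = ["aligned", "aligned", "not", "aligned", "not", "not", "aligned", "not"]
annotator_2 = ["aligned", "aligned", "not", "not", "not", "not", "aligned", "aligned"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

High agreement between annotators would not by itself validate the similarity scores; the human labels would still need to be compared with the scores on the same subsample.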

pith-pipeline@v0.9.0 · 5556 in / 1183 out tokens · 45835 ms · 2026-05-14T01:12:02.218077+00:00 · methodology

Reference graph

Works this paper leans on

9 extracted references · 9 canonical work pages

  1. [1]

    Robustness checks. This appendix includes supplementary analyses exploring the sensitivity of the results to alternative specifications and sample restrictions.

    Descriptive Statistics of Data Distribution and Semantic Similarity

    OA Type   Paper Count  Patent-Paper Pairs  Max Similarity  Min Similarity  Median Similarity  Mean Similarity  Variance
    Bronze        116,028             388,650          0.9890          0.7194             0.8874           0.8857    0.0007
    Closed        363,614           1,067,232          0.9940          0.7069             0.8893           0.8875    0.0007
    Diamond         1,245               2,531          0.9536          0.7637             0.8934           0.8905    0.0007
    Gold...
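
To put the mean differences in the descriptive statistics above on a common scale, one option is a standardized mean difference (Cohen's d). The sketch below uses only the bronze, closed and diamond rows that survive in the extract; the Gold row is truncated and is left out. The choice of Cohen's d is an assumption rather than something the paper reports, and the variances are the rounded values shown, so the result is approximate.

```python
import math

# Summary statistics copied from the descriptive statistics above.
stats = {
    "bronze": {"pairs": 388_650, "mean": 0.8857, "var": 0.0007},
    "closed": {"pairs": 1_067_232, "mean": 0.8875, "var": 0.0007},
    "diamond": {"pairs": 2_531, "mean": 0.8905, "var": 0.0007},
}

def standardized_mean_difference(a, b):
    """Cohen's d with a pooled variance weighted by the number of pairs."""
    pooled_var = (
        (a["pairs"] - 1) * a["var"] + (b["pairs"] - 1) * b["var"]
    ) / (a["pairs"] + b["pairs"] - 2)
    return (a["mean"] - b["mean"]) / math.sqrt(pooled_var)

d_bronze = standardized_mean_difference(stats["diamond"], stats["bronze"])
d_closed = standardized_mean_difference(stats["diamond"], stats["closed"])
print(f"diamond vs bronze: d = {d_bronze:.2f}")
print(f"diamond vs closed: d = {d_closed:.2f}")
```

Because one paper can appear in many patent-paper pairs, the pairs are unlikely to be independent, so any significance test built on these counts would need clustering of the kind the referee requests.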
