Recognition: 2 theorem links · Lean Theorem
Discoverability matters: Open access models and the translation of science into patents
Pith reviewed 2026-05-14 01:12 UTC · model grok-4.3
The pith
Fully open access publications show equal or higher semantic alignment with patented technologies than hybrid or bronze models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Patent citations disproportionately draw on publications disseminated through highly visible and institutionally established publishing channels, particularly hybrid and bronze OA models, indicating strong selection effects. However, this dominance in citation counts does not translate into stronger cognitive alignment with patented technologies. On the contrary, publications in fully OA journals (gold and diamond OA) exhibit equal or higher semantic proximity, especially when cited in the body of patents.
What carries the argument
Semantic similarity scores between patent abstracts and cited publication abstracts, separated by whether the citation appears in the front section or body text of the patent.
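As a rough illustration of the measurement described above, the following sketch computes cosine similarity between embedding vectors and averages it by OA type and citation location. It assumes the abstracts have already been turned into vectors by some embedding model; the paper's actual model and similarity function are not stated here, so cosine similarity, the function names, and the toy vectors are all hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_similarity_by_group(pairs):
    """Average similarity per (oa_type, location) group.

    Each pair is a dict with the hypothetical keys
    'oa_type', 'location', 'patent_vec', and 'paper_vec'.
    """
    groups = defaultdict(list)
    for pair in pairs:
        key = (pair["oa_type"], pair["location"])
        groups[key].append(cosine_similarity(pair["patent_vec"], pair["paper_vec"]))
    return {key: mean(values) for key, values in groups.items()}


# Toy vectors standing in for abstract embeddings (not real data).
toy_pairs = [
    {"oa_type": "gold", "location": "body",
     "patent_vec": [1.0, 0.0], "paper_vec": [1.0, 0.0]},
    {"oa_type": "gold", "location": "body",
     "patent_vec": [1.0, 0.0], "paper_vec": [0.0, 1.0]},
    {"oa_type": "hybrid", "location": "front",
     "patent_vec": [1.0, 1.0], "paper_vec": [2.0, 2.0]},
]
result = mean_similarity_by_group(toy_pairs)
```

Grouping on both OA type and citation location mirrors the front-section versus body split that the argument depends on.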
If this is right
- Selection effects in patent citations favor publications from hybrid and bronze OA models due to their visibility.
- Fully OA publications contribute to innovation at least as much through cognitive alignment as through raw citation volume.
- The translation of science into patents hinges on how publishing models are embedded in discoverability infrastructures.
- Body citations within patents reveal stronger alignment for fully OA work than front-section citations do.
Where Pith is reading between the lines
- Indexing and search systems may matter as much as open-access policies for moving knowledge into inventions.
- Lower overall citation counts for full OA could mask higher per-citation relevance in actual technology development.
- Extending the analysis to citation networks or field-specific patent classes could test whether the pattern holds beyond abstracts.
Load-bearing premise
Semantic similarity computed from abstracts accurately captures the cognitive alignment between scientific publications and the technologies described in patents.
What would settle it
A follow-up analysis using full-text similarity measures or expert technical review that finds lower or equal alignment for gold and diamond OA papers compared with hybrid and bronze models.
Original abstract
Scientific research is a key input into technological innovation, yet not all scientific knowledge is equally mobilized in patents. This paper examines how different scientific publishing models shape both the selection of scientific publications cited in patents and their cognitive alignment with patented technologies. Using large-scale data on non-patent references linking patents to scientific publications, combined with metadata from OpenAlex, we compare the Open Access (OA) structure of patent-cited science to that of the scientific literature. We then assess cognitive alignment using semantic similarity between patent abstracts and the abstracts of cited publications, distinguishing between citations appearing in the front section of patents and those embedded in the body of patent texts. We find that patent citations disproportionately draw on publications disseminated through highly visible and institutionally established publishing channels, particularly hybrid and bronze OA models, indicating strong selection effects. However, this dominance in citation counts does not translate into stronger cognitive alignment with patented technologies. On the contrary, publications in fully OA journals (gold and diamond OA) exhibit equal or higher semantic proximity, especially when cited in the body of patents. These results suggest that the contribution of OA to innovation depends less on access alone than on how different publishing models are embedded in information infrastructures that shape the visibility, discoverability, and use of scientific knowledge.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper examines how open access (OA) publishing models influence both the selection of scientific publications cited in patents and their cognitive alignment with patented technologies. Using large-scale non-patent references from patents linked to OpenAlex metadata, it compares the OA structure of cited science versus the broader literature and measures semantic similarity between patent abstracts and publication abstracts, distinguishing front-page versus body citations. The central claims are that patent citations disproportionately favor hybrid and bronze OA publications (indicating selection effects) but that fully OA (gold and diamond) publications exhibit equal or higher semantic proximity to the citing patents, especially in body citations, implying that discoverability and information infrastructures matter more than access alone for science-to-patent translation.
Significance. If the results hold after addressing methodological gaps, the paper provides valuable large-scale evidence that OA models affect not only citation volume but the substantive linkage between science and innovation. The distinction between front and body citations and the use of external patent and OpenAlex datasets are strengths that allow falsifiable claims about discoverability effects. This has direct implications for innovation policy, research evaluation, and OA mandates, moving beyond simple access arguments to infrastructure and visibility mechanisms.
major comments (3)
- [Methods (semantic similarity)] Methods section on semantic similarity: The headline result that gold and diamond OA publications show equal or higher proximity rests on abstract-to-abstract semantic similarity. Abstracts are short, audience-specific summaries (legal claims vs. scientific contributions), so measured proximity risks lexical overlap within fields rather than substantive technological alignment. The manuscript does not report full-text validation, inter-annotator checks, or robustness to alternative embeddings.
- [Results (citation shares)] Results section on citation shares: The analysis shows patent citations draw disproportionately from hybrid/bronze OA but does not include field-year fixed effects or disciplinary composition controls. Without these, OA-type differences may reflect the distribution of fields rather than discoverability effects, undermining the contrast with the overall literature.
- [Data and methods] Data and robustness: The abstract and methods description omit sample sizes, statistical controls, error handling, and robustness checks (e.g., alternative similarity thresholds or citation-type subsamples). These details are load-bearing for the claim that fully OA models exhibit higher body-citation proximity.
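The composition concern raised in the second major comment can be made concrete with a small sketch. It computes a log2 enrichment of each OA type among patent-cited papers relative to a reference literature, first pooled and then within field. The exact enrichment definition used in the paper is not given here, so the ratio-of-shares formula, the function names, and the toy records are assumptions.

```python
import math
from collections import Counter


def shares(oa_types):
    """Share of each OA type in a list of labels."""
    counts = Counter(oa_types)
    total = sum(counts.values())
    return {oa: count / total for oa, count in counts.items()}


def log2_enrichment(cited, literature):
    """log2(share among cited papers / share in the literature) per OA type."""
    cited_share = shares(cited)
    lit_share = shares(literature)
    return {
        oa: math.log2(cited_share[oa] / lit_share[oa])
        for oa in cited_share
        if oa in lit_share
    }


def enrichment_by_field(cited, literature):
    """Same statistic, computed separately inside each field.

    Both arguments are lists of (field, oa_type) tuples.
    """
    result = {}
    for field in {f for f, _ in cited}:
        cited_in_field = [oa for f, oa in cited if f == field]
        lit_in_field = [oa for f, oa in literature if f == field]
        result[field] = log2_enrichment(cited_in_field, lit_in_field)
    return result


# Toy records (not real data): field A is mostly hybrid, field B mostly gold,
# and patents cite only field A in proportion to its own OA mix.
literature = ([("A", "hybrid")] * 4 + [("A", "gold")]
              + [("B", "hybrid")] + [("B", "gold")] * 4)
cited = [("A", "hybrid")] * 4 + [("A", "gold")]

pooled = log2_enrichment([oa for _, oa in cited], [oa for _, oa in literature])
within = enrichment_by_field(cited, literature)
```

In this toy setup the pooled statistic shows hybrid as over-represented and gold as under-represented, while the within-field statistic is zero for both, which is the pattern the referee warns could arise from field composition alone.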
minor comments (2)
- [Introduction] Notation for OA categories (gold, diamond, hybrid, bronze) should be defined explicitly on first use with reference to the OpenAlex classification scheme.
- [Results figures] Figure legends for semantic similarity distributions lack axis labels and sample sizes per OA category.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. These have highlighted important areas for strengthening the methodological transparency and robustness of our analysis. We address each major comment below and indicate the revisions we will implement.
Point-by-point responses
- Referee: [Methods (semantic similarity)] Methods section on semantic similarity: The headline result that gold and diamond OA publications show equal or higher proximity rests on abstract-to-abstract semantic similarity. Abstracts are short, audience-specific summaries (legal claims vs. scientific contributions), so measured proximity risks lexical overlap within fields rather than substantive technological alignment. The manuscript does not report full-text validation, inter-annotator checks, or robustness to alternative embeddings.
  Authors: We acknowledge that abstracts are concise summaries and that abstract-to-abstract similarity may partly reflect lexical patterns rather than deep technological alignment. At the scale of our dataset, full-text analysis across all patent-publication pairs is not feasible due to computational costs and access restrictions for many documents. Abstract embeddings remain a standard proxy in large-scale scientometric studies of science-technology linkages. In the revision we will (i) add an explicit discussion of this limitation, (ii) report robustness results using alternative embedding models, and (iii) include a manual validation exercise on a random subsample of pairs with inter-annotator agreement metrics.
  revision: partial
- Referee: [Results (citation shares)] Results section on citation shares: The analysis shows patent citations draw disproportionately from hybrid/bronze OA but does not include field-year fixed effects or disciplinary composition controls. Without these, OA-type differences may reflect the distribution of fields rather than discoverability effects, undermining the contrast with the overall literature.
  Authors: We agree that field and year composition must be controlled to isolate discoverability effects. In the revised manuscript we will re-estimate the citation-share models with field-year fixed effects and additional disciplinary controls, reporting both the original and the controlled specifications side by side.
  revision: yes
- Referee: [Data and methods] Data and robustness: The abstract and methods description omit sample sizes, statistical controls, error handling, and robustness checks (e.g., alternative similarity thresholds or citation-type subsamples). These details are load-bearing for the claim that fully OA models exhibit higher body-citation proximity.
  Authors: We will substantially expand the Data and Methods section to report exact sample sizes for every analysis, list all statistical controls and clustering procedures, describe error handling, and add the requested robustness checks (alternative similarity thresholds and separate front-page versus body-citation subsamples).
  revision: yes
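The rebuttal promises inter-annotator agreement metrics for the manual validation exercise without naming a specific one. As one possible illustration, the sketch below computes Cohen's kappa for two annotators; the choice of kappa, the label names, and the toy annotations are assumptions, not details taken from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Toy annotations (not real data): is the publication aligned with the patent?
annotator_1 = ["aligned", "aligned", "not", "not"]
annotator_2 = ["aligned", "not", "not", "not"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Here the annotators agree on three of four items, chance agreement is 0.5, and kappa comes out at 0.5, which shows how raw agreement can overstate reliability when labels are unbalanced.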
Circularity Check
No circularity: purely empirical observational analysis
Full rationale
The paper conducts an empirical comparison of OA publishing models against patent citation patterns and semantic similarity scores derived from external datasets (OpenAlex and patent references). No mathematical derivations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the methodology or results. All reported findings (e.g., higher semantic proximity for gold/diamond OA in body citations) are direct outputs of data processing steps that remain independent of the target claims. The analysis is self-contained against external benchmarks and does not reduce any result to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Semantic similarity between abstracts measures cognitive alignment with patented technologies.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction
  Tag: unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: Figure 2... log2 enrichment... hybrid and bronze OA... diamond OA sharply under-represented
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Descriptive Statistics of Data Distribution and Semantic Similarity
  OA Type   Paper Count   Patent-Paper Pairs   Max Similarity   Min Similarity   Median Similarity   Mean Similarity   Variance
  Bronze    116,028       388,650              0.9890           0.7194           0.8874              0.8857            0.0007
  Closed    363,614       1,067,232            0.9940           0.7069           0.8893              0.8875            0.0007
  Diamond   1,245         2,531                0.9536           0.7637           0.8934              0.8905            0.0007
  Gold...