AI Patents in the United States and China: Measurement, Organization, and Knowledge Flows
Pith reviewed 2026-05-10 16:09 UTC · model grok-4.3
The pith
A fine-tuned classifier identifies AI patents more accurately than the existing USPTO approach and shows that AI innovation patterns in the United States and China are converging, while the two countries retain distinct organizational features and remain interdependent in knowledge flows.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We develop a high-precision classifier to measure AI patents by fine-tuning PatentSBERTa on manually labeled USPTO data. This classifier improves on existing methods, reaching 97 percent precision and a 94 percent F1 score, while generalizing to Chinese patents. Applied to US patents from 1976-2023 and Chinese patents from 2010-2023, it documents rapid growth and convergence in patenting intensity and subfield composition, even as China leads in annual counts. Organizational structures differ sharply: US activity is concentrated in large private firms and hubs, whereas Chinese activity is more diffuse and involves universities and state-owned enterprises. AI patents carry a market-value premium for listed firms in both countries.
What carries the argument
The PatentSBERTa-based classifier fine-tuned on the USPTO AI Patent Dataset, which provides high-precision identification of AI patents and extends reliably to Chinese patents through citation and lexical validation.
If this is right
- Both the US and China are experiencing rapid increases in AI patenting with converging technical focuses.
- The actors producing AI patents differ, with greater concentration in private incumbents in the US versus broader institutional participation in China.
- Holding AI patents is associated with higher market values for publicly listed firms in either country.
- Citation networks demonstrate ongoing knowledge exchange across borders rather than decoupling.
Where Pith is reading between the lines
- Applying similar classifiers to patents from other countries could reveal global AI innovation networks.
- The asymmetry in citations suggests that restrictions on technology transfer might not fully isolate Chinese AI development from frontier knowledge.
- Organizational differences may imply varying paths from patents to commercial products between the two systems.
- Subfield convergence could highlight areas where international standards or collaborations are likely to form.
Load-bearing premise
The manually labeled USPTO dataset trains a model whose performance on Chinese patents can be confirmed sufficiently through citation patterns and word usage without direct expert labels for the Chinese set.
What would settle it
Collecting a sample of Chinese patents, having human experts label them as AI-related or not, and measuring how closely the classifier matches those expert judgments.
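The test described above can be sketched in a few lines. This is a minimal illustration, not the paper's procedure: the function names, the toy labels, and the 97-of-100 audit count are all hypothetical, and the Wilson score interval is one common choice for putting an uncertainty band around a precision estimate from a small sample.

```python
from math import sqrt


def precision_recall_f1(predicted, expert):
    """Compare classifier flags with expert labels (parallel lists of bools)."""
    pairs = list(zip(predicted, expert))
    tp = sum(1 for p, e in pairs if p and e)        # flagged AI, expert agrees
    fp = sum(1 for p, e in pairs if p and not e)    # flagged AI, expert disagrees
    fn = sum(1 for p, e in pairs if not p and e)    # missed AI patent
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Hypothetical toy sample: classifier output vs. expert judgment.
predicted = [True, True, True, False, False, True]
expert = [True, True, False, False, True, True]
precision, recall, f1 = precision_recall_f1(predicted, expert)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

# Hypothetical audit: experts confirm 97 of 100 patents flagged as AI.
low, high = wilson_interval(97, 100)
print(f"precision interval: [{low:.3f}, {high:.3f}]")
```

Note that a sample drawn only from patents the classifier flags as AI can estimate precision but not recall; measuring recall would also require expert labels on patents the classifier rejected.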
Original abstract
We develop a high-precision classifier to measure artificial intelligence (AI) patents by fine-tuning PatentSBERTa on manually labeled data from the USPTO's AI Patent Dataset. Our classifier substantially improves the existing USPTO approach, achieving 97.0% precision, 91.3% recall, and a 94.0% F1 score, and it generalizes well to Chinese patents based on citation and lexical validation. Applying it to granted U.S. patents (1976-2023) and Chinese patents (2010-2023), we document rapid growth in AI patenting in both countries and broad convergence in AI patenting intensity and subfield composition, even as China surpasses the United States in recent annual patent counts. The organization of AI innovation nevertheless differs sharply: U.S. AI patenting is concentrated among large private incumbents and established hubs, whereas Chinese AI patenting is more geographically diffuse and institutionally diverse, with larger roles for universities and state-owned enterprises. For listed firms, AI patents command a robust market-value premium in both countries. Cross-border citations show continued technological interdependence rather than decoupling, with Chinese AI inventors relying more heavily on U.S. frontier knowledge than vice versa.
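As a quick consistency check on the headline metrics, the F1 score can be recomputed from the reported precision and recall using the standard harmonic-mean definition. The sketch below uses only the figures quoted in the abstract; the function name is illustrative.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


reported_precision = 0.970  # from the abstract
reported_recall = 0.913     # from the abstract

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 implied by reported precision and recall: {f1:.4f}")
```

The implied value is roughly 0.9406, which is in line with the reported 94.0% once rounding of the published precision and recall is allowed for.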
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper fine-tunes PatentSBERTa on the USPTO AI Patent Dataset to create a classifier for AI patents, reporting 97.0% precision, 91.3% recall, and 94.0% F1 on held-out USPTO data. It applies the classifier to U.S. patents (1976-2023) and Chinese patents (2010-2023), documenting rapid growth and convergence in AI patenting intensity and subfield composition (with China surpassing the U.S. in recent counts), while highlighting sharp differences in organizational structure (U.S. concentration in large private firms and hubs vs. China's more diffuse pattern involving universities and SOEs), a market-value premium for AI patents in both countries, and continued cross-border citation interdependence rather than decoupling.
Significance. If the classifier's performance on Chinese patents can be established with direct validation, the paper would deliver a valuable new dataset and set of stylized facts on the scale, institutional organization, and knowledge flows of AI innovation in the world's two largest patenting economies, with implications for understanding technological competition and convergence.
major comments (2)
- [Methodology / classifier validation] The claim that the classifier 'generalizes well to Chinese patents' (abstract and methodology section) rests solely on post-hoc citation overlap with U.S. AI patents and lexical similarity checks. No labeled Chinese test set, precision/recall metrics, or error analysis on the Chinese corpus is reported. This is load-bearing for all subsequent China-specific results on growth, subfield composition, organizational differences, and citation patterns.
- [Section on classifier development] The training and evaluation details (e.g., exact train/validation/test splits, fine-tuning hyperparameters, and robustness checks against domain shift) are not fully specified for the USPTO data, making it difficult to assess whether the reported 94% F1 is stable or sensitive to labeling choices.
minor comments (2)
- [Data and methods] Clarify the exact definition of 'AI patent' used in the USPTO training labels and how subfield categories are assigned.
- [Results on firm value] The market-value premium regressions would benefit from explicit discussion of potential selection into patenting and firm-level controls.
Simulated Author's Rebuttal
Thank you for the detailed and insightful referee report. We have carefully considered the comments and provide point-by-point responses below. We believe the suggested revisions will improve the clarity and robustness of our findings on AI patenting in the US and China.
Point-by-point responses
- Referee: The claim that the classifier 'generalizes well to Chinese patents' (abstract and methodology section) rests solely on post-hoc citation overlap with U.S. AI patents and lexical similarity checks. No labeled Chinese test set, precision/recall metrics, or error analysis on the Chinese corpus is reported. This is load-bearing for all subsequent China-specific results on growth, subfield composition, organizational differences, and citation patterns.
  Authors: We agree that direct validation on a labeled Chinese test set would be ideal and would strengthen the China-specific results. No equivalent high-quality labeled dataset for Chinese patents exists, and creating one lies beyond the scope of this study. Our generalization claim rests on indirect evidence from citation overlap with known U.S. AI patents and lexical similarity. In the revision we will add a dedicated limitations subsection, include manual error analysis on a random sample of 100 Chinese patents classified as AI, and revise language in the abstract and methods to describe the validation more cautiously as 'supporting evidence of generalization' rather than 'generalizes well.' This will increase transparency around the load-bearing assumptions.
  Revision: partial
- Referee: The training and evaluation details (e.g., exact train/validation/test splits, fine-tuning hyperparameters, and robustness checks against domain shift) are not fully specified for the USPTO data, making it difficult to assess whether the reported 94% F1 is stable or sensitive to labeling choices.
  Authors: We thank the referee for this observation. The revised manuscript will fully specify the classifier development details, including the exact train/validation/test splits (70/15/15), all fine-tuning hyperparameters (learning rate 2e-5, 3 epochs, batch size 16), and additional robustness checks such as k-fold cross-validation and performance sensitivity across labeling thresholds and subfields. These additions will allow readers to evaluate the stability of the 94.0% F1 score.
  Revision: yes
- Direct precision/recall metrics and error analysis on a labeled Chinese patent test set, as no such dataset was available or constructed for this study.
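The evaluation protocol promised in the rebuttal (a 70/15/15 split plus k-fold cross-validation) could look roughly like the sketch below. Only the split fractions and the three hyperparameter values come from the rebuttal; the function names, the seed, the fold count, and the placeholder patent IDs are assumptions made for illustration, and no model training is performed here.

```python
import random

# Values quoted in the rebuttal; everything else in this sketch is illustrative.
CONFIG = {
    "learning_rate": 2e-5,
    "epochs": 3,
    "batch_size": 16,
    "split": (0.70, 0.15, 0.15),
}


def train_val_test_split(items, fractions, seed=0):
    """Shuffle once with a fixed seed, then cut into train/validation/test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = round(n * fractions[0])
    n_val = round(n * fractions[1])
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size


# Hypothetical stand-in for the labeled USPTO records.
patent_ids = [f"patent_{i}" for i in range(200)]
train, val, test = train_val_test_split(patent_ids, CONFIG["split"])
print(len(train), len(val), len(test))

# Hypothetical 5-fold scheme; the rebuttal does not state k.
folds = list(k_fold_indices(len(patent_ids), 5))
print(len(folds), len(folds[0][1]))
```

Fixing the shuffle seed makes the split reproducible, which is the property the referee asks for when requesting "exact" splits.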
Circularity Check
No circularity: classifier trained and validated on independent labels with external proxies
full rationale
The paper's core measurement step fine-tunes PatentSBERTa on an externally labeled USPTO AI Patent Dataset and reports standard held-out precision/recall/F1 metrics. Extension to the Chinese corpus rests on post-hoc citation overlap and lexical similarity checks drawn from the patent databases themselves, not from any equation or definition internal to the paper. No derivation, prediction, or uniqueness claim reduces by construction to a fitted parameter, self-citation, or renamed input; the pipeline remains self-contained against external data sources and does not invoke load-bearing self-references.
Axiom & Free-Parameter Ledger
free parameters (1)
- fine-tuning hyperparameters
axioms (2)
- domain assumption: The USPTO AI Patent Dataset labels are accurate and representative of AI-related inventions.
- ad hoc to paper: Citation and lexical overlap patterns are valid proxies for classifier generalization to Chinese patents.
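To make the second axiom concrete, the sketch below shows one way citation and lexical overlap could be turned into numbers. It is an assumption-laden illustration, not the paper's validation procedure: the source does not specify the exact metrics, so the citation-share comparison, the Jaccard similarity, and all patent IDs and vocabularies here are hypothetical.

```python
def share_citing(patents, reference_set):
    """Fraction of patents that cite at least one patent in reference_set.

    `patents` maps a patent ID to the list of patent IDs it cites.
    """
    if not patents:
        return 0.0
    hits = sum(1 for cited in patents.values() if reference_set & set(cited))
    return hits / len(patents)


def jaccard(a, b):
    """Jaccard similarity between two collections of terms."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical data: known U.S. AI patents and citation lists of Chinese patents.
us_ai_patents = {"US1", "US2"}
flagged_cn = {"CN1": ["US1", "CN9"], "CN2": ["US2"], "CN3": ["CN8"]}
unflagged_cn = {"CN4": ["US7"], "CN5": []}

print(share_citing(flagged_cn, us_ai_patents))    # share among patents flagged as AI
print(share_citing(unflagged_cn, us_ai_patents))  # share among patents not flagged

# Hypothetical vocabularies drawn from flagged Chinese and U.S. AI patents.
cn_terms = {"neural", "network", "training"}
us_terms = {"neural", "network", "image"}
print(jaccard(cn_terms, us_terms))
```

A large gap between the two citation shares, or a high lexical similarity, would be consistent with good generalization but cannot replace expert-labeled precision and recall, which is why the ledger lists this as an assumption.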