pith. machine review for the scientific record.

arxiv: 2604.10529 · v1 · submitted 2026-04-12 · econ.GN · cs.AI · cs.CL · q-fin.EC · q-fin.GN

Recognition: unknown

AI Patents in the United States and China: Measurement, Organization, and Knowledge Flows

Hanming Fang, Hanyin Yan, Wu Zhu, Xian Gu

Pith reviewed 2026-05-10 16:09 UTC · model grok-4.3

classification econ.GN · cs.AI · cs.CL · q-fin.EC · q-fin.GN
keywords AI patents · patent classification · innovation measurement · US-China comparison · knowledge flows · technological convergence · market value · intellectual property

The pith

A fine-tuned classifier identifies AI patents more accurately and shows converging innovation patterns between the United States and China with distinct organizational features and continued knowledge interdependence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors build a machine learning model that classifies patents as AI-related with greater accuracy than the USPTO's current method by training on manually labeled examples. They apply this tool to large collections of US and Chinese patents to measure growth, composition, and organization of AI innovation over time. The results indicate that both nations are increasing AI patenting at similar rates and shifting toward comparable subfields, though China has higher recent volumes. US patents concentrate among large private companies in tech centers while Chinese ones involve more diverse institutions like universities and state firms across wider areas. These patents boost company values in both countries and citation patterns reveal ongoing reliance on each other's technological advances rather than separation.

Core claim

We develop a high-precision classifier to measure AI patents by fine-tuning PatentSBERTa on manually labeled USPTO data. This classifier improves on existing methods with 97 percent precision and 94 percent F1 score while generalizing to Chinese patents. Applied to US patents from 1976-2023 and Chinese patents from 2010-2023, it documents rapid growth and convergence in patenting intensity and subfield composition even as China leads in annual counts. Organizational structures differ sharply with US activity concentrated in large private firms and hubs versus more diffuse Chinese activity involving universities and state-owned enterprises. AI patents carry a market-value premium for listed firms in both countries.

What carries the argument

The PatentSBERTa-based classifier fine-tuned on the USPTO AI Patent Dataset, which provides high-precision identification of AI patents and extends reliably to Chinese patents through citation and lexical validation.

If this is right

  • Both the US and China are experiencing rapid increases in AI patenting with converging technical focuses.
  • The actors producing AI patents differ, with greater concentration in private incumbents in the US versus broader institutional participation in China.
  • Holding AI patents is associated with higher market values for publicly listed firms in either country.
  • Citation networks demonstrate ongoing knowledge exchange across borders rather than decoupling.
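The interdependence point can be made concrete with a small calculation. The Python sketch below computes, for each citing country, the share of its citations that go to the other country, and then the gap between the two shares. The citation pairs are invented toy data and `cross_border_shares` is a hypothetical helper; the paper's own citation measure may be defined differently.

```python
from collections import Counter


def cross_border_shares(citations):
    """Return {citing_country: share of its citations that go abroad}.

    `citations` is an iterable of (citing_country, cited_country) pairs.
    """
    total = Counter()
    abroad = Counter()
    for citing, cited in citations:
        total[citing] += 1
        if citing != cited:
            abroad[citing] += 1
    return {country: abroad[country] / total[country] for country in total}


# Toy data (hypothetical, for illustration only; not taken from the paper).
toy_citations = (
    [("CN", "US")] * 6 + [("CN", "CN")] * 4
    + [("US", "CN")] * 1 + [("US", "US")] * 9
)

shares = cross_border_shares(toy_citations)
# Asymmetry: how much more CN patents cite US patents than the reverse.
asymmetry = shares["CN"] - shares["US"]
```

A positive `asymmetry` on real data would correspond to the pattern the paper reports, with Chinese AI patents citing U.S. patents more heavily than the reverse. Decoupling would show up as both shares falling over time.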

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Applying similar classifiers to patents from other countries could reveal global AI innovation networks.
  • The asymmetry in citations suggests that restrictions on technology transfer might not fully isolate Chinese AI development from frontier knowledge.
  • Organizational differences may imply varying paths from patents to commercial products between the two systems.
  • Subfield convergence could highlight areas where international standards or collaborations are likely to form.

Load-bearing premise

The manually labeled USPTO dataset trains a model whose performance on Chinese patents can be confirmed sufficiently through citation patterns and word usage without direct expert labels for the Chinese set.

What would settle it

Collecting a sample of Chinese patents, having human experts label them as AI-related or not, and measuring how closely the classifier matches those expert judgments.
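A minimal sketch of that check, assuming expert judgments and classifier outputs are both available as 0/1 lists for the same sampled patents. The function name `binary_metrics` and the toy labels are hypothetical. Cohen's kappa is included as one common agreement statistic, not because the paper uses it.

```python
def binary_metrics(expert, predicted):
    """Precision, recall, F1 and Cohen's kappa for two 0/1 label lists."""
    if len(expert) != len(predicted):
        raise ValueError("label lists must have equal length")
    n = len(expert)
    pairs = list(zip(expert, predicted))
    tp = sum(1 for e, p in pairs if e == 1 and p == 1)
    fp = sum(1 for e, p in pairs if e == 0 and p == 1)
    fn = sum(1 for e, p in pairs if e == 1 and p == 0)
    tn = n - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (observed - expected) / (1 - expected) if expected != 1 else 1.0
    return {"precision": precision, "recall": recall, "f1": f1, "kappa": kappa}


# Toy labels (hypothetical): 1 = AI-related, 0 = not AI-related.
expert_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(expert_labels, model_labels)
```

Recall is only estimable if the sample includes patents the classifier rejected, so the sample should be drawn from the full Chinese corpus rather than only from predicted AI patents.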

Figures

Figures reproduced from arXiv: 2604.10529 by Hanming Fang, Hanyin Yan, Wu Zhu, Xian Gu.

Figure 4: Model Performance on Chinese AI Patent Classification
Figure 10: Geographic Distribution of AI Patents. This figure shows the geographic distribution of AI patents in the US and China across four time periods. Panel (a) plots the spatial locations of assignees for US AI patents, and Panel (b) presents the corresponding distribution for Chinese AI patents. Each map displays a kernel-based relative density surface, where larger markers indicate higher local concentrations…
Original abstract

We develop a high-precision classifier to measure artificial intelligence (AI) patents by fine-tuning PatentSBERTa on manually labeled data from the USPTO's AI Patent Dataset. Our classifier substantially improves the existing USPTO approach, achieving 97.0% precision, 91.3% recall, and a 94.0% F1 score, and it generalizes well to Chinese patents based on citation and lexical validation. Applying it to granted U.S. patents (1976-2023) and Chinese patents (2010-2023), we document rapid growth in AI patenting in both countries and broad convergence in AI patenting intensity and subfield composition, even as China surpasses the United States in recent annual patent counts. The organization of AI innovation nevertheless differs sharply: U.S. AI patenting is concentrated among large private incumbents and established hubs, whereas Chinese AI patenting is more geographically diffuse and institutionally diverse, with larger roles for universities and state-owned enterprises. For listed firms, AI patents command a robust market-value premium in both countries. Cross-border citations show continued technological interdependence rather than decoupling, with Chinese AI inventors relying more heavily on U.S. frontier knowledge than vice versa.
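The three headline metrics are linked by the identity F1 = 2PR / (P + R), so they can be checked against one another. The Python sketch below does this for the reported figures, assuming each was rounded to one decimal place in percent. The variable names are illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


reported_precision = 0.970
reported_recall = 0.913
reported_f1 = 0.940

# F1 implied by the rounded precision and recall as printed.
implied_f1 = f1_score(reported_precision, reported_recall)

# Assumption: each figure is rounded to 0.1 percentage points, so the
# unrounded value lies within +/- 0.0005 of the printed one.
half_step = 0.0005
lowest_f1 = f1_score(reported_precision - half_step, reported_recall - half_step)
highest_f1 = f1_score(reported_precision + half_step, reported_recall + half_step)

# The reported F1 is consistent if its own rounding interval overlaps the
# range of F1 values compatible with the rounded precision and recall.
consistent = lowest_f1 - half_step <= reported_f1 <= highest_f1 + half_step
```

The implied value is about 0.9406, slightly above the printed 94.0%, but the gap is within what rounding of the inputs can explain, so the three numbers are mutually consistent.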

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper fine-tunes PatentSBERTa on the USPTO AI Patent Dataset to create a classifier for AI patents, reporting 97.0% precision, 91.3% recall, and 94.0% F1 on held-out USPTO data. It applies the classifier to U.S. patents (1976-2023) and Chinese patents (2010-2023), documenting rapid growth and convergence in AI patenting intensity and subfield composition (with China surpassing the U.S. in recent counts), while highlighting sharp differences in organizational structure (U.S. concentration in large private firms and hubs vs. China's more diffuse pattern involving universities and SOEs), a market-value premium for AI patents in both countries, and continued cross-border citation interdependence rather than decoupling.

Significance. If the classifier's performance on Chinese patents can be established with direct validation, the paper would deliver a valuable new dataset and set of stylized facts on the scale, institutional organization, and knowledge flows of AI innovation in the world's two largest patenting economies, with implications for understanding technological competition and convergence.

major comments (2)
  1. [Methodology / classifier validation] The claim that the classifier 'generalizes well to Chinese patents' (abstract and methodology section) rests solely on post-hoc citation overlap with U.S. AI patents and lexical similarity checks. No labeled Chinese test set, precision/recall metrics, or error analysis on the Chinese corpus is reported. This is load-bearing for all subsequent China-specific results on growth, subfield composition, organizational differences, and citation patterns.
  2. [Section on classifier development] The training and evaluation details (e.g., exact train/validation/test splits, fine-tuning hyperparameters, and robustness checks against domain shift) are not fully specified for the USPTO data, making it difficult to assess whether the reported 94% F1 is stable or sensitive to labeling choices.
minor comments (2)
  1. [Data and methods] Clarify the exact definition of 'AI patent' used in the USPTO training labels and how subfield categories are assigned.
  2. [Results on firm value] The market-value premium regressions would benefit from explicit discussion of potential selection into patenting and firm-level controls.

Simulated Author's Rebuttal

2 responses · 1 unresolved

Thank you for the detailed and insightful referee report. We have carefully considered the comments and provide point-by-point responses below. We believe the suggested revisions will improve the clarity and robustness of our findings on AI patenting in the US and China.

point-by-point responses
  1. Referee: The claim that the classifier 'generalizes well to Chinese patents' (abstract and methodology section) rests solely on post-hoc citation overlap with U.S. AI patents and lexical similarity checks. No labeled Chinese test set, precision/recall metrics, or error analysis on the Chinese corpus is reported. This is load-bearing for all subsequent China-specific results on growth, subfield composition, organizational differences, and citation patterns.

    Authors: We agree that direct validation on a labeled Chinese test set would be ideal and would strengthen the China-specific results. No equivalent high-quality labeled dataset for Chinese patents exists, and creating one lies beyond the scope of this study. Our generalization claim rests on indirect evidence from citation overlap with known U.S. AI patents and lexical similarity. In the revision we will add a dedicated limitations subsection, include manual error analysis on a random sample of 100 Chinese patents classified as AI, and revise language in the abstract and methods to describe the validation more cautiously as 'supporting evidence of generalization' rather than 'generalizes well.' This will increase transparency around the load-bearing assumptions. revision: partial

  2. Referee: The training and evaluation details (e.g., exact train/validation/test splits, fine-tuning hyperparameters, and robustness checks against domain shift) are not fully specified for the USPTO data, making it difficult to assess whether the reported 94% F1 is stable or sensitive to labeling choices.

    Authors: We thank the referee for this observation. The revised manuscript will fully specify the classifier development details, including the exact train/validation/test splits (70/15/15), all fine-tuning hyperparameters (learning rate 2e-5, 3 epochs, batch size 16), and additional robustness checks such as k-fold cross-validation and performance sensitivity across labeling thresholds and subfields. These additions will allow readers to evaluate the stability of the 94.0% F1 score. revision: yes
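To illustrate what the promised 70/15/15 split could look like, here is a minimal Python sketch of a stratified split that preserves the AI / non-AI label shares in each partition. Stratification, the seed, the function name `stratified_split`, and the toy labels are all assumptions; the rebuttal states only the proportions and the three hyperparameters.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split example indices into train/validation/test, preserving label shares."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    train, validation, test = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train.extend(indices[:n_train])
        validation.extend(indices[n_train:n_train + n_val])
        test.extend(indices[n_train + n_val:])  # remainder goes to test
    return train, validation, test


# Hyperparameter values as stated in the simulated rebuttal; the dictionary
# keys are hypothetical names, not the authors' configuration format.
fine_tuning_config = {"learning_rate": 2e-5, "epochs": 3, "batch_size": 16}

# Toy labels (invented): 200 AI (1) and 800 non-AI (0) patents.
toy_labels = [1] * 200 + [0] * 800
train_idx, val_idx, test_idx = stratified_split(toy_labels)
```

Fixing the seed makes the split reproducible, which is what lets readers judge whether the reported F1 depends on one particular partition.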

standing simulated objections not resolved
  • Direct precision/recall metrics and error analysis on a labeled Chinese patent test set, as no such dataset was available or constructed for this study.
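The proposed manual review of 100 Chinese patents classified as AI can only bound precision, and with a sample that small the bound is fairly loose. The Python sketch below computes a Wilson score interval for the confirmed share. The count of 95 confirmed patents is a hypothetical outcome, not a result from the paper.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / n + z * z / (4 * n * n)
    )
    return centre - half_width, centre + half_width


# Hypothetical outcome: reviewers confirm 95 of the 100 sampled patents as AI.
low, high = wilson_interval(95, 100)
```

Under that hypothetical outcome the interval runs from roughly 0.89 to 0.98. Because the sample contains only predicted positives, it says nothing about recall on the Chinese corpus, which is why the objection above remains open.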

Circularity Check

0 steps flagged

No circularity: classifier trained and validated on independent labels with external proxies

full rationale

The paper's core measurement step fine-tunes PatentSBERTa on an externally labeled USPTO AI Patent Dataset and reports standard held-out precision/recall/F1 metrics. Extension to the Chinese corpus rests on post-hoc citation overlap and lexical similarity checks drawn from the patent databases themselves, not from any equation or definition internal to the paper. No derivation, prediction, or uniqueness claim reduces by construction to a fitted parameter, self-citation, or renamed input; the pipeline remains self-contained against external data sources and does not invoke load-bearing self-references.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The measurement pipeline rests on the accuracy of the USPTO-provided labeled training set and the untested assumption that lexical and citation similarity suffice to transfer the classifier to Chinese patents; no new physical entities or ad-hoc constants are introduced.

free parameters (1)
  • fine-tuning hyperparameters
    Learning rate, batch size, and epochs for PatentSBERTa are chosen to maximize F1 on the labeled USPTO data.
axioms (2)
  • domain assumption: The USPTO AI Patent Dataset labels are accurate and representative of AI-related inventions.
    These labels form the sole supervised training signal for the classifier.
  • ad hoc to paper: Citation and lexical overlap patterns are valid proxies for classifier generalization to Chinese patents.
    No direct human-labeled Chinese test set is mentioned; validation relies on these indirect signals.
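For intuition about what a lexical-overlap proxy measures, the Python sketch below compares the vocabulary of U.S. AI patents with that of Chinese patents the classifier accepts and rejects, using Jaccard similarity. This is one possible form of such a check, not the paper's procedure. The abstracts are invented, the helper names are hypothetical, and real Chinese text would first need translation or multilingual tokenization.

```python
def vocabulary(abstracts):
    """Lower-cased set of whitespace-separated tokens across all abstracts."""
    return {token for text in abstracts for token in text.lower().split()}


def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Toy abstracts (invented for illustration only).
us_ai = ["neural network training method", "image recognition using deep learning"]
cn_ai = ["deep neural network for image recognition", "training method for speech model"]
cn_other = ["hydraulic valve assembly", "steel beam welding method"]

sim_ai = jaccard(vocabulary(us_ai), vocabulary(cn_ai))
sim_other = jaccard(vocabulary(us_ai), vocabulary(cn_other))
```

A higher `sim_ai` than `sim_other` shows that accepted patents use AI-like vocabulary, but it cannot reveal AI patents the classifier missed, nor non-AI patents that merely borrow the vocabulary. That is why the ledger treats the proxy as an assumption rather than a validation.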


