arxiv: 2606.23495 · v1 · pith:2QUMHNTPnew · submitted 2026-06-22 · 💻 cs.SE

Ensuring Open Source Integrity: The Intersection of Copy-Based Reuse and License Compliance

Pith reviewed 2026-06-26 07:18 UTC · model grok-4.3

classification 💻 cs.SE
keywords: copy-based reuse, license compliance, open source, software reuse, license noncompliance, dependency analysis, code copying

The pith

Nearly two in five instances of copy-based code reuse across open source projects carry a potential license noncompliance risk.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a network of direct code copies between open source repositories to measure license issues that arise from such reuse. It reports that 39.4 percent of project combinations involving copies are at risk, especially when licenses are unclear. Reuse happens more from projects with permissive licenses like MIT and Apache. Standard dependency tools catch only 2.43 percent of this copying activity. This matters because it shows that much of the legal compliance burden in software development goes undetected by current methods.

Core claim

Using World of Code, an infrastructure that approximates the entirety of open source software, the authors map direct copying between projects and estimate that 39.4% of the resulting project combinations are at potential risk of license noncompliance. Their regression models further indicate that code under permissive licenses is more likely to be copied, that public domain code is less likely to be reused, and that dependency analysis surfaces only 2.43% of the copying.
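
The headline percentage is a share of project combinations flagged as risky. A minimal Python sketch of that arithmetic follows. The rule in `at_risk`, the license categories, and every project name are illustrative assumptions; the paper's actual compliance rules are not described on this page.

```python
from typing import Optional


def at_risk(origin: Optional[str], dest: Optional[str]) -> bool:
    """Toy noncompliance rule (an assumption, not the paper's definition)."""
    # An unclear or absent license on either side leaves the reuse terms unknown.
    if origin is None or dest is None:
        return True
    # Copyleft code landing in a project that is not itself copyleft.
    return origin == "copyleft" and dest != "copyleft"


def risk_share(pairs, license_of) -> float:
    """Fraction of (origin, destination) pairs flagged by at_risk."""
    if not pairs:
        raise ValueError("need at least one project pair")
    flagged = sum(at_risk(license_of.get(o), license_of.get(d)) for o, d in pairs)
    return flagged / len(pairs)


# Hypothetical projects and license categories, for illustration only.
license_of = {
    "libA": "permissive",
    "libB": "copyleft",
    "appC": "permissive",
    "appD": None,  # no detectable license
}
pairs = [
    ("libA", "appC"),  # permissive -> permissive: not flagged
    ("libB", "appC"),  # copyleft -> permissive: flagged
    ("libA", "appD"),  # destination license unknown: flagged
    ("libA", "libB"),  # permissive -> copyleft: not flagged
]

print(risk_share(pairs, license_of))  # 0.5
```

Because unknown licenses are flagged by default in this sketch, the share is an upper bound on actual violations, which matches the page's wording of "potential" risk.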

What carries the argument

The copy-based code reuse network, which maps direct copying across projects.
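
One way to picture such a network is sketched below in Python. It assumes that copying is recognized as identical file contents (blobs) appearing in more than one project and that the project where a blob first appears is its origin. Both assumptions, the `min_blobs` threshold (suggested only by the "1 Reused Blob" and "10 Reused Blobs" figure captions), and all identifiers are hypothetical and not taken from the paper's implementation.

```python
from collections import defaultdict


def build_copy_network(sightings, min_blobs=1):
    """Return {(origin, destination): shared_blob_count}.

    sightings: iterable of (blob_id, project, first_seen) tuples, where
    first_seen is any sortable timestamp of the blob's first commit in
    that project.
    """
    by_blob = defaultdict(list)
    for blob_id, project, first_seen in sightings:
        by_blob[blob_id].append((first_seen, project))

    edge_blobs = defaultdict(set)
    for blob_id, seen in by_blob.items():
        seen.sort()  # earliest sighting first
        origin = seen[0][1]  # assumption: earliest project is the origin
        for _, project in seen[1:]:
            if project != origin:
                edge_blobs[(origin, project)].add(blob_id)

    # Keep only edges backed by at least min_blobs distinct shared blobs.
    return {
        edge: len(blobs)
        for edge, blobs in edge_blobs.items()
        if len(blobs) >= min_blobs
    }


# Hypothetical sightings, for illustration only.
sightings = [
    ("b1", "libA", 1), ("b1", "appC", 5),
    ("b2", "libA", 2), ("b2", "appC", 6),
    ("b3", "libB", 3), ("b3", "appC", 7), ("b3", "appD", 9),
]

print(build_copy_network(sightings))
print(build_copy_network(sightings, min_blobs=2))
```

Raising `min_blobs` trades sensitivity for confidence: a single shared blob may be a trivial or generated file, while many shared blobs are harder to explain by coincidence.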

Load-bearing premise

The method used to detect copied code accurately distinguishes actual copying from coincidental similarities, and the license information extracted from projects is accurate enough to assess noncompliance.

What would settle it

A manual audit of a sample of flagged project pairs that finds either no evidence of copying or correct license compliance in most cases would undermine the risk estimate.
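
How much such an audit can say depends on the sample size. The Python sketch below computes a Wilson score interval for the share of audited pairs confirmed as genuine copies. The audit counts are made-up placeholders; no audit results are reported on this page.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical audit: 90 of 100 sampled pairs confirmed as real copies.
low, high = wilson_interval(90, 100)
print(f"precision is plausibly between {low:.3f} and {high:.3f}")
```

If the lower bound of such an interval stayed high, the risk estimate would be hard to attribute to detector error; a low upper bound would undermine it.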

Figures

Figures reproduced from arXiv: 2606.23495 by Audris Mockus, Bogdan Vasilescu, Mahmoud Jahanshahi.

Figure 1: Simple Model - Odds Ratios and 95% Confidence Intervals.
Figure 2: Full Model - Odds Ratios and 95% Confidence Intervals.
Figure 3: Top 10 License Types - 1 Reused Blob, High Sensitivity.
Figure 5: Top 10 License Types - 10 Reused Blobs, High Sensitivity.
Figure 6: Top 10 License Types - 10 Reused Blobs, Low Sensitivity.

Original abstract

Like other creative work, source code is protected by copyright. The owner can license the work, e.g., to permit copying and other kinds of use, and can even start legal proceedings against license violators. However, source code can be reused in subtle ways, e.g., via copying without explicit package manager dependencies, making it hard to reason about potential license noncompliance. Using the World of Code infrastructure approximating the entirety of open source software, in this paper we create a copy-based code reuse network mapping direct copying across projects, and use it to quantify the extent of potential license noncompliance across the entire open source ecosystem. In addition, we estimate regression models to understand whether code copying is affected by the origin project's license, and, if so, how it varies with other project characteristics. We find that code in repositories with permissive licenses, such as MIT and Apache, shows a higher likelihood of reuse across programming languages. In contrast, copyleft licenses, like the GPL, exhibit mixed effects. Public domain licenses, despite their aim of allowing unrestricted use, are associated with a lower likelihood of copy-based reuse. Widespread potential license noncompliance appears to accompany copy-based reuse, with 39.4% of project combinations at potential noncompliance risk, particularly when licenses are unclear or absent. Our findings reveal that only 2.43% of reuse detected through the copy-based network was discoverable via dependency analysis, highlighting the limitations of existing dependency-tracking tools in capturing copy-based reuse.
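
The 2.43% figure is a coverage ratio: the share of copy edges that dependency analysis would also reveal. A minimal Python sketch of that comparison follows. It assumes a copy edge counts as discoverable when the destination project declares a dependency on the origin project; that matching rule and all names are assumptions for illustration.

```python
def dependency_coverage(copy_edges, declared_deps):
    """Share of copy edges also visible as declared dependencies.

    copy_edges:    iterable of (origin, destination) project pairs
    declared_deps: set of (dependent, dependency) project pairs
    """
    copy_edges = set(copy_edges)
    if not copy_edges:
        raise ValueError("need at least one copy edge")
    # Assumption: a copy is discoverable when the destination declares
    # a dependency on the origin.
    visible = {(o, d) for (o, d) in copy_edges if (d, o) in declared_deps}
    return len(visible) / len(copy_edges)


# Hypothetical edges, for illustration only.
copy_edges = {
    ("libA", "appC"),
    ("libB", "appC"),
    ("libA", "appD"),
    ("libC", "appD"),
}
declared_deps = {("appC", "libA"), ("appD", "libZ")}

print(dependency_coverage(copy_edges, declared_deps))  # 0.25
```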

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper uses the World of Code infrastructure to build a copy-based code reuse network across OSS projects and quantifies potential license noncompliance, reporting that 39.4% of project combinations are at risk (especially with unclear/absent licenses). Regression models examine how origin-project license type affects copy likelihood (higher for MIT/Apache, mixed for GPL, lower for public domain). Only 2.43% of detected reuse is captured by dependency analysis, underscoring limitations of package-manager tools.

Significance. If the copy detector and license metadata prove reliable, the 39.4% figure and the dependency-gap result would provide a large-scale empirical basis for the prevalence of hidden license risks in direct code copying, with direct implications for compliance tooling and OSS governance. The license-type regressions add nuance on reuse incentives. The scale of the World of Code data is a methodological strength for ecosystem claims.

major comments (2)
  1. [Abstract / Methods (copy detection)] Abstract and methods description: the central 39.4% noncompliance-risk statistic rests on the copy detector correctly identifying direct copying events rather than coincidental similarity. No precision/recall figures, manual validation sample size, or error analysis on the detector or license extractor are reported; without these the percentage cannot be interpreted as a robust ecosystem-wide estimate.
  2. [Abstract / Regression analysis] Abstract: the regression claims (permissive licenses increase reuse likelihood; copyleft shows mixed effects) are load-bearing for the secondary contribution, yet no model specification, controls for project characteristics, sample sizes, or robustness checks are described, preventing assessment of whether the reported patterns are driven by the license variable or by confounding factors.
minor comments (1)
  1. [Abstract] The exact definition of 'project combinations' used to compute the 39.4% figure should be stated explicitly (e.g., how pairs are sampled and filtered) to support replication.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We appreciate the referee's feedback on the methodological rigor of our study. We address the major comments below and plan revisions accordingly.

Point-by-point responses
  1. Referee: [Abstract / Methods (copy detection)] Abstract and methods description: the central 39.4% noncompliance-risk statistic rests on the copy detector correctly identifying direct copying events rather than coincidental similarity. No precision/recall figures, manual validation sample size, or error analysis on the detector or license extractor are reported; without these the percentage cannot be interpreted as a robust ecosystem-wide estimate.

    Authors: The copy detection method is based on established techniques in the World of Code infrastructure, which has been used and validated in multiple prior studies for identifying code reuse at scale. While we did not report specific precision and recall for this particular application in the current manuscript, we can provide additional details on the detector's parameters and any internal validation. We will revise the methods section to include a discussion of potential false positives in copy detection and how they might affect the 39.4% estimate, along with any available error analysis.

    Revision: yes

  2. Referee: [Abstract / Regression analysis] Abstract: the regression claims (permissive licenses increase reuse likelihood; copyleft shows mixed effects) are load-bearing for the secondary contribution, yet no model specification, controls for project characteristics, sample sizes, or robustness checks are described, preventing assessment of whether the reported patterns are driven by the license variable or by confounding factors.

    Authors: The full manuscript describes the regression models in the Methods section, including the use of logistic regression with controls for project size, age, and primary programming language. Sample sizes are provided in the results tables. However, to better address potential confounding, we will add explicit robustness checks (e.g., alternative model specifications and subsample analyses) in the revision. This will clarify that the license effects hold after accounting for other project characteristics.

    Revision: yes
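
For readers less familiar with logistic regression, the odds ratios and 95% confidence intervals shown in Figures 1 and 2 are conventionally obtained by exponentiating a fitted coefficient and its Wald interval. The Python sketch below shows that standard conversion. The coefficient and standard error are placeholders, not values from the paper.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a logistic-regression coefficient to (OR, lower, upper).

    beta: coefficient on the log-odds scale
    se:   its standard error
    z:    normal quantile (1.96 for an approximate 95% interval)
    """
    if se < 0:
        raise ValueError("standard error must be non-negative")
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Placeholder inputs: a coefficient of ln(2) with standard error 0.1.
or_, low, high = odds_ratio_ci(math.log(2.0), 0.1)
print(f"OR = {or_:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

An interval that excludes 1 indicates an association at the chosen confidence level; it does not by itself rule out the confounding the referee raises.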

Circularity Check

0 steps flagged

No circularity; statistics derived from external infrastructure data

Full rationale

The paper's central quantitative results, including the 39.4% noncompliance risk and regression findings on license effects, are computed directly from the World of Code copy-detection network and license metadata as external inputs. No derivation step reduces by construction to a fitted parameter, self-definition, or self-citation chain; the reported figures are empirical aggregates and model outputs on observed project combinations rather than tautological renamings or predictions forced by the analysis itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the accuracy of copy detection and license metadata extraction inside World of Code; these are treated as given rather than validated within the reported work.

axioms (1)
  • Domain assumption: World of Code accurately detects direct code copying events across projects and provides reliable license metadata for compliance assessment.
    Invoked when constructing the reuse network and when labeling noncompliance risk.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. File-Level Copying Is an Implicit Dependency in Open Source

    cs.SE 2026-07 unverdicted novelty 6.0

    File-level copying acts as an implicit dependency in open source, removing provenance signals and concentrating security risks in vendored copies and license risks in direct source reuse.

Appendix

List of SPDX license identifiers aggregated by their respective license types:

Permissive: 0BSD, AFL-3.0, Apache-2.0, BSD-2, BSD-2-Clause, BSD-3-Claus...