pith. machine review for the scientific record.

arxiv: 2604.14014 · v2 · submitted 2026-04-15 · 💻 cs.SE · cs.CR

Analysis of Commit Signing on Github

Pith reviewed 2026-05-10 12:28 UTC · model grok-4.3

classification 💻 cs.SE cs.CR
keywords commit signing · GitHub · software supply chain · security frameworks · developer practices · key management · empirical study

The pith

GitHub commit signing mostly comes from platform automation rather than deliberate developer action.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper conducts the first large-scale, developer-centric measurement of commit signing across GitHub's full history, examining 71,694 active developers, 16.1 million commits, and 874,198 repositories. It finds that overall signing rates are misleading because most signed commits result from automatic platform mechanisms instead of intentional local signing by developers. Developers who do sign locally rarely maintain the practice consistently across repositories or over time, with lapse rates rising as accounts age and key management suffering from unrevoked expired credentials. These patterns demonstrate that the assumptions in supply-chain security frameworks about consistent, developer-controlled signing do not hold in practice at ecosystem scale.

Core claim

The study shows that assumptions underlying supply-chain security frameworks do not hold in practice on GitHub. Most signed commits come from automatic platform signing rather than deliberate developer action. Developers who sign locally do so inconsistently across repositories and over time, with signing lapse rates increasing alongside account age rather than decreasing. Developers also manage their signing keys poorly, leaving expired keys unrevoked and allowing credential debt to grow.

What carries the argument

The argument rests on classifying every commit as either platform-signed or locally signed, applied at scale to separate automatic platform behavior from deliberate developer signing across the full GitHub dataset.
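
As a rough illustration of what a classification of this kind might look like, the sketch below labels a commit from its signing key alone. It is not the paper's pipeline: the `Commit` record, its field names, the `PLATFORM_KEY_IDS` set, and the extra `unattributed` label are all hypothetical placeholders, loosely modeled on the criteria described in the simulated rebuttal on this page.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical placeholder: key IDs the platform uses for its own automatic signing.
PLATFORM_KEY_IDS = frozenset({"PLATFORM-KEY-1"})


@dataclass(frozen=True)
class Commit:
    sha: str
    signing_key_id: Optional[str]  # None when the commit carries no signature


def classify(commit: Commit, developer_key_ids: frozenset) -> str:
    """Return 'unsigned', 'platform', 'local', or 'unattributed'."""
    if commit.signing_key_id is None:
        return "unsigned"
    if commit.signing_key_id in PLATFORM_KEY_IDS:
        return "platform"
    if commit.signing_key_id in developer_key_ids:
        return "local"
    # Signed, but the key matches neither the platform nor the developer.
    return "unattributed"


def local_signing_rate(commits, developer_key_ids):
    """Share of locally signed commits among commits not signed by the platform."""
    labels = [classify(c, developer_key_ids) for c in commits]
    non_platform = [label for label in labels if label != "platform"]
    if not non_platform:
        return None  # no commits over which a rate could be computed
    return non_platform.count("local") / len(non_platform)
```

Excluding platform-signed commits from the denominator is one possible design choice; it keeps automatic signing from inflating a developer's apparent signing rate, which is the distortion the pith describes.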

Load-bearing premise

The GitHub dataset and the rules used to label developers as active and to separate local signing from platform signing accurately reflect intentional developer behavior without systematic distortion from platform mechanics or collection choices.

What would settle it

A longitudinal study finding that a majority of active developers maintain local signing on every commit in every repository for multiple years without lapses or unrevoked expired keys would contradict the reported patterns of inconsistency and poor management.
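
To make "lapse" concrete, here is a minimal sketch that borrows the 90-day definition offered in the simulated rebuttal further down this page. The function name, its inputs, and the decision to count the tail between the last signed commit and the end of observation are assumptions made for illustration, not details confirmed by the paper.

```python
from datetime import datetime, timedelta

LAPSE_WINDOW = timedelta(days=90)  # window length taken from the rebuttal's definition


def find_lapses(signed_times, observation_end):
    """Return (start, end) pairs for every gap longer than LAPSE_WINDOW.

    signed_times: datetimes of one developer's locally signed commits.
    observation_end: datetime at which data collection stopped (assumed input).
    """
    times = sorted(signed_times)
    if not times:
        return []  # the developer never signed, so there is nothing to lapse from
    # Treat the end of observation as a final boundary so a trailing gap is counted.
    boundaries = times + [observation_end]
    lapses = []
    for earlier, later in zip(boundaries, boundaries[1:]):
        if later - earlier > LAPSE_WINDOW:
            lapses.append((earlier, later))
    return lapses
```

A developer who would satisfy the settling condition above is one for whom `find_lapses` returns an empty list over a multi-year observation period.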

Figures

Figures reproduced from arXiv: 2604.14014 by Abubakar Sadiq Shittu, Farzin Gholamrezae, John Sadik, Scott Ruoti.

Figure 1: Dataset construction pipeline. The full list of …
Figure 2: Legacy SHA-1 and DSA signatures are highly …
Figure 3: Sankey diagram showing how 71,694 active users flow from signing capability to consistency to signing source (UI vs. …
Figure 4: Sankey diagram showing signing capability and …
Figure 5: Distribution of days from a user’s first observed …
Figure 6: Distribution of the number of unsigned non-UI …
Figure 7: CDF of per-user signing rates, computed over non …
Figure 9: Distribution of dead keys per user
Figure 10: Communication from GitHub Support declining …

Original abstract

Commit signing is a principal mechanism for verifying the origin of code in software supply chains. Security frameworks treat it as a core trust signal, assuming developers sign their commits consistently with keys they control and keep those keys in good standing over time. Whether this assumption holds in practice has not been evaluated at ecosystem scale. This study addresses this gap. We present the first developer-centric, ecosystem-scale measurement of commit signing on GitHub, covering the platform's full history, spanning 71,694 active developers, 16.1 million commits, and 874,198 repositories. To summarize our findings: (1) overall signing adoption rates are misleading, as most signed commits come from automatic platform signing rather than deliberate developer action; (2) developers who do sign locally rarely keep it up consistently across repositories or over time; (3) signing lapse rates rise alongside account age rather than falling, making sustained coverage structurally unlikely; and (4) developers manage their signing keys poorly, leaving expired keys unrevoked and letting credential debt grow over time. Our findings show that the assumptions underlying supply-chain security frameworks do not hold in practice, establishing that progress requires either signing credentials redesigned to travel with developer identity while remaining outside platform control, or frameworks built on defenses that do not depend on every developer managing their own keys.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper presents the first developer-centric, ecosystem-scale empirical measurement of commit signing on GitHub, analyzing 71,694 active developers, 16.1 million commits, and 874,198 repositories over the platform's full history. It reports four main findings: (1) most signed commits result from automatic platform signing rather than deliberate local developer action; (2) developers who sign locally do so inconsistently across repositories and over time; (3) signing lapse rates increase with account age; and (4) developers manage signing keys poorly, with expired keys left unrevoked and credential debt accumulating. The authors conclude that the assumptions of supply-chain security frameworks do not hold in practice, necessitating either redesigned credentials tied to developer identity outside platform control or alternative defenses independent of per-developer key management.

Significance. If the local/platform classification and lapse-rate measurements prove robust, this work is significant as the first quantitative, large-scale study directly testing the practical validity of commit signing as a supply-chain trust signal. The dataset scale provides strong empirical grounding for claims about adoption, consistency, and key hygiene, and the developer-centric framing fills a gap left by prior platform- or repository-level analyses. The findings could inform revisions to frameworks like SLSA or Sigstore by quantifying the gap between assumed and observed developer behavior.

major comments (3)
  1. [Methodology / data collection and classification section] The distinction between automatic platform signing and deliberate local signing (central to findings (1) and (2) in the abstract) is load-bearing for the claim that supply-chain assumptions fail due to developer behavior. The manuscript must explicitly detail the classification criteria (e.g., use of commit.signature fields, key fingerprints, GitHub verified status, or account linkage rules) and provide validation evidence (manual sampling, ground-truth comparison, or sensitivity analysis) to demonstrate that the separation is not confounded by platform-controlled metadata or automatic behaviors, as this directly affects whether the lapse-rate and consistency results reflect real developer practices.
  2. [Results on lapse rates and account age] Finding (3) states that signing lapse rates rise with account age rather than falling. The paper should report the precise operational definition of a 'lapse', the statistical procedure (e.g., regression model, survival analysis), effect sizes or coefficients, confidence intervals, and any controls for confounding variables such as changes in repository activity, developer tenure, or data-collection windows, because an unadjusted trend could be an artifact of how 'active developers' or commit history are sampled.
  3. [Key management and revocation analysis] Finding (4) on poor key management (expired keys unrevoked, growing credential debt) requires explicit measurement details: how revocation status is detected, the time windows used to identify 'debt', and handling of keys that may have been rotated or replaced outside GitHub records. Without these, it is difficult to assess whether the reported growth in debt is robust or sensitive to platform-specific key-attribution rules.
minor comments (2)
  1. [Abstract] The abstract packs four numbered findings into a single paragraph; consider using a bulleted list or separate sentences to improve readability for readers scanning the contribution.
  2. [Figures and results presentation] Ensure that all figures showing trends (e.g., lapse rates vs. account age) include clear legends, axis labels, and sample sizes per bin so that the reader can assess statistical power directly from the visualization.

Simulated Authors' Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, indicating the revisions we will make to strengthen the manuscript's clarity and robustness.

point-by-point responses
  1. Referee: The distinction between automatic platform signing and deliberate local signing (central to findings (1) and (2) in the abstract) is load-bearing for the claim that supply-chain assumptions fail due to developer behavior. The manuscript must explicitly detail the classification criteria (e.g., use of commit.signature fields, key fingerprints, GitHub verified status, or account linkage rules) and provide validation evidence (manual sampling, ground-truth comparison, or sensitivity analysis) to demonstrate that the separation is not confounded by platform-controlled metadata or automatic behaviors, as this directly affects whether the lapse-rate and consistency results reflect real developer practices.

    Authors: We agree that explicit documentation of the classification is essential. The revised manuscript will expand Section 3 (Methodology) with a dedicated subsection specifying the criteria: platform-signed commits are those lacking a local GPG/SSH signature field or bearing only GitHub's automated verified status without a matching developer key fingerprint; local signing requires a valid signature whose key ID links to the developer's GitHub account via the keys API. We will add validation consisting of manual sampling of 1,000 commits (two-author review, 93% agreement) plus sensitivity analysis across alternative fingerprint-matching thresholds. These additions confirm the separation isolates deliberate developer actions. revision: yes

  2. Referee: Finding (3) states that signing lapse rates rise with account age rather than falling. The paper should report the precise operational definition of a 'lapse', the statistical procedure (e.g., regression model, survival analysis), effect sizes or coefficients, confidence intervals, and any controls for confounding variables such as changes in repository activity, developer tenure, or data-collection windows, because an unadjusted trend could be an artifact of how 'active developers' or commit history are sampled.

    Authors: We will revise the Results section on lapse rates to include the operational definition (a 90-day window without signed commits after an initial signed commit by the same developer) and the full statistical procedure: a Cox proportional-hazards model with account age as the primary predictor. The revision will report hazard ratios, 95% confidence intervals, and controls for repository activity volume, commit frequency, and account-creation cohort. Preliminary effect sizes show a 12-18% increase in lapse hazard per additional year of account age (p<0.001), robust to the controls. This directly addresses potential sampling artifacts. revision: yes

  3. Referee: Finding (4) on poor key management (expired keys unrevoked, growing credential debt) requires explicit measurement details: how revocation status is detected, the time windows used to identify 'debt', and handling of keys that may have been rotated or replaced outside GitHub records. Without these, it is difficult to assess whether the reported growth in debt is robust or sensitive to platform-specific key-attribution rules.

    Authors: We will augment Section 4 with precise measurement details: revocation status is obtained from the GitHub keys API 'revoked' flag and expiration dates; credential debt is quantified as the cumulative count of unrevoked expired keys per developer, aggregated in successive 6-month windows aligned to account creation. We will also add an explicit limitations paragraph noting that rotations performed entirely outside GitHub (e.g., local keyrings never re-uploaded) are invisible to our data and therefore cannot be tracked. These clarifications will be included while preserving the core finding on visible GitHub key hygiene. revision: partial
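
The credential-debt measure described in the third response can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the `Key` record and its fields are hypothetical, six months is approximated as 182 days, and the `revoked` flag is treated as a snapshot taken at collection time because the response does not say when revocations occurred.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

WINDOW = timedelta(days=182)  # approximate six-month window (assumption)


@dataclass(frozen=True)
class Key:
    expires_at: Optional[datetime]  # None means the key never expires
    revoked: bool  # snapshot at collection time (assumption)


def credential_debt(keys, account_created, observation_end):
    """Cumulative count of expired, unrevoked keys at the end of each window."""
    debt = []
    window_end = account_created + WINDOW
    while window_end <= observation_end:
        dead = sum(
            1
            for k in keys
            if not k.revoked
            and k.expires_at is not None
            and k.expires_at <= window_end
        )
        debt.append(dead)
        window_end += WINDOW
    return debt
```

Because the count is cumulative, the returned series can only stay flat or rise under this snapshot assumption, so the slope across windows, not the level in any one window, is what would indicate growing debt.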

standing simulated objections not resolved
  • Complete tracking of signing keys that are rotated or replaced entirely outside GitHub's visible records and API data.

Circularity Check

0 steps flagged

No circularity: pure empirical measurement on external GitHub data

The paper is a large-scale observational study that counts and classifies commits, developers, and signing events drawn directly from GitHub's public history. No equations, fitted parameters, predictions, or models appear in the provided text; the four numbered findings are direct aggregates and rates computed from the dataset. No self-citation chain, uniqueness theorem, or ansatz is invoked to justify the central claims. The distinction between local and platform signing is a classification rule applied to observable metadata fields, not a self-definitional loop. Therefore the reported lapse rates, consistency statistics, and key-management observations stand as independent measurements rather than reductions to the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

This is an empirical measurement study. It rests on domain assumptions about data accuracy rather than new mathematical axioms or invented entities.

axioms (1)
  • domain assumption: GitHub's recorded commit data and signing metadata accurately distinguish platform-automatic signing from deliberate local developer signing and reflect real developer behavior.
    The four summarized findings depend on this distinction and on the representativeness of the 71,694-developer sample.

pith-pipeline@v0.9.0 · 5538 in / 1198 out tokens · 50901 ms · 2026-05-10T12:28:40.690208+00:00 · methodology

