pith. machine review for the scientific record.

arxiv: 2603.24501 · v2 · submitted 2026-03-25 · 💻 cs.SE

Recognition: 1 theorem link

· Lean Theorem

Efficiency for Experts, Visibility for Newcomers: A Case Study of Label-Code Alignment in Kubernetes

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 00:25 UTC · model grok-4.3

classification 💻 cs.SE
keywords label-diff congruence · pull requests · code review · Kubernetes · contributor experience · open source collaboration · GitHub labels · review participation

The pith

In Kubernetes, alignment between pull request labels and modified files leads to fewer review participants for core developers but more for newcomers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces label-diff congruence as the match between a pull request's area labels and the files changed in its diff. Analysis of over 18,000 Kubernetes pull requests shows this alignment is common, stable across years, and often corrected during review. It does not affect merge speed, yet it correlates with different discussion patterns depending on contributor experience. Core developers see quieter reviews with high congruence, while one-time contributors receive more engagement. A reader would care because the finding points to a concrete, monitorable property that already shapes collaboration in a major project without requiring new processes.

Core claim

Label-diff congruence reaches 46.6 percent perfect alignment across the sample and is actively maintained through label corrections in 9.2 percent of pull requests. Quantile and negative binomial regressions stratified by experience level show that, among core developers, higher congruence associates with 18 percent fewer review participants, whereas among one-time contributors it associates with 28 percent more participants. The same alignment shows no reliable link to time-to-merge.

What carries the argument

label-diff congruence, the degree of alignment between pull request area labels and the set of files changed in the diff
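The paper does not state an explicit formula for this measure (a point the referee report also raises), but a minimal sketch of one plausible definition is the fraction of a PR's applied area labels that match areas inferred from its changed file paths. The path-to-label mapping below is invented for illustration, not taken from the Kubernetes tooling:

```python
# Hypothetical sketch: the paper gives no explicit formula, so this
# illustrates one plausible reading of label-diff congruence as the
# share of a PR's area labels supported by the files in its diff.

# Toy mapping from path prefixes to Kubernetes-style area labels (assumed).
PATH_TO_AREA = {
    "pkg/kubelet/": "area/kubelet",
    "pkg/apiserver/": "area/apiserver",
    "test/e2e/": "area/test",
}

def areas_from_diff(changed_files):
    """Infer area labels from the files touched by a PR."""
    areas = set()
    for path in changed_files:
        for prefix, area in PATH_TO_AREA.items():
            if path.startswith(prefix):
                areas.add(area)
    return areas

def congruence(labels, changed_files):
    """Fraction of applied area labels that the diff actually supports."""
    labels = set(labels)
    if not labels:
        return None  # no area labels: congruence is undefined
    matched = labels & areas_from_diff(changed_files)
    return len(matched) / len(labels)

print(congruence({"area/kubelet"}, ["pkg/kubelet/kubelet.go"]))  # 1.0 (perfect alignment)
print(congruence({"area/kubelet", "area/test"}, ["pkg/kubelet/kubelet.go"]))  # 0.5
```

Under this reading, the paper's "46.6 percent perfect alignment" would correspond to the fraction of PRs scoring 1.0.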

If this is right

  • Higher label-diff congruence predicts 18 percent fewer participants in reviews among core developers.
  • Higher label-diff congruence predicts 28 percent more participants in reviews among one-time contributors.
  • Label-diff congruence remains stable over a decade and is routinely corrected in roughly one in eleven pull requests.
  • Congruence shows no detectable association with time-to-merge in the stratified models.
  • Projects using similar area-label conventions can treat divergence between labels and diffs as a detectable signal of coordination friction.
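The percentage effects above come from count models reported as incidence-rate ratios (IRRs, as in Figure 3); "18 percent fewer" and "28 percent more" correspond to IRRs of roughly 0.82 and 1.28. The conversion is simple arithmetic:

```python
# Percent change implied by an incidence-rate ratio from a count model.
def pct_change_from_irr(irr):
    return (irr - 1.0) * 100.0

print(round(pct_change_from_irr(0.82), 1))  # -18.0 (core developers)
print(round(pct_change_from_irr(1.28), 1))  # 28.0 (one-time contributors)
```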

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Projects could build diff-based label suggestion tools to raise baseline congruence without extra manual effort.
  • The experience-stratified pattern may appear in other large GitHub repositories that use area labels, offering a low-cost diagnostic for collaboration health.
  • Deliberately improving congruence during review might increase newcomer retention by raising their visibility in discussion threads.
  • Automated dashboards that flag low-congruence pull requests could help maintainers intervene early on potential coordination mismatches.

Load-bearing premise

The measured associations between congruence levels and review characteristics such as participant count reflect behavioral effects of alignment rather than being produced by unmeasured differences in pull request size, complexity, or topic.

What would settle it

Re-running the quantile and negative binomial models after adding full controls for pull request size, file count, and topic category, and checking whether the coefficients for congruence on participant count remain statistically significant and in the same direction.

Figures

Figures reproduced from arXiv: 2603.24501 by Giuseppe Destefanis, Matteo Vaccargiu, Roberto Tonelli, Ronnie de Souza Santos, Sabrina Aufiero, Silvia Bartolucci.

Figure 1. Overview of the methodology. Data from the Kubernetes repository is processed to construct label–diff congruence. [PITH_FULL_IMAGE:figures/full_fig_p004_1.png]
Figure 2. Quarterly median congruence with Theil–Sen ro… [PITH_FULL_IMAGE:figures/full_fig_p006_2.png]
Figure 3. Congruence effects by contributor tier (IRRs with 95% CIs). Left: comments. Right: participants. Gray dashed line… [PITH_FULL_IMAGE:figures/full_fig_p009_3.png]
read the original abstract

Labels on platforms such as GitHub support triage and coordination, yet little is known about how well they align with code modifications or how such alignment affects collaboration across contributor experience levels. We present a case study of the Kubernetes project, introducing label-diff congruence (the alignment between pull request labels and modified files) and examining its prevalence, stability, behavioral validation, and relationship to collaboration outcomes across contributor tiers. We analyse 18,020 pull requests (2014–2025) with area labels and complete file diffs, validate alignment through analysis of over one million review comments and label corrections, and test associations with time-to-merge and discussion characteristics using quantile regression and negative binomial models stratified by contributor experience. Congruence is prevalent (46.6% perfect alignment), stable over years, and routinely maintained (9.2% of PRs corrected during review). It does not predict merge speed but shapes discussion: among core developers (81% of the sample), higher congruence predicts quieter reviews (18% fewer participants), whereas among one-time contributors it predicts more engagement (28% more participants). Label-diff congruence influences how collaboration unfolds during review, supporting efficiency for experienced developers and visibility for newcomers. For projects with similar labeling conventions, monitoring alignment can help detect coordination friction and provide guidance when labels and code diverge.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces label-diff congruence (alignment between PR labels and modified files) in the Kubernetes project. It analyzes 18,020 PRs (2014-2025) with area labels and complete diffs, validates the measure against over one million review comments and label corrections, and reports that congruence is prevalent (46.6% perfect alignment), stable over time, and maintained during review (9.2% of PRs corrected). Using quantile regression and negative binomial models stratified by contributor experience, the study finds no association with time-to-merge but differential associations with collaboration: higher congruence predicts 18% fewer participants among core developers (81% of sample) and 28% more participants among one-time contributors. The central claim is that label-diff congruence influences review dynamics, supporting efficiency for experienced developers and visibility for newcomers.

Significance. The work provides a large-scale, empirically grounded case study of labeling practices in a major open-source project, with direct behavioral validation from review data. The stratification by experience tier and use of appropriate count and quantile models are strengths. If the associations prove robust to controls for PR characteristics, the findings could inform coordination practices in similar projects and contribute to the literature on tooling and collaboration in software engineering.

major comments (1)
  1. [Modeling and Results] The regression specifications (quantile regression for participant count and negative binomial for discussion volume, described in the modeling section) do not include controls for PR size or complexity such as number of modified files, total diff size, or topic area. This is load-bearing for the claim that congruence 'influences how collaboration unfolds' with opposing effects by experience tier, because the reported 18% and 28% differences in participants could arise from selection (simpler PRs among core developers, more complex among newcomers) rather than behavioral effects of label alignment.
minor comments (2)
  1. [Abstract] The abstract states that models are 'stratified by contributor experience' but provides no details on included covariates, robustness checks, or precise effect sizes (e.g., coefficient magnitudes or confidence intervals) beyond the percentage differences.
  2. [Introduction and Methods] The definition of 'label-diff congruence' is introduced without an explicit formula or pseudocode; a short formal definition (e.g., as a ratio or matching score between label set and file paths) would improve reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our modeling approach. We agree that additional controls for PR size and complexity are needed to strengthen causal interpretation and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: The regression specifications (quantile regression for participant count and negative binomial for discussion volume, described in the modeling section) do not include controls for PR size or complexity such as number of modified files, total diff size, or topic area. This is load-bearing for the claim that congruence 'influences how collaboration unfolds' with opposing effects by experience tier, because the reported 18% and 28% differences in participants could arise from selection (simpler PRs among core developers, more complex among newcomers) rather than behavioral effects of label alignment.

    Authors: We agree this is a valid concern and that the current specifications leave room for selection effects. In the revised manuscript we will add the following controls to both the quantile regression (participant count) and negative binomial (discussion volume) models, estimated separately by experience tier: (1) number of modified files, (2) total diff size in lines changed, and (3) topic area (entered as a categorical variable based on the primary area label). We will re-run the models, report the updated incidence-rate ratios and quantile coefficients for label-diff congruence, and discuss whether the 18% and 28% differentials remain robust. Any attenuation or change in significance will be interpreted explicitly in the results and discussion sections. revision: yes

Circularity Check

0 steps flagged

No circularity: purely observational empirical analysis with no derivations or self-referential reductions

full rationale

The paper performs a case study on 18,020 Kubernetes PRs, measuring label-diff congruence prevalence, validating it via review comments and corrections, and testing associations with collaboration metrics via quantile regression and negative binomial models stratified by contributor experience. No equations, fitted parameters renamed as predictions, self-citation chains, or ansatzes are present. All claims derive from external GitHub data and standard statistical tests; the central claim (congruence influences collaboration differently by experience) is an observed association, not a quantity reduced to its own inputs by construction. This matches the default expectation for non-circular empirical work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the newly introduced label-diff congruence metric and standard statistical assumptions for regression models applied to observational GitHub data. No free parameters are explicitly fitted beyond model coefficients; the main invented entity is the congruence measure itself.

axioms (1)
  • domain assumption: Regression models assume no unmeasured confounding variables affect both label-code alignment and review characteristics.
    Invoked implicitly when interpreting associations from quantile regression and negative binomial models as behavioral effects.
invented entities (1)
  • label-diff congruence (no independent evidence)
    purpose: Quantifies alignment between pull request area labels and the set of modified files in the code diff.
    Newly defined construct central to the analysis; no independent evidence outside this study.

pith-pipeline@v0.9.0 · 5553 in / 1262 out tokens · 43148 ms · 2026-05-15T00:25:06.125550+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

