pith. machine review for the scientific record.

arxiv: 2601.23142 · v2 · submitted 2026-01-30 · 💻 cs.SE

Recognition: 2 theorem links

· Lean Theorem

Do Good, Stay Longer? Temporal Patterns and Predictors of Newcomer-to-Core Transitions in Conventional OSS and OSS4SG

Pith reviewed 2026-05-16 09:16 UTC · model grok-4.3

classification 💻 cs.SE
keywords open source software · contributor retention · core contributors · social good projects · temporal patterns · newcomer transitions · OSS sustainability · onboarding

The pith

Open source projects with a social-good mission retain contributors at more than twice the rate of conventional projects, and their contributors are nearly 20 percent more likely to reach core status.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares how newcomers move from first contributions to core roles in open source projects that focus on societal impact versus standard projects. It finds that social-good projects hold onto new contributors at 2.2 times the rate of conventional ones and give them a 19.6 percent higher chance of reaching core status. Broad early exploration of the project stands out as a strong predictor of success. Contributors who spend time learning the code before making heavy contributions reach core status two to three times faster than those who start with intensive work right away. Social-good projects support several workable paths to core roles while conventional projects mostly follow one pattern.

Core claim

OSS4SG projects retain contributors at 2.2X higher rates, and their contributors have a 19.6% higher probability of achieving core status. Early broad project exploration predicts core achievement with 22.2% importance. Contributors who follow a Late Spike temporal pattern achieve core status 2.4-2.9X faster than those following an Early Spike pattern. OSS4SG supports two effective temporal patterns for core transitions while conventional OSS concentrates on one dominant pathway.

What carries the argument

Temporal contribution patterns (Early Spike versus Late Spike) analyzed across project mission types (OSS4SG versus conventional OSS) as predictors of newcomer retention and core status achievement.
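
Figure 4 of the paper shows temporal patterns of contribution intensity, and the paper's reference list includes dynamic time warping (DTW) and DTW-based averaging for clustering, which hints at how such patterns might be derived. The sketch below is an assumption-laden illustration of that idea, not the authors' pipeline: it peak-normalizes a contributor's weekly commit counts and assigns whichever of two hand-written templates is nearest under DTW. The template values, the 12-week window, and the names `normalize`, `dtw_distance`, and `classify_pattern` are all hypothetical.

```python
from typing import Dict, List, Sequence


def normalize(series: Sequence[float]) -> List[float]:
    """Scale a weekly commit series by its peak so shape, not volume, drives matching."""
    peak = max(series)
    if peak == 0:
        return [0.0 for _ in series]
    return [x / peak for x in series]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(n*m) dynamic time warping with an absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# HYPOTHETICAL 12-week templates, already peak-normalized. A real analysis would
# derive them by clustering contributor series rather than writing them by hand.
TEMPLATES: Dict[str, List[float]] = {
    "Early Spike": [1.0, 0.9, 0.6, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1],
    "Late Spike": [0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.6, 0.8, 1.0, 0.9, 0.8],
}


def classify_pattern(weekly_commits: Sequence[float]) -> str:
    """Label a contributor with the template nearest to their series under DTW."""
    series = normalize(weekly_commits)
    return min(TEMPLATES, key=lambda name: dtw_distance(series, TEMPLATES[name]))


if __name__ == "__main__":
    ramping_newcomer = [0, 1, 0, 2, 1, 2, 3, 5, 9, 12, 10, 8]
    front_loaded_newcomer = [12, 10, 8, 5, 3, 2, 1, 1, 0, 0, 1, 0]
    print(classify_pattern(ramping_newcomer))
    print(classify_pattern(front_loaded_newcomer))
```

Peak-normalizing makes the comparison about when effort is concentrated rather than how much. Figure 4 reports three patterns, so a fuller version would add a third template and would choose the number of clusters with a criterion such as silhouette scores instead of fixing it in advance.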

If this is right

  • Newcomers increase their odds of long-term involvement by choosing projects that align with their personal values.
  • Spending time learning the project before intensifying contributions leads to core status 2.4-2.9 times faster than immediate heavy involvement.
  • Broad early exploration of the project codebase is a stronger predictor of core achievement than any single contribution pattern.
  • OSS4SG projects provide multiple viable routes to core status while conventional OSS relies on one primary route.
  • Maintainers in conventional projects could improve retention by encouraging initial learning periods before demanding intensive contributions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Mission alignment may function as an intrinsic motivator that sustains effort beyond what technical factors alone can achieve.
  • The same temporal patterns of learning before committing could appear in other volunteer-driven online communities that require skill acquisition.
  • Onboarding designs that deliberately slow initial output in favor of exploration might shorten time-to-core across many collaborative platforms.
  • Demographic or cultural differences between contributor pools in the two project types could interact with mission focus and deserve direct measurement.

Load-bearing premise

Observed differences in retention and core transitions arise mainly from the project's social-good mission rather than from differences in project size, popularity, or who chooses to join each type of project.

What would settle it

A study that matches OSS4SG and conventional OSS projects on size, age, popularity, and contributor demographics and then finds no remaining difference in retention or core transition rates.
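
The matched design described above can be sketched as greedy 1:1 nearest-neighbour matching on standardized covariates, a simpler relative of the propensity-score matching raised in the referee report. This is a minimal illustration under stated assumptions: the `Project` record, its field names, the log scaling of size and stars, and the toy numbers are all hypothetical, and contributor demographics are left out because they are not recoverable from commit data alone.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Tuple


@dataclass
class Project:
    """HYPOTHETICAL project record; field names are illustrative only."""
    name: str
    oss4sg: bool           # True for a social-good project
    contributors: int      # size
    age_years: float       # age
    stars: int             # popularity
    retention_rate: float  # share of newcomers retained (outcome of interest)


def _raw_features(p: Project) -> Tuple[float, float, float]:
    # Log-scale the heavy-tailed size and popularity counts before standardizing.
    return (math.log1p(p.contributors), p.age_years, math.log1p(p.stars))


def match_pairs(projects: List[Project]) -> List[Tuple[Project, Project]]:
    """Greedy 1:1 nearest-neighbour matching, without replacement, of each OSS4SG
    project to the closest conventional project on standardized covariates."""
    raw = [_raw_features(p) for p in projects]
    columns = list(zip(*raw))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) or 1.0 for col in columns]  # guard against zero spread
    z = [tuple((x - m) / s for x, m, s in zip(row, means, sds)) for row in raw]

    treated = [i for i, p in enumerate(projects) if p.oss4sg]
    pool = [i for i, p in enumerate(projects) if not p.oss4sg]
    pairs = []
    for t in treated:
        if not pool:
            break  # more OSS4SG projects than available controls
        k = min(range(len(pool)), key=lambda j: math.dist(z[t], z[pool[j]]))
        c = pool.pop(k)
        pairs.append((projects[t], projects[c]))
    return pairs


def matched_retention_ratio(pairs: List[Tuple[Project, Project]]) -> float:
    """Mean OSS4SG retention divided by mean retention of the matched controls."""
    return mean(t.retention_rate for t, _ in pairs) / mean(c.retention_rate for _, c in pairs)


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not data from the paper.
    demo = [
        Project("sg-a", True, 120, 5, 800, 0.30),
        Project("sg-b", True, 40, 3, 150, 0.22),
        Project("conv-a", False, 110, 6, 900, 0.15),
        Project("conv-b", False, 45, 3, 120, 0.12),
        Project("conv-c", False, 5000, 12, 40000, 0.08),
    ]
    pairs = match_pairs(demo)
    for sg, conv in pairs:
        print(f"{sg.name} <-> {conv.name}")
    print(f"matched retention ratio: {matched_retention_ratio(pairs):.2f}")
```

In the toy data, the very large conventional project is left unmatched, so it cannot inflate the comparison. Greedy matching is order-dependent; optimal or caliper-based matching would be the more careful choice for a real study.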

Figures

Figures reproduced from arXiv: 2601.23142 by Amr Mohamed, Mariam Guizani, Mohamed Ouf.

Figure 1: Distribution comparison of structural metrics and …
Figure 3: Comparison of most common pathways between …
Figure 4: Three temporal patterns of contribution intensity
Figure 5: Temporal pattern effectiveness ranked by time-to…
Original abstract

Open Source Software (OSS) sustainability relies on newcomers transitioning to core contributors, but this pipeline is broken, with most newcomers becoming inactive after initial contributions. Open Source Software for Social Good (OSS4SG) projects, which prioritize societal impact as their primary mission, may be associated with different newcomer-to-core transition outcomes than conventional OSS projects. We compared 375 projects (190 OSS4SG, 185 OSS), analyzing 92,721 contributors and 3.5 million commits. OSS4SG projects retain contributors at 2.2X higher rates and contributors have 19.6% higher probability of achieving core status. Early broad project exploration predicts core achievement (22.2% importance); conventional OSS concentrates on one dominant pathway (61.62% of transitions) while OSS4SG provides multiple pathways. Contrary to intuition, contributors who invest time learning the project before intensifying their contributions (Late Spike pattern) achieve core status 2.4-2.9X faster (21 weeks) than those who contribute intensively from day one (Early Spike pattern, 51-60 weeks). OSS4SG supports two effective temporal patterns while only Late Spike achieves fastest time-to-core in conventional OSS. Our findings suggest that finding a project aligned with personal values and taking time to understand the codebase before major contributions are key strategies for achieving core status. Our findings show that project mission is associated with measurably different environments for newcomer-to-core transitions and provide evidence-based guidance for newcomers and maintainers.
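
The 2.4-2.9X speed-up follows directly from the week counts in the abstract. A quick arithmetic check using only the figures as quoted (the abstract does not say whether they are medians or means, so this verifies the arithmetic, not the estimator):

```python
# Time-to-core figures as quoted in the abstract, in weeks.
LATE_SPIKE_WEEKS = 21
EARLY_SPIKE_WEEKS_RANGE = (51, 60)


def speedup(slower_weeks: float, faster_weeks: float) -> float:
    """How many times faster the faster group reaches core status."""
    return slower_weeks / faster_weeks


if __name__ == "__main__":
    for weeks in EARLY_SPIKE_WEEKS_RANGE:
        ratio = speedup(weeks, LATE_SPIKE_WEEKS)
        print(f"{weeks} / {LATE_SPIKE_WEEKS} weeks = {ratio:.2f}x")
```

Running it prints 2.43x and 2.86x, which round to the quoted 2.4 and 2.9.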

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript analyzes newcomer-to-core transitions in 190 OSS4SG and 185 conventional OSS projects using data from 92,721 contributors and 3.5 million commits. It reports that OSS4SG projects retain contributors at 2.2 times higher rates and offer a 19.6% higher probability of achieving core status. Reported predictors include early broad project exploration (22.2% importance) and temporal patterns, with 'Late Spike' contributors reaching core status 2.4-2.9 times faster than 'Early Spike' ones. The paper concludes that project mission (social good vs conventional) is associated with different transition environments and provides guidance for contributors.

Significance. If the reported differences hold after accounting for potential confounders, this work would be significant for understanding OSS sustainability, particularly highlighting how mission-driven projects may foster better retention and core transitions. The large-scale empirical analysis and identification of specific temporal patterns provide actionable insights for both newcomers and project maintainers in the software engineering community.

major comments (3)
  1. [Abstract and Results] Abstract and central results: the 2.2X retention multiplier and 19.6% higher core-transition probability are presented as associated with project mission without any reported regression controls, propensity-score matching, or stratification for project size (contributor count, commit volume), popularity (stars/forks), age, or domain. Because OSS4SG projects were likely selected on observables that correlate with these variables, the attribution to mission remains vulnerable to confounding.
  2. [Methods] Methods and feature-importance section: the 22.2% importance score for early broad exploration and the classification of temporal patterns (Late Spike vs Early Spike) lack reported details on the underlying model (e.g., random forest or XGBoost), cross-validation procedure, handling of missing contributor data, and sensitivity to the core-status threshold. These omissions make it impossible to evaluate whether the reported speed advantage (21 weeks vs 51-60 weeks) is robust.
  3. [Results] Temporal-patterns analysis: the claim that OSS4SG supports two effective pathways while conventional OSS supports only Late Spike is load-bearing for the multi-pathway conclusion, yet no project-level covariates are included in the survival or transition models. If Late-Spike contributors are disproportionately drawn from larger OSS4SG projects, the 2.4-2.9X speed differential cannot be credited to mission.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'multiple pathways' is used without enumerating them; a short list or reference to the relevant figure would improve readability.
  2. [Results] Ensure every reported multiplier and percentage (2.2X, 19.6%, 22.2%, 2.4-2.9X) is accompanied by a confidence interval or standard error in the main text and tables.
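
Major comment 3 concerns the survival (time-to-core) models behind the Late Spike versus Early Spike comparison. The paper cites Kaplan and Meier (1958) and Cox (1972); as a minimal sketch of the unadjusted starting point, the code below estimates a Kaplan-Meier curve for time-to-core and reads off a median. Treating contributors who never reach core within their observed window as right-censored is an assumption of this sketch, and the toy cohort is invented. The project-level covariates the referee asks for would need a Cox proportional-hazards model, which this sketch does not implement.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(durations: Sequence[float], reached_core: Sequence[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of S(t), the share of contributors who have NOT yet
    reached core status by week t. A False flag marks a right-censored record
    (observation ended before any core transition was seen)."""
    if len(durations) != len(reached_core):
        raise ValueError("durations and reached_core must have the same length")
    events = Counter(t for t, hit in zip(durations, reached_core) if hit)
    exits = Counter(durations)  # everyone leaving the risk set at each time
    at_risk = len(durations)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # events and censorings at t both leave afterwards
    return curve


def median_time_to_core(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First week at which the estimated S(t) drops to 0.5 or below, if ever."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


if __name__ == "__main__":
    # Toy cohort: weeks observed, and whether core status was reached by then.
    weeks = [10, 21, 21, 30, 40, 52, 60, 60]
    reached = [True, True, False, True, False, True, True, False]
    curve = kaplan_meier(weeks, reached)
    for t, s in curve:
        print(f"week {t}: S = {s:.3f}")
    print("median time-to-core:", median_time_to_core(curve))
```

Censored contributors still count in the risk set up to their last observed week, which is what separates this estimate from a naive average over only those who reached core.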

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We are grateful for the referee's constructive comments, which have prompted us to strengthen the causal claims and methodological transparency in our work. We have made substantial revisions to address all major concerns, including adding controls for confounding, detailed methods descriptions, and robustness checks for the temporal patterns. We believe these changes significantly improve the paper and hope it now meets the standards for publication.

Point-by-point responses
  1. Referee: [Abstract and Results] Abstract and central results: the 2.2X retention multiplier and 19.6% higher core-transition probability are presented as associated with project mission without any reported regression controls, propensity-score matching, or stratification for project size (contributor count, commit volume), popularity (stars/forks), age, or domain. Because OSS4SG projects were likely selected on observables that correlate with these variables, the attribution to mission remains vulnerable to confounding.

    Authors: We thank the referee for highlighting this important point. The original analysis was primarily descriptive. To address potential confounding, we have now incorporated multivariate regression models controlling for project size, popularity, age, and domain. We also performed propensity score matching on these observables. After these adjustments, the key differences remain statistically significant, supporting the association with project mission. We have updated the abstract and results sections to include these controls and matching results. revision: yes

  2. Referee: [Methods] Methods and feature-importance section: the 22.2% importance score for early broad exploration and the classification of temporal patterns (Late Spike vs Early Spike) lack reported details on the underlying model (e.g., random forest or XGBoost), cross-validation procedure, handling of missing contributor data, and sensitivity to the core-status threshold. These omissions make it impossible to evaluate whether the reported speed advantage (21 weeks vs 51-60 weeks) is robust.

    Authors: We agree that additional methodological details are necessary for reproducibility. In the revised manuscript, we specify that we used a Random Forest classifier for feature importance, with 5-fold cross-validation to assess performance. Missing data (less than 5% of contributors) was handled via multiple imputation. We also conducted sensitivity analyses varying the core-status threshold and report that the Late Spike advantage persists across thresholds. These details have been added to the Methods section. revision: yes

  3. Referee: [Results] Temporal-patterns analysis: the claim that OSS4SG supports two effective pathways while conventional OSS supports only Late Spike is load-bearing for the multi-pathway conclusion, yet no project-level covariates are included in the survival or transition models. If Late-Spike contributors are disproportionately drawn from larger OSS4SG projects, the 2.4-2.9X speed differential cannot be credited to mission.

    Authors: We appreciate this concern. We have extended the survival analysis (Cox proportional hazards models) to include project-level covariates such as size, popularity, and age as controls. The models show that the hazard ratio for Late Spike remains significantly higher even after controlling for these factors. Furthermore, we stratified the analysis by project size quartiles and found the multi-pathway advantage in OSS4SG holds within each stratum. We have added these results to the revised manuscript, including new tables and figures. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical comparisons rest on direct data analysis

full rationale

The paper performs statistical comparisons of retention rates, core-transition probabilities, and temporal patterns across 375 projects using commit histories. No equations or derivations are presented that reduce to self-definition, fitted parameters renamed as predictions, or load-bearing self-citations. The 22.2% importance figure is a model output, but it does not introduce circularity because the headline retention (2.2X) and probability (19.6%) differences are computed directly from the observed contributor data rather than being forced by the importance model. The analysis is therefore self-contained against the external commit dataset.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claims rest on standard OSS literature definitions of 'newcomer' and 'core status' plus the assumption that project mission can be cleanly categorized without major overlap or mislabeling; no explicit free parameters are named in the abstract, but thresholds for activity patterns are implicit.

free parameters (1)
  • core status threshold
    The criterion for when a contributor reaches core status is not detailed, but it must involve some activity cutoff that could be chosen or fitted.
axioms (1)
  • domain assumption: Project mission (OSS4SG vs conventional) can be reliably and non-overlappingly classified from stated goals.
    The split into 190 OSS4SG and 185 conventional projects assumes accurate categorization without confounding by other project traits.
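
The ledger lists the core-status threshold as an implicit free parameter. A common count-based operationalization in the OSS literature (the paper cites work on core/peripheral classification and on the Pareto principle in core teams) labels as core the smallest group of top committers who together produce a fixed share, often 80%, of commits. The sketch below assumes that heuristic only to make the parameter concrete; this review does not say which definition the paper actually uses, and the function `core_contributors` and its `share` argument are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def core_contributors(commit_authors: Iterable[str], share: float = 0.8) -> Set[str]:
    """Smallest set of top committers who together account for at least `share`
    of all commits. Ties in commit count are broken alphabetically so the result
    is deterministic."""
    if not 0.0 < share <= 1.0:
        raise ValueError("share must be in (0, 1]")
    counts = Counter(commit_authors)
    total = sum(counts.values())
    core: Set[str] = set()
    covered = 0
    for author, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if covered >= share * total:
            break
        core.add(author)
        covered += n
    return core


if __name__ == "__main__":
    # Toy commit log: one entry per commit, identified by author.
    log = ["ana"] * 50 + ["bo"] * 30 + ["cy"] * 10 + ["di"] * 6 + ["ed"] * 4
    # Varying `share` is the kind of threshold-sensitivity check the referee asks for.
    for share in (0.5, 0.8, 0.9):
        print(share, sorted(core_contributors(log, share)))
```

On the toy log the core set grows from one to three people as `share` rises from 0.5 to 0.9, which is why a reported core-transition rate is only interpretable alongside the threshold that produced it.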

pith-pipeline@v0.9.0 · 5582 in / 1485 out tokens · 38503 ms · 2026-05-16T09:16:14.088552+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Same Project, Different Start: How Contribution Events Shape Activity and Retention in Open Source

    cs.HC 2026-04 unverdicted novelty 7.0

    Event-based contributors show higher core-contributor rates and longer retention than organic ones, with mentorship linked to steady engagement but also mentor dependency after programs end.

Reference graph

Works this paper leans on

63 extracted references · 63 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1]

    Adam Alami, Raúl Pardo, and Johan Linåker. 2024. Free open source communities sustainability: Does it make a difference in software quality? Empirical Software Engineering 29, 5 (2024), 114

  2. [2]

    Sadika Amreen, Audris Mockus, Russell Zaretzki, Christopher Bogart, and Yuxia Zhang. 2020. ALFAA: Active Learning Fingerprint based Anti-Aliasing for correcting developer identity errors in version control systems. Empirical Software Engineering 25, 2 (2020), 1136–1167

  3. [3]

    Boris Baldassari and Philippe Preux. 2014. Understanding software evolution: The Maisqual Ant data set. In Proceedings of the 11th working conference on mining software repositories. ACM, New York, NY, USA, 424–427.

  4. [4]

    Leonard E. Baum, Ted Petrie, G. Soules, and N. Weiss. 1970. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains. The Annals of Mathematical Statistics 41, 1 (1970), 164–171

  5. [5]

    Donald J. Berndt and James Clifford. 1994. Using Dynamic Time Warping to Find Patterns in Time Series. In Proceedings of the AAAI-94 Workshop on Knowledge Discovery in Databases (KDD-94). AAAI Press, Seattle, WA, USA, 359–370. https://aaai.org/papers/359-ws94-03-031/

  6. [6]

    Thomas Bock, Nils Alznauer, Mitchell Joblin, and Sven Apel. 2023. Automatic core-developer identification on GitHub: A validation study. ACM Transactions on Software Engineering and Methodology 32, 6 (2023), 1–29

  7. [7]

    Carlo Bonferroni. 1936. Teoria statistica delle classi e calcolo delle probabilita. Pubblicazioni del R istituto superiore di scienze economiche e commericiali di firenze 8 (1936), 3–62

  8. [8]

    Fabio Calefato, Marco Aurelio Gerosa, Giuseppe Iaffaldano, Filippo Lanubile, and Igor Steinmacher. 2022. Will you come back to contribute? Investigating the inactivity of OSS core developers in GitHub. Empirical Software Engineering 27, 3 (2022), 76

  9. [9]

    Fabio Calefato, Marco Aurélio Gerosa, Giuseppe Iaffaldano, Filippo Lanubile, and Igor Steinmacher. 2022. Will You Come Back to Contribute? Investigating the Inactivity of OSS Core Developers in GitHub. Empirical Software Engineering 27, 3, Article 76 (2022), 42 pages. https://doi.org/10.1007/s10664-021-10012-6

  10. [10]

    Chayn. 2025. Little Window. https://github.com/chaynHQ/little-window. Accessed: March 12, 2025

  11. [11]

    Jailton Coelho, Marco Túlio Valente, Luciana L. Silva, and André Hora. 2018. Why We Engage in FLOSS: Answers from Core Developers. In Proceedings of the 11th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE ’18). ACM, New York, NY, USA, 41–44. https://doi.org/10.1145/3195836.3195848

  12. [12]

    D. R. Cox. 1972. Regression Models and Life-Tables. Journal of the Royal Statistical Society: Series B (Methodological) 34, 2 (1972), 187–220. https://doi.org/10.1111/j.2517-6161.1972.tb00899.x

  13. [13]

    Digital Public Goods Alliance. 2024. Digital Public Goods Alliance Registry. https://digitalpublicgoods.net/registry/ Accessed: 2025-07-18

  14. [14]

    Dimagi. 2025. CommCare. https://www.dimagi.com/commcare/. Accessed: March 12, 2025

  15. [15]

    Zihan Fang, Madeline Endres, Thomas Zimmermann, Denae Ford, Westley Weimer, Kevin Leach, and Yu Huang. 2023. A Four-Year Study of Student Contributions to OSS vs. OSS4SG with a Lightweight Intervention. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, New York, NY...

  16. [16]

    Fabio Ferreira, Luciana Lourdes Silva, and Marco Tulio Valente. 2020. Turnover in Open-Source Projects: The Case of Core Developers. In Proceedings of the XXXIV SBES. SBC, 447–456

  17. [17]

    Fabio Ferreira, Luciana L. Silva, and Marco Túlio Valente. 2020. Turnover in Open-Source Projects: The Case of Core Developers. In Proceedings of the 34th Brazilian Symposium on Software Engineering (SBES ’20). ACM, New York, NY, USA, 447–456. https://doi.org/10.1145/3422392.3422433

  18. [18]

    Armstrong Foundjem, Ellis Eghan, and Bram Adams. 2021. Onboarding vs. diversity, productivity and quality—empirical study of the openstack ecosystem. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE/ACM, New York, NY, USA, 1033–1045

  19. [19]

    Marco Gerosa, Igor Wiese, Bianca Trinkenreich, Georg Link, Gregorio Robles, Christoph Treude, Igor Steinmacher, and Anita Sarma. 2021. The Shifting Sands of Motivation: Revisiting What Drives Contributors in Open Source. In Proceedings of the 43rd International Conference on Software Engineering (ICSE). IEEE/ACM, 1–12

  20. [20]

    Corrado Gini. 1921. Measurement of Inequality of Incomes. The Economic Journal 31, 121 (1921), 124–126. https://doi.org/10.2307/2223319

  21. [21]

    GitHub. 2025. Dependabot options reference. https://docs.github.com/en/code-security/dependabot/working-with-dependabot/dependabot-options-reference. GitHub Docs. Accessed: 2025-09-09

  22. [22]

    GitHub, Inc. 2024. GitHub REST API v3. https://docs.github.com/en/rest. Accessed: 2024-02-16

  23. [23]

    Georgios Gousios, Martin Pinzger, and Arie van Deursen. 2014. An exploratory study of the pull-based software development model. In Proceedings of the 36th international conference on software engineering. ACM, New York, NY, USA, 345–355

  24. [24]

    Mariam Guizani, Aileen Abril Castro-Guzman, Anita Sarma, and Igor Steinmacher

  25. [25]

    Rules of Engagement: Why and How Companies Participate in OSS. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE/ACM, New York, NY, USA, 2617–2629

  26. [26]

    Mariam Guizani, Thomas Zimmermann, Anita Sarma, and Denae Ford. 2022. Attracting and retaining oss contributors with a maintainer dashboard. In Proceedings of the 2022 ACM/IEEE 44th International Conference on Software Engineering: Software Engineering in Society. ACM/IEEE, New York, NY, USA, 36–40

  27. [27]

    Yuan Huang, Denae Ford, and Thomas Zimmermann. 2021. Leaving My Fingerprints: Motivations and Challenges of Contributing to OSS for Social Good. In Proceedings of the 43rd IEEE/ACM International Conference on Software Engineering (ICSE). IEEE/ACM, New York, NY, USA, 1020–1032

  28. [28]

    Elgun Jabrayilzade, Mikhail Evtikhiev, Eray Tüzün, and Vladimir Kovalenko

  29. [29]

    Bus Factor in Practice. In Proceedings of the 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP ’22). ACM, New York, NY, USA, 97–106. https://doi.org/10.1145/3510457.3513082

  30. [30]

    Mitchell Joblin, Sven Apel, Claus Hunsen, and Wolfgang Mauerer. 2017. Classifying Developers into Core and Peripheral: An Empirical Study on Count and Network Metrics. In Proceedings of the 39th International Conference on Software Engineering (ICSE). IEEE/ACM, Piscataway, NJ, USA, 164–174. https://doi.org/10.1109/ICSE.2017.23

  31. [31]

    Eirini Kalliamvakou, Georgios Gousios, Kelly Blincoe, Leif Singer, Daniel M German, and Daniela Damian. 2014. The promises and perils of mining github. In Proceedings of the 11th working conference on mining software repositories. ACM, New York, NY, USA, 92–101

  32. [32]

    Edward L. Kaplan and Paul Meier. 1958. Nonparametric Estimation from Incomplete Observations. J. Amer. Statist. Assoc. 53, 282 (1958), 457–481. https://doi.org/10.1080/01621459.1958.10501452

  33. [33]

    Damien Legay, Alexandre Decan, and Tom Mens. 2018. On the Impact of Pull Request Decisions on Future Contributions. arXiv preprint arXiv:1812.06269 (2018), 1–25. arXiv:cs.SE/1812.06269 https://arxiv.org/abs/1812.06269

  34. [34]

    Valentina Lenarduzzi, Vili Nikkola, Nyyti Saarimäki, and Davide Taibi. 2021. Does code quality affect pull request acceptance? An empirical study. Journal of Systems and Software 171 (2021), 110806

  35. [35]

    Addi Malviya-Thakur and Audris Mockus. 2024. The Role of Data Filtering in Open Source Software Ranking and Selection. In Proceedings of the 1st IEEE/ACM International Workshop on Methodological Issues with Empirical Studies in Software Engineering. IEEE/ACM, New York, NY, USA, 7–12

  36. [36]

    H. B. Mann and D. R. Whitney. 1947. On a Test of Whether One of Two Random Variables Is Stochastically Larger than the Other. Annals of Mathematical Statistics 18, 1 (1947), 50–60. https://doi.org/10.1214/aoms/1177730491

  37. [37]

    Jennifer Marlow, Laura Dabbish, and Jim Herbsleb. 2013. Impression Formation in Online Peer Production: Activity Traces and Personal Profiles in Github. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work (CSCW ’13). ACM, New York, NY, USA, 117–128

  38. [38]

    Patrick E McKight and Julius Najab. 2010. Kruskal-wallis test. The corsini encyclopedia of psychology 2, 1 (2010), 1–1

  39. [39]

    Mohamed Ouf, Shayan Noei, Zeph Van Iterson, Mariam Guizani, and Ying Zou

  40. [40]

    An Empirical Analysis of Community and Coding Patterns in OSS4SG vs. Conventional OSS. arXiv preprint arXiv:2601.03430 (2026), 1–20

  41. [41]

    Ovio. 2021. Contribute to open-source. Be part of the future! https://ovio.org/. Online; accessed 2021

  42. [42]

    Julia Pantiuchina, Bo Lin, Fabio Zampetti, Massimiliano Di Penta, Michele Lanza, and Gabriele Bavota. 2021. Why Do Developers Reject Refactorings in Open-Source Projects? ACM Transactions on Software Engineering and Methodology (TOSEM) 31, 2 (2021), 1–23

  43. [43]

    François Petitjean, Alain Ketterlin, and Pierre Gançarski. 2011. A global averaging method for dynamic time warping, with applications to clustering. Pattern recognition 44, 3 (2011), 678–693

  44. [44]

    Gustavo Pinto, Igor Steinmacher, and Marco Aurélio Gerosa. 2016. More Common Than You Think: An In-depth Study of Casual Contributors. In IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER) (SANER 2016), Vol. 1. IEEE, Piscataway, NJ, USA, 112–123

  45. [45]

    Jeanine Romano, Jeffrey D. Kromrey, Jesse Coraggio, and Jim Skowronek. 2006. Appropriate Statistics for Ordinal Level Data: Should We Really Be Using t-test and Cohen’s d for Evaluating Group Differences on the NSSE and Other Surveys? In Annual Meeting of the Florida Association of Institutional Research (FAIR). FAIR, Cocoa Beach, FL, 1–33. Provides Clif...

  46. [46]

    Peter J. Rousseeuw. 1987. Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis. J. Comput. Appl. Math. 20 (1987), 53–65. https://doi.org/10.1016/0377-0427(87)90125-7

  47. [47]

    Hiroaki Sakoe and Seibi Chiba. 1971. A Dynamic Programming Approach to Continuous Speech Recognition. In Proceedings of the 7th International Congress on Acoustics. Akademiai Kiado, Budapest, 65–68

  48. [48]

    A. J. Scott and M. Knott. 1974. A Cluster Analysis Method for Grouping Means in the Analysis of Variance. Biometrics 30, 3 (1974), 507–512. https://doi.org/10.2307/2529204

  49. [49]

    Leif Singer, Fernando Figueira Filho, Brendan Cleary, Christoph Treude, Margaret-Anne Storey, and Kurt Schneider. 2013. Mutual Assessment in the Social Programmer Ecosystem: An Empirical Investigation of Developer Profile Aggregators. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work (CSCW ’13). ACM, New York, NY, USA, 103–116

  50. [50]

    Param Vir Singh, Yong Tan, and Nara Youn. 2011. A hidden Markov model of developer learning dynamics in open source software projects. Information Systems Research 22, 4 (2011), 790–807

  51. [51]

    Igor Steinmacher, Tayana Conte, Marco Aurélio Gerosa, and David Redmiles

  52. [52]

    Social barriers faced by newcomers placing their first contribution in open source software projects. In Proceedings of the 18th ACM conference on Computer supported cooperative work & social computing. ACM, New York, NY, USA, 1379–1392

  53. [53]

    Xin Tan, Minghui Zhou, and Li Zhang. 2024. How to Gain Commit Rights in Modern Top Open Source Communities? arXiv:cs.SE/2405.01803 https://arxiv.org/abs/2405.01803

  54. [54]

    Robert L. Thorndike. 1953. Who Belongs in the Family? Psychometrika 18, 4 (1953), 267–276. https://doi.org/10.1007/BF02289263

  55. [55]

    Bianca Trinkenreich, Mariam Guizani, Igor Wiese, Anita Sarma, and Igor Steinmacher. 2020. Hidden figures: Roles and pathways of successful oss contributors. Proceedings of the ACM on human-computer interaction 4, CSCW2 (2020), 1–22

  56. [56]

    M Vaccargiu, R Neykova, N Novielli, M Ortu, and G Destefanis. 2025. More than code: Technical and emotional dynamics in Solidity’s development. In Proceedings of the 2025 IEEE/ACM 18th International Conference on Cooperative and Human Aspects of Software Engineering (CHASE). IEEE/ACM, New York, NY, USA, 1–12

  57. [57]

    Tao Wang, Yang Zhang, Gang Yin, Yue Yu, and Huaimin Wang. 2018. Who will become a long-term contributor? A prediction model based on the early phase behaviors. In Proceedings of the 10th Asia-Pacific Symposium on Internetware. ACM, New York, NY, USA, 1–10

  58. [58]

    Xin Xia, Lingfeng Bao, David Lo, Ahmed E. Hassan, and Shanping Li. 2021. A Large Scale Study of Long-Time Contributor Prediction for GitHub Projects. IEEE Transactions on Software Engineering 47, 6 (2021), 1277–1298. https://doi.org/10.1109/TSE.2019.2918536

  59. [59]

    Wenxin Xiao, Hao He, Weiwei Xu, Yuxia Zhang, and Minghui Zhou. 2023. How early participation determines long-term sustained activity in github projects? In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, New York, NY, USA, 29–41

  60. [61]

    Kazuhiro Yamashita, Shane McIntosh, Yasutaka Kamei, Ahmed E. Hassan, and Naoyasu Ubayashi. 2015. Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects. In Proceedings of the 14th International Workshop on Principles of Software Evolution (IWPSE ’15). ACM, New York, NY, USA, 46–55. https://doi.org/10...

  61. [62]

    Marcelo Serrano Zanetti, Emre Sarigol, Ingo Scholtes, Claudio Juan Tessone, and Frank Schweitzer. 2012. A quantitative study of social organisation in open source software communities. arXiv preprint arXiv:1208.4289 (2012), 1–15

  62. [63]

    Minghui Zhou and Audris Mockus. 2012. What make long term contributors: Willingness and opportunity in OSS community. In 2012 34th International Conference on Software Engineering (ICSE). IEEE, Piscataway, NJ, USA, 518–528

  63. [64]

    Jiaxin Zhu and Jun Wei. 2019. An empirical study of multiple names and email addresses in oss version control repositories. In 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR). IEEE/ACM, Piscataway, NJ, USA, 409–420