arxiv: 2601.23142 · v2 · submitted 2026-01-30 · 💻 cs.SE

Recognition: 2 theorem links


Do Good, Stay Longer? Temporal Patterns and Predictors of Newcomer-to-Core Transitions in Conventional OSS and OSS4SG

Mohamed Ouf, Amr Mohamed, Mariam Guizani


Pith reviewed 2026-05-16 09:16 UTC · model grok-4.3

classification 💻 cs.SE

keywords open source software · contributor retention · core contributors · social good projects · temporal patterns · newcomer transitions · OSS sustainability · onboarding


The pith

Open source projects with a social-good mission retain contributors more than twice as often and turn nearly 20 percent more into core developers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares how newcomers move from first contributions to core roles in open source projects that focus on societal impact versus standard projects. It finds that social-good projects hold onto new contributors at 2.2 times the rate of conventional ones and give them a 19.6 percent higher chance of reaching core status. Broad early exploration of the project stands out as a strong predictor of success. Contributors who spend time learning the code before making heavy contributions reach core status two to three times faster than those who start with intensive work right away. Social-good projects support several workable paths to core roles while conventional projects mostly follow one pattern.

Core claim

OSS4SG projects retain contributors at 2.2X higher rates and contributors have 19.6% higher probability of achieving core status. Early broad project exploration predicts core achievement with 22.2% importance. Contributors who follow a Late Spike temporal pattern achieve core status 2.4-2.9X faster than those following an Early Spike pattern. OSS4SG supports two effective temporal patterns for core transitions while conventional OSS concentrates on one dominant pathway.

What carries the argument

Temporal contribution patterns (Early Spike versus Late Spike) analyzed across project mission types (OSS4SG versus conventional OSS) as predictors of newcomer retention and core status achievement.
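
Labels such as Early Spike and Late Spike describe the shape of a contributor's activity over time, and the paper passage quoted in the Lean theorem section of this page mentions DTW clustering. The sketch below illustrates how a dynamic time warping distance separates such shapes. It is a textbook DTW under stated assumptions, not the authors' pipeline: the weekly commit series and the function name `dtw_distance` are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical weekly commit counts (illustration only, not data from the paper).
early_spike = [9, 7, 4, 2, 1, 1, 1, 1]
late_spike = [1, 1, 1, 1, 2, 4, 7, 9]
late_spike_shifted = [1, 1, 1, 2, 4, 7, 9, 9]  # same shape, one week earlier

d_same = dtw_distance(late_spike, late_spike_shifted)
d_diff = dtw_distance(late_spike, early_spike)
print(d_same, d_diff)
```

Because DTW allows the time axis to stretch, the two late-ramping series align at zero cost even though they are offset by a week, while the early-heavy series stays far from both. That tolerance to timing shifts is the usual reason to prefer DTW over a pointwise distance when grouping activity curves.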

If this is right

Newcomers increase their odds of long-term involvement by choosing projects that align with their personal values.
Spending time learning the project before intensifying contributions leads to core status 2.4-2.9 times faster than immediate heavy involvement.
Broad early exploration of the project codebase is a stronger predictor of core achievement than any single contribution pattern.
OSS4SG projects provide multiple viable routes to core status while conventional OSS relies on one primary route.
Maintainers in conventional projects could improve retention by encouraging initial learning periods before demanding intensive contributions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Mission alignment may function as an intrinsic motivator that sustains effort beyond what technical factors alone can achieve.
The same temporal patterns of learning before committing could appear in other volunteer-driven online communities that require skill acquisition.
Onboarding designs that deliberately slow initial output in favor of exploration might shorten time-to-core across many collaborative platforms.
Demographic or cultural differences between contributor pools in the two project types could interact with mission focus and deserve direct measurement.

Load-bearing premise

Observed differences in retention and core transitions arise mainly from the project's social-good mission rather than from differences in project size, popularity, or who chooses to join each type of project.

What would settle it

A study that matches OSS4SG and conventional OSS projects on size, age, popularity, and contributor demographics and then finds no remaining difference in retention or core transition rates.
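
A minimal sketch of what such a matched comparison could look like, assuming greedy 1:1 nearest-neighbour matching on standardized project covariates. The project records, covariate names, and helper functions (`standardize`, `match_pairs`) are all hypothetical; the paper's own design may differ, and a real study would typically use propensity scores or a dedicated matching library.

```python
from statistics import mean, pstdev


def standardize(rows, keys):
    """Z-score each covariate across all rows so distances are comparable."""
    stats = {}
    for k in keys:
        values = [r[k] for r in rows]
        stats[k] = (mean(values), pstdev(values) or 1.0)
    return [
        {**r, **{k: (r[k] - stats[k][0]) / stats[k][1] for k in keys}}
        for r in rows
    ]


def match_pairs(treated, control, keys):
    """Greedy 1:1 nearest-neighbour matching without replacement."""
    available = list(control)
    pairs = []
    for t in treated:
        if not available:
            break
        best = min(available, key=lambda c: sum((t[k] - c[k]) ** 2 for k in keys))
        available.remove(best)
        pairs.append((t["name"], best["name"]))
    return pairs


# Hypothetical projects (illustration only, not data from the paper).
projects = [
    {"name": "sg-a", "oss4sg": True, "contributors": 40, "age_years": 3, "stars": 500},
    {"name": "sg-b", "oss4sg": True, "contributors": 400, "age_years": 9, "stars": 9000},
    {"name": "oss-a", "oss4sg": False, "contributors": 420, "age_years": 8, "stars": 8500},
    {"name": "oss-b", "oss4sg": False, "contributors": 35, "age_years": 4, "stars": 450},
    {"name": "oss-c", "oss4sg": False, "contributors": 150, "age_years": 6, "stars": 3000},
]
keys = ["contributors", "age_years", "stars"]
scaled = standardize(projects, keys)
treated = [p for p in scaled if p["oss4sg"]]
control = [p for p in scaled if not p["oss4sg"]]
pairs = match_pairs(treated, control, keys)
print(pairs)  # [('sg-a', 'oss-b'), ('sg-b', 'oss-a')]
```

Retention and core-transition rates would then be compared only within the matched pairs; unmatched projects such as `oss-c` drop out of the comparison.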

Figures

Figures reproduced from arXiv: 2601.23142 by Amr Mohamed, Mariam Guizani, Mohamed Ouf.

**Figure 3.** Comparison of most common pathways between ...

**Figure 4.** Three temporal patterns of contribution intensity

**Figure 5.** Temporal pattern effectiveness ranked by time-to ...

Original abstract

Open Source Software (OSS) sustainability relies on newcomers transitioning to core contributors, but this pipeline is broken, with most newcomers becoming inactive after initial contributions. Open Source Software for Social Good (OSS4SG) projects, which prioritize societal impact as their primary mission, may be associated with different newcomer-to-core transition outcomes than conventional OSS projects. We compared 375 projects (190 OSS4SG, 185 OSS), analyzing 92,721 contributors and 3.5 million commits. OSS4SG projects retain contributors at 2.2X higher rates and contributors have 19.6% higher probability of achieving core status. Early broad project exploration predicts core achievement (22.2% importance); conventional OSS concentrates on one dominant pathway (61.62% of transitions) while OSS4SG provides multiple pathways. Contrary to intuition, contributors who invest time learning the project before intensifying their contributions (Late Spike pattern) achieve core status 2.4-2.9X faster (21 weeks) than those who contribute intensively from day one (Early Spike pattern, 51-60 weeks). OSS4SG supports two effective temporal patterns while only Late Spike achieves fastest time-to-core in conventional OSS. Our findings suggest that finding a project aligned with personal values and taking time to understand the codebase before major contributions are key strategies for achieving core status. Our findings show that project mission is associated with measurably different environments for newcomer-to-core transitions and provide evidence-based guidance for newcomers and maintainers.
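
The 2.4-2.9X speed-up quoted in the abstract follows directly from the week counts it reports. A quick arithmetic check, using only the figures stated above (variable names are illustrative):

```python
# Time-to-core figures quoted in the abstract, in weeks.
late_spike_weeks = 21
early_spike_weeks = (51, 60)  # range reported for the Early Spike pattern

# Speed-up of Late Spike relative to each end of the Early Spike range.
speedups = [round(w / late_spike_weeks, 1) for w in early_spike_weeks]
print(speedups)  # [2.4, 2.9]
```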

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript analyzes newcomer-to-core transitions in 190 OSS4SG and 185 conventional OSS projects using data from 92,721 contributors and 3.5 million commits. It reports that OSS4SG projects retain contributors at 2.2 times higher rates and offer a 19.6% higher probability of achieving core status. Key predictors include early broad project exploration (22.2% importance), and temporal patterns where 'Late Spike' contributors reach core status 2.4-2.9 times faster than 'Early Spike' ones. The paper concludes that project mission (social good vs conventional) is associated with different transition environments and provides guidance for contributors.

Significance. If the reported differences hold after accounting for potential confounders, this work would be significant for understanding OSS sustainability, particularly highlighting how mission-driven projects may foster better retention and core transitions. The large-scale empirical analysis and identification of specific temporal patterns provide actionable insights for both newcomers and project maintainers in the software engineering community.

major comments (3)

[Abstract and Results] Abstract and central results: the 2.2X retention multiplier and 19.6% higher core-transition probability are presented as associated with project mission without any reported regression controls, propensity-score matching, or stratification for project size (contributor count, commit volume), popularity (stars/forks), age, or domain. Because OSS4SG projects were likely selected on observables that correlate with these variables, the attribution to mission remains vulnerable to confounding.
[Methods] Methods and feature-importance section: the 22.2% importance score for early broad exploration and the classification of temporal patterns (Late Spike vs Early Spike) lack reported details on the underlying model (e.g., random forest or XGBoost), cross-validation procedure, handling of missing contributor data, and sensitivity to the core-status threshold. These omissions make it impossible to evaluate whether the reported speed advantage (21 weeks vs 51-60 weeks) is robust.
[Results] Temporal-patterns analysis: the claim that OSS4SG supports two effective pathways while conventional OSS supports only Late Spike is load-bearing for the multi-pathway conclusion, yet no project-level covariates are included in the survival or transition models. If Late-Spike contributors are disproportionately drawn from larger OSS4SG projects, the 2.4-2.9X speed differential cannot be credited to mission.

minor comments (2)

[Abstract] Abstract: the phrase 'multiple pathways' is used without enumerating them; a short list or reference to the relevant figure would improve readability.
[Results] Ensure every reported multiplier and percentage (2.2X, 19.6%, 22.2%, 2.4-2.9X) is accompanied by a confidence interval or standard error in the main text and tables.
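
To illustrate the kind of interval the last comment asks for, here is a hedged sketch of a Wald confidence interval for a rate ratio, computed on the log scale. The counts are hypothetical and were chosen only so that the point estimate lands at 2.2; they are not the paper's data, and the function name `risk_ratio_ci` is an assumption.

```python
import math


def risk_ratio_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Risk ratio of group A vs group B with a Wald interval on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR) for two independent binomial samples.
    se = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Hypothetical counts: 220 of 1000 retained in group A, 100 of 1000 in group B.
rr, lower, upper = risk_ratio_ci(220, 1000, 100, 1000)
print(f"RR = {rr:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

An interval that excludes 1 would indicate that the retention difference is unlikely to be sampling noise alone, though it says nothing about the confounding raised in the major comments.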

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful for the referee's constructive comments, which have prompted us to strengthen the causal claims and methodological transparency in our work. We have made substantial revisions to address all major concerns, including adding controls for confounding, detailed methods descriptions, and robustness checks for the temporal patterns. We believe these changes significantly improve the paper and hope it now meets the standards for publication.

Point-by-point responses

Referee: [Abstract and Results] Abstract and central results: the 2.2X retention multiplier and 19.6% higher core-transition probability are presented as associated with project mission without any reported regression controls, propensity-score matching, or stratification for project size (contributor count, commit volume), popularity (stars/forks), age, or domain. Because OSS4SG projects were likely selected on observables that correlate with these variables, the attribution to mission remains vulnerable to confounding.

Authors: We thank the referee for highlighting this important point. The original analysis was primarily descriptive. To address potential confounding, we have now incorporated multivariate regression models controlling for project size, popularity, age, and domain. We also performed propensity score matching on these observables. After these adjustments, the key differences remain statistically significant, supporting the association with project mission. We have updated the abstract and results sections to include these controls and matching results.
revision: yes
Referee: [Methods] Methods and feature-importance section: the 22.2% importance score for early broad exploration and the classification of temporal patterns (Late Spike vs Early Spike) lack reported details on the underlying model (e.g., random forest or XGBoost), cross-validation procedure, handling of missing contributor data, and sensitivity to the core-status threshold. These omissions make it impossible to evaluate whether the reported speed advantage (21 weeks vs 51-60 weeks) is robust.

Authors: We agree that additional methodological details are necessary for reproducibility. In the revised manuscript, we specify that we used a Random Forest classifier for feature importance, with 5-fold cross-validation to assess performance. Missing data (less than 5% of contributors) was handled via multiple imputation. We also conducted sensitivity analyses varying the core-status threshold and report that the Late Spike advantage persists across thresholds. These details have been added to the Methods section.
revision: yes
Referee: [Results] Temporal-patterns analysis: the claim that OSS4SG supports two effective pathways while conventional OSS supports only Late Spike is load-bearing for the multi-pathway conclusion, yet no project-level covariates are included in the survival or transition models. If Late-Spike contributors are disproportionately drawn from larger OSS4SG projects, the 2.4-2.9X speed differential cannot be credited to mission.

Authors: We appreciate this concern. We have extended the survival analysis (Cox proportional hazards models) to include project-level covariates such as size, popularity, and age as controls. The models show that the hazard ratio for Late Spike remains significantly higher even after controlling for these factors. Furthermore, we stratified the analysis by project size quartiles and found the multi-pathway advantage in OSS4SG holds within each stratum. We have added these results to the revised manuscript, including new tables and figures.
revision: yes

Circularity Check

0 steps flagged

No circularity: empirical comparisons rest on direct data analysis

Full rationale

The paper performs statistical comparisons of retention rates, core-transition probabilities, and temporal patterns across 375 projects using commit histories. No equations or derivations are presented that reduce to self-definition, fitted parameters renamed as predictions, or load-bearing self-citations. The 22.2% importance figure is a model output but does not create a circular loop because the headline retention (2.2X) and probability (19.6%) differences are computed directly from the observed contributor data rather than being forced by the importance model. The analysis is therefore self-contained against the external commit dataset.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claims rest on standard OSS literature definitions of 'newcomer' and 'core status' plus the assumption that project mission can be cleanly categorized without major overlap or mislabeling; no explicit free parameters are named in the abstract, but thresholds for activity patterns are implicit.

free parameters (1)

core status threshold
Definition of when a contributor reaches core status is not detailed but must involve some activity cutoff that could be chosen or fitted.
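
To see why this cutoff matters, consider a sketch that assumes one commonly used heuristic: the core is the smallest set of top committers whose commits cover a chosen share of the total. This page does not state the paper's actual definition, so the heuristic, the function name `core_contributors`, and the commit counts below are assumptions for illustration only.

```python
def core_contributors(commit_counts, share_pct=80):
    """Smallest set of top committers whose commits reach share_pct% of the total."""
    total = sum(commit_counts.values())
    ranked = sorted(commit_counts.items(), key=lambda kv: kv[1], reverse=True)
    core, covered = [], 0
    for name, count in ranked:
        # Integer comparison avoids floating-point edge cases at the cutoff.
        if covered * 100 >= share_pct * total:
            break
        core.append(name)
        covered += count
    return core


# Hypothetical commit counts for one project (not data from the paper).
commits = {"ana": 50, "bo": 25, "cy": 15, "di": 6, "el": 4}
for pct in (50, 80, 95):
    print(pct, core_contributors(commits, pct))
```

With these made-up counts, moving the cutoff from 50% to 95% grows the core from one contributor to four, which is why a sensitivity analysis over the threshold is needed before time-to-core comparisons can be treated as robust.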

axioms (1)

domain assumption: Project mission (OSS4SG vs conventional) can be reliably and non-overlappingly classified from stated goals.
The split into 190 OSS4SG and 185 conventional projects assumes accurate categorization without confounding by other project traits.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "OSS4SG projects retain contributors at 2.2X higher rates... Late Spike pattern achieve core status 2.4-2.9X faster"

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "DTW clustering... Scott-Knott ranking of time-to-core"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Same Project, Different Start: How Contribution Events Shape Activity and Retention in Open Source
cs.HC · 2026-04 · unverdicted · novelty 7.0

Event-based contributors show higher core-contributor rates and longer retention than organic ones, with mentorship linked to steady engagement but also mentor dependency after programs end.
