
arxiv: 2606.20966 · v1 · pith:4S3T4JVJnew · submitted 2026-06-18 · 💻 cs.SE

Beyond the Grave: An Empirical Study of Dormancy and Revival in Scientific Open-Source Software

Pith reviewed 2026-06-26 15:57 UTC · model grok-4.3

classification 💻 cs.SE
keywords scientific open source software · dormancy · revival · abandonment · inactivity thresholds · lifecycle archetypes · contributor continuity · software sustainability

The pith

A fixed inactivity threshold cannot reliably classify scientific open-source software as abandoned.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines dormancy and revival in scientific OSS by identifying 2,984 dormant-revived candidates from 18,247 repositories and manually coding a stratified sample of 750 under an adjudication protocol. It finds that shifting the inactivity cutoff from 1 to 36 months changes the count of abandoned projects from 18,030 to 8,010, and that dormancy causes, revival mechanisms, and sustainability patterns vary widely. Among resolvable cases, feature or milestone freezes outnumber research completion by 5.4 to 1, while non-sustained recoveries outnumber sustained ones by 2.14 to 1, and 11.5 percent of revivals appear to be artifacts. Lifecycle archetype is more strongly associated with sustainability than revival mechanism or work type. The authors conclude that gap duration, lifecycle archetype, and contributor continuity together discriminate better than any single threshold.

Core claim

The study shows that dormancy cause remains unresolvable from repository evidence for 52.5 percent of projects. Among the rest, feature or milestone freezes outnumber research-output completion by 5.4 to 1. Non-sustained recovery outnumbers sustained recovery by 2.14 to 1, and 11.5 percent of apparent revivals are bot-only or single-spike artifacts. Lifecycle archetype correlates more strongly with revival sustainability than either revival mechanism or work type on the structurally independent subset. Therefore a fixed inactivity threshold is insufficient, and gap duration, lifecycle archetype, and contributor continuity together supply more discriminating information.
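
The two ratios above are easier to compare when read as shares. The following is a minimal sketch of that arithmetic, not code from the paper; the function name `ratio_to_shares` is an illustrative assumption. Each share is relative to the two categories being compared only, not to all resolvable cases, since the abstract does not report the other category counts.

```python
def ratio_to_shares(major: float, minor: float = 1.0) -> tuple[float, float]:
    """Convert a ratio major:minor into each side's share of the two combined."""
    total = major + minor
    return major / total, minor / total


# Ratios as reported in the abstract.
freeze_share, completion_share = ratio_to_shares(5.4)
non_sustained_share, sustained_share = ratio_to_shares(2.14)

print(f"freeze vs. completion: {freeze_share:.1%} / {completion_share:.1%}")
print(f"non-sustained vs. sustained: {non_sustained_share:.1%} / {sustained_share:.1%}")
```

Read this way, a 2.14:1 ratio means roughly one in three of the recoveries in those two categories was sustained.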

What carries the argument

The rule-based classifier that maps coded repository evidence onto five dimensions: dormancy cause (T1), revival mechanism (T2), nature of revival work (T3), revival sustainability (T4), and lifecycle archetype (T5).

If this is right

  • Many projects counted as abandoned under common thresholds are actually dormant and may revive.
  • Sustainability of recovery depends more on the project's lifecycle archetype than on the specific revival trigger or work performed.
  • Contributor continuity supplies a stronger signal for lasting recovery than the type of revival work.
  • 11.5 percent of projects that appear revived are in fact bot-driven or single-spike artifacts.
  • Changing the inactivity cutoff produces large swings in the number of projects labeled abandoned.
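
The threshold sensitivity in the last point can be illustrated with a small sweep. This is a sketch under stated assumptions: the last-commit dates are made up, the helper names `months_between` and `count_abandoned` are hypothetical, and "abandoned" is taken to mean "no commit for at least the cutoff number of whole months", which may differ from the paper's operational definition.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from `earlier` to `later` (never negative)."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month is not yet complete
    return max(months, 0)


def count_abandoned(last_commits: list[date], as_of: date, cutoff_months: int) -> int:
    """Count projects whose inactivity gap is at least `cutoff_months`."""
    return sum(1 for d in last_commits if months_between(d, as_of) >= cutoff_months)


# Hypothetical last-commit dates for five repositories (not SciCat data).
last_commits = [
    date(2025, 12, 1),
    date(2025, 3, 15),
    date(2023, 6, 30),
    date(2021, 1, 10),
    date(2024, 11, 20),
]
as_of = date(2026, 1, 1)

# Sweep the cutoff and record how many projects each value labels abandoned.
sweep = {cutoff: count_abandoned(last_commits, as_of, cutoff) for cutoff in (1, 12, 36)}
print(sweep)
```

On this toy input the same five repositories yield three different "abandoned" counts depending only on the cutoff, which is the pattern the paper reports at corpus scale.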

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Abandonment detectors for scientific software should combine gap length with archetype labels rather than rely on a single cutoff.
  • Funding and archival policies could use archetype information to decide whether to treat a project as dormant rather than terminated.
  • The same multi-dimensional coding approach could be applied to non-scientific OSS to test whether the same patterns appear outside the scientific domain.
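
One way to picture the first extension is a triage rule that consults archetype and contributor continuity before labeling a project abandoned. Everything in this sketch is an assumption for illustration: the archetype labels are placeholders rather than the paper's T5 categories, the `core_contributors_retained` field is a hypothetical continuity proxy, and the 12-month default cutoff is arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectSignals:
    gap_months: int                   # length of the current inactivity gap
    archetype: str                    # lifecycle archetype label (placeholder values)
    core_contributors_retained: bool  # hypothetical contributor-continuity proxy


# Placeholder set of archetypes assumed to pause and resume; NOT the paper's labels.
DORMANCY_PRONE_ARCHETYPES = frozenset({"archetype_a"})


def triage(p: ProjectSignals, cutoff_months: int = 12) -> str:
    """Label a project using gap length first, then archetype and continuity."""
    if p.gap_months < cutoff_months:
        return "active"
    if p.core_contributors_retained or p.archetype in DORMANCY_PRONE_ARCHETYPES:
        return "dormant-candidate"
    return "abandoned-candidate"


print(triage(ProjectSignals(3, "archetype_b", False)))
print(triage(ProjectSignals(20, "archetype_a", False)))
print(triage(ProjectSignals(20, "archetype_b", False)))
```

The design choice here is that the cutoff only decides who gets examined; the final label depends on the additional signals, so a long gap alone never yields "abandoned-candidate" for a project with retained contributors.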

Load-bearing premise

The manual coding of the stratified sample of 750 projects accurately reflects dormancy and revival characteristics across the full set of scientific OSS without significant selection or interpretation bias.

What would settle it

Re-coding an independent sample of projects from the same corpus and testing whether the reported association between lifecycle archetype and sustainability still holds at the same strength.
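
A replication of this kind needs a comparable effect-size measure for the archetype-by-sustainability table. The abstract reports a "medium effect" without naming the statistic, so the sketch below assumes Cramér's V, a common choice for two categorical variables. The function name `cramers_v` and the example tables are illustrative, and the code assumes every row and column total is non-zero.

```python
import math


def cramers_v(table: list[list[int]]) -> float:
    """Cramér's V for an r x c contingency table of observed counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


# Toy tables (rows: archetype, columns: sustained / non-sustained); not study data.
perfect = cramers_v([[10, 0], [0, 10]])
independent = cramers_v([[5, 5], [5, 5]])
print(perfect, independent)
```

Computing the same statistic on the original and the re-coded sample gives two values on a common 0 to 1 scale that can be compared directly.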

Original abstract

Background. Inactivity thresholds classify scientific open-source software (OSS) as abandoned but cannot distinguish permanent abandonment from temporary dormancy; moving the cutoff from 1 to 36 months changes the abandoned count in the SciCat corpus from 18,030 to 8,010. Aims. We characterize dormancy causes, revival mechanisms, recovery durability, and lifecycle archetypes in dormant-revived scientific OSS. Method. From 18,247 SciCat repositories we identify 2,984 dormant-revived candidates and field-code a stratified sample of 750 projects with 75 analyst-coders under a two-phase adjudication protocol (post-adjudication kappa 0.779-0.857). A rule-based classifier produces five dimensions: dormancy cause (T1), revival mechanism (T2), nature of revival work (T3), revival sustainability (T4), and lifecycle archetype (T5). Results. Dormancy cause is unresolvable from repository evidence for 52.5% of projects; among resolvable cases, feature/milestone freeze outnumbers research-output completion 5.4:1. Non-sustained recovery outnumbers sustained 2.14:1; 11.5% of apparent revivals are bot-only or single-spike artifacts. Lifecycle archetype is more strongly associated with sustainability than revival mechanism or work type (medium effect on the structurally-independent subset). Conclusions. A fixed inactivity threshold is insufficient to reliably classify scientific OSS abandonment. Gap duration, lifecycle archetype, and contributor continuity together provide more discriminating information than any single threshold.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper claims that fixed inactivity thresholds are insufficient to classify scientific OSS abandonment, as shifting the cutoff from 1 to 36 months changes the abandoned count in SciCat from 18,030 to 8,010. From 18,247 repositories it identifies 2,984 dormant-revived candidates, codes a stratified sample of 750 projects via 75 coders under two-phase adjudication (kappa 0.779-0.857), and derives five dimensions (T1 dormancy cause, T2 revival mechanism, T3 work nature, T4 sustainability, T5 archetype). Among resolvable cases feature/milestone freeze dominates; non-sustained recoveries outnumber sustained 2.14:1; archetype associates more strongly with sustainability than other factors. It concludes that gap duration, archetype, and contributor continuity together discriminate better than any single threshold.

Significance. If the central empirical patterns hold after addressing sampling limitations, the work supplies concrete evidence on the prevalence of unresolvable dormancy cases (52.5%), revival artifacts (11.5%), and the relative strength of lifecycle archetype for predicting sustained recovery. The threshold-sensitivity result stands independently and directly challenges current practice in scientific OSS health assessment.

major comments (1)
  1. [Abstract / Results] Abstract and Results (T4/T5 associations): the claim that gap duration, lifecycle archetype (T5), and contributor continuity supply more discriminating information than any fixed threshold for distinguishing temporary dormancy from permanent abandonment is estimated only on the 2,984 revived candidates (and the 750 coded subset). No parallel coding or comparison exists for projects that experienced comparable gaps but never revived, so the superior-discrimination assertion rests on an incomplete contrast and cannot be directly supported by the reported associations (even the medium effect on the independent subset).
minor comments (2)
  1. [Methods] Methods: the rule-based classifier for T1–T5 is described at high level; an explicit validation step against the adjudicated sample (e.g., precision/recall per dimension) would strengthen reproducibility.
  2. [Results] The 52.5% unresolvable rate is acknowledged but its impact on the generalizability of the resolvable-case ratios (5.4:1, 2.14:1) could be quantified with sensitivity bounds.
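
The sensitivity bounds requested in the second minor comment could take the following worst-case form. This is a sketch, not the referee's or the authors' method: `ratio_bounds` is a hypothetical helper, and the counts in the example are invented to reproduce a 5.4:1 point ratio, since the page does not report the underlying category counts.

```python
def ratio_bounds(a: int, b: int, unresolved: int) -> tuple[float, float]:
    """Extreme bounds on the ratio a:b when unresolved cases are reassigned.

    Lower bound: every unresolved case truly belongs to category b.
    Upper bound: every unresolved case truly belongs to category a.
    """
    lower = a / (b + unresolved)
    upper = (a + unresolved) / b
    return lower, upper


# Invented counts: 54 freezes, 10 completions (point ratio 5.4:1), 100 unresolved.
point_ratio = 54 / 10
lower, upper = ratio_bounds(54, 10, 100)
print(f"point {point_ratio:.2f}, bounds [{lower:.2f}, {upper:.2f}]")
```

Bounds of this kind are deliberately pessimistic; with more than half the sample unresolved they will be wide, which is itself informative about how far the resolvable-case ratios can be generalized.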

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for highlighting an important scope limitation in our discrimination analysis. We address the comment directly below and indicate where the manuscript will be revised for precision.

point-by-point responses
  1. Referee: [Abstract / Results] Abstract and Results (T4/T5 associations): the claim that gap duration, lifecycle archetype (T5), and contributor continuity supply more discriminating information than any fixed threshold for distinguishing temporary dormancy from permanent abandonment is estimated only on the 2,984 revived candidates (and the 750 coded subset). No parallel coding or comparison exists for projects that experienced comparable gaps but never revived, so the superior-discrimination assertion rests on an incomplete contrast and cannot be directly supported by the reported associations (even the medium effect on the independent subset).

    Authors: We agree that the associations between gap duration, T5 archetype, contributor continuity, and T4 sustainability are measured exclusively within the 2,984 revived candidates (and the coded subset). No matched sample of non-revived projects with comparable gaps was coded, so a direct head-to-head contrast between revived and permanently abandoned cases is not available. The manuscript's claim that these factors supply 'more discriminating information than any single threshold' therefore rests on (a) the independent threshold-sensitivity result (abandoned count falling from 18,030 to 8,010) and (b) the relative strength of archetype versus other predictors inside the revived set. We will revise the abstract and Results section to qualify the discrimination statement as applying to the prediction of sustained recovery among projects that have already shown revival activity, and to note the absence of a non-revived contrast group as a limitation. This is a partial revision.

standing simulated objections not resolved
  • Direct empirical comparison of revived versus non-revived projects with matched gap lengths would require new sampling and coding outside the current study design.

Circularity Check

0 steps flagged

No circularity: purely observational empirical study with direct data coding and associations

full rationale

The paper performs no derivations, equations, parameter fitting, or model-based predictions. It identifies 2,984 candidates via explicit gap-and-revival criteria in repository data, manually codes a stratified sample of 750 under an adjudication protocol, applies a rule-based classifier to produce categorical dimensions (T1-T5), and reports observed frequencies plus associations (e.g., archetype with sustainability). All reported results are direct tabulations or statistical associations computed from the coded observations; none reduce by construction to the inputs via self-definition, renaming, or self-citation chains. The sample conditioning on revival is a methodological scope limitation affecting generalizability, not a circular reduction of the reported statistics to their own selection criteria. No load-bearing self-citations or ansatzes are invoked to justify the central empirical claims.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper is an empirical study relying on human coding of repository data; the main assumptions are about the reliability of that coding process and the representativeness of the sample.

axioms (1)
  • domain assumption: The coding protocol and adjudication process accurately capture the true dormancy causes and revival mechanisms from repository data.
    The study depends on the validity of the manual coding by 75 analysts under the two-phase protocol.
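
The reliability of that coding is summarized in the abstract as a post-adjudication kappa of 0.779-0.857. As a reminder of what such a figure measures, here is a minimal sketch of Cohen's kappa for two coders. The abstract does not say which kappa variant was used, and with 75 coders a multi-rater statistic may well have been applied, so treat the two-coder form and the labels below as assumptions.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Chance-corrected agreement between two coders over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty item list")
    n = len(labels_a)

    # Observed agreement: fraction of items given identical labels.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n

    # Expected agreement if each coder labeled at random with their own label rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy example with made-up labels; not the study's coding data.
coder_1 = ["freeze", "freeze", "completion", "completion"]
coder_2 = ["freeze", "freeze", "completion", "freeze"]
kappa = cohens_kappa(coder_1, coder_2)
print(kappa)
```

In the toy example the coders agree on three of four items, yet kappa is well below 0.75 because half of that agreement is expected by chance.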

pith-pipeline@v0.9.1-grok · 5824 in / 1237 out tokens · 19551 ms · 2026-06-26T15:57:23.105561+00:00 · methodology

