pith. machine review for the scientific record.

arxiv: 2605.03574 · v1 · submitted 2026-05-05 · ⚛️ physics.soc-ph · cs.SE

Recognition: unknown

Long-Range Correlation in Code Commit Dynamics as a Novel Indicator of Software Product Stability: A Detrended Fluctuation Analysis Study

Goran Mitevski

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 04:14 UTC · model grok-4.3

classification ⚛️ physics.soc-ph cs.SE
keywords detrended fluctuation analysis · software stability · code commits · long-range correlations · fractal scaling · time series analysis · software engineering · version control

The pith

The fractal scaling exponent alpha from Detrended Fluctuation Analysis on code commit time series indicates software product stability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests whether stable software development processes exhibit long-range temporal correlations in their commit patterns, as measured by the scaling exponent alpha from Detrended Fluctuation Analysis applied to sequences of lines of code added per commit. The authors compare two 712-day periods from the same project, one labeled stable and one unstable by the lead engineer using crash data. They find alpha around 0.70 for the stable period and 0.57 for the unstable one, both above the random baseline of 0.5, and note that the unstable period had far more commits yet weaker correlations. This suggests that the organization of development activity over time, rather than sheer volume, relates to product stability. A sympathetic reader would care because it offers a potential new metric for monitoring code health directly from version control data without relying on post-release crash reports.

Core claim

The paper claims that the scaling exponent alpha obtained from DFA on the unaggregated time series of lines of code added per commit event is higher in periods of software stability (alpha = 0.70) than in unstable periods (alpha = 0.57), supporting the idea that stable products emerge from development processes with long-range temporal correlations where each commit is influenced by patterns extending weeks or months into the past.

What carries the argument

Detrended Fluctuation Analysis (DFA) applied to the time series of lines of code added per commit, which computes the fractal scaling exponent alpha to quantify long-range correlations in the commit dynamics.
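To make the machinery concrete, here is a minimal sketch of the standard DFA-1 procedure (integrate the mean-subtracted series, detrend linearly inside windows of size n, and read alpha off the slope of log F(n) versus log n). The function name, the log-spaced window grid, and the `loc_added` input are illustrative assumptions, not the authors' code.

```python
import numpy as np

def dfa_alpha(x, n_min=16, n_max=None, n_scales=20):
    """Estimate the DFA-1 scaling exponent alpha of a 1-D series x.

    Linear detrending within windows, following Peng et al. (1994).
    Window sizes are log-spaced between n_min and n_max.
    """
    x = np.asarray(x, dtype=float)
    y = np.cumsum(x - x.mean())               # integrated profile
    if n_max is None:
        n_max = len(x) // 4                   # common rule of thumb
    scales = np.unique(np.logspace(np.log10(n_min), np.log10(n_max),
                                   n_scales).astype(int))
    fluct = []
    for n in scales:
        n_win = len(y) // n
        segments = y[:n_win * n].reshape(n_win, n)
        t = np.arange(n)
        sq_resid = []
        for seg in segments:                  # least-squares line per window
            coeffs = np.polyfit(t, seg, 1)
            resid = seg - np.polyval(coeffs, t)
            sq_resid.append(np.mean(resid ** 2))
        fluct.append(np.sqrt(np.mean(sq_resid)))   # F(n)
    # alpha is the slope of log F(n) against log n
    alpha, _intercept = np.polyfit(np.log(scales), np.log(fluct), 1)
    return alpha

# loc_added: lines of code added per commit event (hypothetical input)
# alpha = dfa_alpha(loc_added, n_min=16)
```

On an i.i.d. series the fitted slope sits near 0.5; persistent long-range correlations push it above 0.5, which is the regime both reported periods occupy.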

Load-bearing premise

The lead engineer's classification of the two periods as stable or unstable based on crash-analytics data accurately captures the true underlying product stability, and the periods differ primarily in their commit correlation structure.

What would settle it

Observing a project period labeled stable by crash data but yielding alpha below 0.6, or an unstable period with alpha above 0.7, while keeping other factors similar.

Figures

Figures reproduced from arXiv: 2605.03574 by Goran Mitevski.

Figure 1. DFA log-log plots for stable and unstable periods. The strong overall linearity suggests …
Figure 2. Illustrative example of a definitive crossover, identifying the intersection of two distinct …
original abstract

This work proposes the fractal scaling exponent alpha, estimated via Detrended Fluctuation Analysis (DFA) on the unaggregated time series of lines of code added per commit event in a software repository, as a novel process-level indicator of software product stability. The proposal rests on the hypothesis that stable software products arise from development processes characterised by long-range temporal correlations in commit behaviour: each code addition is shaped not only by the immediately preceding commits but by patterns extending weeks or months into the past and anticipating work to be done in the future. This hypothesis is tested on two non-overlapping 712-day time series of lines of code added per commit event, drawn from a closed-source software organisation and labeled as stable and unstable by the lead engineer on the basis of crash-analytics data. Applied to these series, DFA yields alpha = 0.70 (n_min = 16) for the stable period and alpha = 0.57 for the unstable period, with all estimates substantially above the shuffled-surrogate baseline (alpha ~= 0.50 +/- 0.01). Results are robust to three parameterisations (n_min in {4, 16, 48}) and validated against 1,000 surrogate time series per condition. The unstable period generated 3.2 times more commit events than the stable period, yet exhibited lower long-range memory, demonstrating that commit volume alone does not predict stability, and that the temporal organisation of development activity is the key variable. We situate this result in the broader literature on fractality in human creative production, discuss methodological limitations, and outline a research programme for deploying alpha as a continuous code-health indicator in version-control pipelines.
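The shuffled-surrogate baseline quoted above (alpha ~= 0.50 +/- 0.01) corresponds to a test along the following lines; this hedged sketch reuses the illustrative `dfa_alpha` helper from the earlier block, and the surrogate count and seed are arbitrary choices rather than the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def surrogate_baseline(x, n_surrogates=1000, n_min=16):
    """Shuffle-surrogate test: random permutation destroys temporal
    order, so surrogate alphas should cluster near 0.5 whenever the
    measured long-range memory is genuine."""
    alphas = [dfa_alpha(rng.permutation(x), n_min=n_min)
              for _ in range(n_surrogates)]
    return float(np.mean(alphas)), float(np.std(alphas))

# mu, sigma = surrogate_baseline(loc_added)
# A mean near 0.50 with a small sigma would mirror the reported baseline.
```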

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper proposes the DFA scaling exponent alpha computed on the unaggregated per-commit time series of lines of code added as a novel process-level indicator of software product stability. It hypothesizes that stable products arise from development processes with long-range temporal correlations in commit activity. The claim is tested on two non-overlapping 712-day periods from a single closed-source organization, labeled stable and unstable by the lead engineer using crash-analytics data. DFA yields alpha = 0.70 (n_min = 16) for the stable period and alpha = 0.57 for the unstable period, both well above the shuffled-surrogate baseline (~0.50). Results are reported as robust across n_min in {4, 16, 48} and 1,000 surrogates per series; the unstable period produced 3.2 times more commits yet lower alpha.

Significance. If the result holds after addressing design limitations, the work would extend fractal scaling analyses of human creative production to software engineering and supply a quantitative, commit-stream metric for code health that could be integrated into version-control pipelines. The manuscript earns credit for concrete numerical results, explicit surrogate baselines, and systematic robustness checks across n_min values and 1,000 surrogates per series. These elements provide a reproducible starting point for larger-scale validation even though the current evidence base remains narrow.

major comments (3)
  1. [§2 (Data Labeling)] Stability labels for the two 712-day periods are assigned by a single lead engineer on the basis of post-hoc crash-analytics review. No inter-rater reliability, blinding, or correlation with independent objective metrics (bug density, test coverage, user-reported issues) is reported. Because the alpha difference (0.70 vs 0.57) is interpreted as evidence that long-range correlations indicate stability, this subjective labeling is load-bearing and requires external validation or multiple raters to isolate the hypothesized effect from rater-specific bias.
  2. [§3 (Time-Series Construction and DFA)] DFA is performed on the commit-indexed (event-based) sequence rather than on calendar-time binned data. The unstable period contains 3.2 times more commit events than the stable period, so a window of n_min commits spans the same number of commits but a very different stretch of calendar time across conditions. No adjustment for commit-rate differences or supplementary time-binned analysis is described; this confound directly affects the interpretation of 'long-range' correlations and must be addressed to support the central claim (see the binning sketch after this list).
  3. [§5 (Discussion)] The study comprises only two non-overlapping periods from one closed-source organization. Surrogate tests rule out white-noise structure but do not address alternative explanations such as differences in team size, project phase, feature scope, or external events. With n=2, the observed alpha contrast is consistent with many confounds; the manuscript should either expand the sample or explicitly bound the generalizability of alpha as a stability indicator.
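The supplementary calendar-time analysis that major comment 2 asks for could look roughly like the following sketch. The `commits` DataFrame with `timestamp` and `loc_added` columns is a hypothetical schema, and daily bins are one plausible choice, not the paper's method.

```python
import pandas as pd

def daily_binned_series(commits: pd.DataFrame):
    """Aggregate per-commit LOC into calendar-day totals so that a
    DFA window of n days spans the same elapsed time in both periods,
    regardless of the 3.2x difference in commit rate."""
    daily = (commits.set_index('timestamp')['loc_added']
                    .resample('D').sum())    # days with no commits become 0
    return daily.to_numpy(dtype=float)

# alpha_daily = dfa_alpha(daily_binned_series(commits), n_min=16)
```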
minor comments (3)
  1. [Abstract] The adverb 'remarkably' in the sentence on commit volume is interpretive; replace it with a neutral statement of the observed ratio.
  2. [Figure 1] Log-log DFA plots: Add explicit slope annotations, n_min markers, and surrogate confidence bands directly on the panels for immediate readability.
  3. [References] Confirm that the foundational DFA reference (Peng et al., 1994) and at least one recent software-engineering application of scaling methods are cited.

Simulated Author's Rebuttal

3 responses · 2 unresolved

We thank the referee for their detailed and constructive review of our manuscript. We have addressed each of the major comments point by point below, indicating the changes we will implement in the revised version.

point-by-point responses
  1. Referee: §2 (Data Labeling): Stability labels for the two 712-day periods are assigned by a single lead engineer on the basis of post-hoc crash-analytics review. No inter-rater reliability, blinding, or correlation with independent objective metrics (bug density, test coverage, user-reported issues) is reported. Because the alpha difference (0.70 vs 0.57) is interpreted as evidence that long-range correlations indicate stability, this subjective labeling is load-bearing and requires external validation or multiple raters to isolate the hypothesized effect from rater-specific bias.

    Authors: We acknowledge that the stability classification relies on the judgment of a single lead engineer, even though it is informed by objective crash-analytics data. This is indeed a limitation of the current study. In the revised manuscript, we will provide more details on the labeling procedure in §2 and add a paragraph in §5 (Discussion) that explicitly discusses the potential for rater bias and the need for future validation using multiple raters or additional metrics such as bug density and test coverage. We cannot, however, retroactively introduce inter-rater reliability for these historical periods. revision: partial

  2. Referee: §3 (Time-Series Construction and DFA): DFA is performed on the commit-indexed (event-based) sequence rather than on calendar-time binned data. The unstable period contains 3.2 times more commit events than the stable period, so a window of n_min commits spans the same number of commits but a very different stretch of calendar time across conditions. No adjustment for commit-rate differences or supplementary time-binned analysis is described; this confound directly affects the interpretation of 'long-range' correlations and must be addressed to support the central claim.

    Authors: The event-based construction of the time series is central to our hypothesis, which focuses on the sequential dependencies between successive commits rather than on fixed calendar intervals. Nevertheless, we agree that the difference in commit rates introduces a potential confound in interpreting the scale of the correlations. In the revised version, we will add a supplementary analysis in which the series are aggregated into daily bins of total lines added and DFA is reapplied to these calendar-time series. We will also report the average calendar duration corresponding to each n_min window size in both periods to allow direct comparison. This additional analysis will be included to address the referee's concern. revision: yes

  3. Referee: §5 (Discussion): The study comprises only two non-overlapping periods from one closed-source organization. Surrogate tests rule out white-noise structure but do not address alternative explanations such as differences in team size, project phase, feature scope, or external events. With n=2, the observed alpha contrast is consistent with many confounds; the manuscript should either expand the sample or explicitly bound the generalizability of alpha as a stability indicator.

    Authors: We agree that the small sample size (two periods from a single organization) limits the ability to rule out alternative explanations. In the revised Discussion section, we will explicitly state that while the results are consistent with long-range correlations being associated with stability, the alpha difference could also arise from variations in team composition, project phase, or external factors. We will bound the generalizability of our findings accordingly and outline a programme for larger-scale, multi-organization validation studies. This revision will make the scope of the claims clearer without altering the reported results. revision: yes

standing simulated objections (unresolved)
  • Unable to obtain inter-rater reliability for the stability labels due to the closed-source nature of the project and the unique expertise of the single lead engineer involved in the labeling process.
  • Unable to expand the sample to additional periods or organizations because of data access limitations in this closed-source setting.

Circularity Check

0 steps flagged

No significant circularity; empirical DFA application is self-contained

full rationale

The paper's central claim rests on applying the standard Detrended Fluctuation Analysis (DFA) procedure to two independently labeled commit time series (stable vs. unstable periods assigned by the lead engineer from crash-analytics data, not from alpha or any model fit). Alpha is computed directly from the observed per-commit LOC-added series via the established DFA algorithm and compared against shuffled surrogates (which yield the expected ~0.5 baseline). No equations, fitted parameters, or self-citations are used to define or derive the target quantity; the result is an empirical contrast that does not reduce by construction to the inputs. Because the derivation chain relies on external DFA methodology and independent labeling, the claim is not circular with respect to its own benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that long-range temporal correlations in commit behavior causally contribute to product stability, plus the standard mathematical assumptions underlying DFA for non-stationary series. No new entities are postulated. The only explicit free parameter is the choice of n_min window size, tested at three discrete values.

free parameters (1)
  • n_min
    Minimum segment length for DFA fluctuation calculation; tested at 4, 16, and 48, with results reported for n_min=16 as primary (see the sweep sketch after this ledger).
axioms (2)
  • standard math Detrended Fluctuation Analysis correctly quantifies long-range correlations in non-stationary time series when applied to the given parameterization.
    Invoked implicitly when interpreting alpha > 0.5 as evidence of long-range memory in commit dynamics.
  • domain assumption The lead engineer's crash-analytics-based labeling accurately partitions the two periods into stable and unstable product states.
    Required for the alpha contrast to be interpreted as evidence for the stability indicator.
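For completeness, the n_min robustness check recorded in this ledger amounts to a parameter sweep like the one below, again assuming the illustrative `dfa_alpha` helper and a `loc_added` series as in the earlier sketches.

```python
# Sweep the single free parameter; the paper reports n_min = 16 as primary.
for n_min in (4, 16, 48):
    alpha = dfa_alpha(loc_added, n_min=n_min)
    print(f"n_min={n_min:>2}  alpha={alpha:.2f}")
```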

pith-pipeline@v0.9.0 · 5615 in / 1850 out tokens · 68385 ms · 2026-05-07T04:14:11.492215+00:00 · methodology


Reference graph

Works this paper leans on

27 extracted references · 24 canonical work pages

  [1] Barabási, A.-L. (2005). The origin of bursts and heavy tails in human dynamics. Nature, 435(7039). https://doi.org/10.1038/nature03459
  [2] Bashan, A., Bartsch, R., Kantelhardt, J. W., & Havlin, S. (2008). Comparison of detrending methods for fluctuation analysis. Physica A, 387(21), 5080–5090. https://doi.org/10.1016/j.physa.2008.04.023
  [3] Batey, M. (2012). The measurement of creativity: From definitional consensus to the introduction of a new heuristic framework. Creativity Research Journal, 24(1), 55–65. https://doi.org/10.1080/10400419.2012.649181
  [4] Bhan, J., Kim, S., Kim, J., Kwon, Y., Yang, S., & Lee, K. (2006). Long-range correlations in Korean literary corpora. Chaos, Solitons & Fractals, 29(1), 69–81. https://doi.org/10.1016/j.chaos.2005.08.214
  [5] Boden, M. A. (2007). Creativity in a nutshell. Think, 5(15), 83–96. https://doi.org/10.1017/S147717560000230X
  [6] Butner, J., Pasupathi, M., & Vallejos, V. (2008). When the facts just don't add up: The fractal nature of conversational stories. Social Cognition, 26, 670–699. https://doi.org/10.1521/soco.2008.26.6.670
  [7] Couger, J. D., & Dengate, G. (1996). Measurement of creativity of IS products. Creativity and Innovation Management, 5(4), 262–272. https://doi.org/10.1111/j.1467-8691.1996.tb00152.x
  [8] Drożdż, S., Oświęcimka, P., Kulig, A., Kwapień, J., Bazarnik, K., Grabska-Gradzińska, I., Rybicki, J., & Stanuszek, M. (2016). Quantifying origin and character of long-range correlations in narrative texts. Information Sciences, 331, 32–44. https://doi.org/10.1016/j.ins.2015.10.023
  [9] Feldman, D. P. (2012). Chaos and Fractals: An Elementary Introduction. Oxford University Press.
  [10] Gilden, D. L., Thornton, T., & Mallon, M. W. (1995). 1/f noise in human cognition. Science, 267(5205), 1837–1839. https://doi.org/10.1126/science.7892611
  [11] Goldberger, A. L., Amaral, L. A. N., Glass, L., Hausdorff, J. M., Ivanov, P. C., Mark, R. G., Mietus, J. E., Moody, G. B., Peng, C.-K., & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet. Circulation, 101(23), e215–e220. https://doi.org/10.1161/01.CIR.101.23.e215
  [12] Guastello, S. J. (1998). Creative problem solving groups at the edge of chaos. The Journal of Creative Behavior, 32(1), 38–57. https://doi.org/10.1002/j.2162-6057.1998.tb00805.x
  [13] Hu, K., Ivanov, P. C., Chen, Z., Carpena, P., & Stanley, H. E. (2001). Effect of trends on detrended fluctuation analysis. Physical Review E, 64(1), 011114. https://doi.org/10.1103/PhysRevE.64.011114
  [14] Mandelbrot, B. B. (1983). The Fractal Geometry of Nature. Freeman.
  [15] Marmelat, V., & Delignières, D. (2012). Strong anticipation: Complexity matching in interpersonal coordination. Experimental Brain Research, 222, 137–148. https://doi.org/10.1007/s00221-012-3202-9
  [16] Mitevski, G., & Efremova, M. (2026). Commit-level time series for stable and unstable software periods [Data set]. Zenodo. https://doi.org/10.5281/zenodo.19986248
  [17] Moulder, R. G., Boker, S. M., Ramseyer, F., & Tschacher, W. (2018). Determining synchrony between behavioral time series. Psychological Methods, 23, 757–773. https://doi.org/10.1037/met0000172
  [18] Nelson, C., Brummel, B., Grove, D. F., Jorgenson, N., Sen, S., & Gamble, R. C. (2010). Measuring creativity in software development. Proceedings of ICCC-10, 205–214.
  [19] Ogata, H., Tokuyama, K., Nagasaka, S., Ando, A., Kusaka, I., Sato, A., ... & Ishibashi, S. (2006). Long-range negative correlation of glucose dynamics in humans and its breakdown in diabetes mellitus. American Journal of Physiology-Regulatory, Integrative and Comparative Physiology, 291(6), R1638–R1643. https://doi.org/10.1152/ajpregu.00241.2006
  [20] Paulson, J., Succi, G., & Eberlein, A. (2004). An empirical study of open-source and closed-source software products. IEEE Transactions on Software Engineering, 30, 246–256. https://doi.org/10.1109/TSE.2004.1274044
  [21] Peng, C.-K., Buldyrev, S. V., Havlin, S., Simons, M., Stanley, H. E., & Goldberger, A. L. (1994). Mosaic organization of DNA nucleotides. Physical Review E, 49(2), 1685–1689. https://doi.org/10.1103/PhysRevE.49.1685
  [22] Peng, C.-K., Havlin, S., Stanley, H. E., & Goldberger, A. L. (1995). Quantification of scaling exponents and crossover phenomena in nonstationary heartbeat time series. Chaos, 5(1), 82–87. https://doi.org/10.1063/1.166141
  [23] Riley, M. A., & Turvey, M. T. (2002). Variability and determinism in motor behavior. Journal of Motor Behavior, 34(2), 99–125. https://doi.org/10.1080/00222890209601934
  [24] Theiler, J., Eubank, S., Longtin, A., Galdrikian, B., & Farmer, J. D. (1992). Testing for nonlinearity in time series: The method of surrogate data. Physica D, 58(1), 77–94. https://doi.org/10.1016/0167-2789(92)90102-S
  [25] Van Orden, G. C., Holden, J. G., & Turvey, M. T. (2003). Self-organization of cognitive performance. Journal of Experimental Psychology: General, 132(3), 331–350. https://doi.org/10.1037/0096-3445.132.3.331
  [26] Varela, M., Vigil, L., Rodriguez, C., Vargas, B., & García-Carretero, R. (2016). Delay in the detrended fluctuation analysis crossover point as a risk factor for type 2 diabetes mellitus. Journal of Diabetes Research, 2016, Article ID 9361958. https://doi.org/10.1155/2016/9361958
  [27] Yu, M., Zhou, R., Cai, Z., Tan, C.-W., & Wang, H. (2020). Unravelling the relationship between response time and user experience in mobile applications. Internet Research, 30(5), 1353–1382. https://doi.org/10.1108/INTR-05-2019-0223