
arXiv: 2605.01062 · v1 · submitted 2026-05-01 · stat.ME · math.ST · stat.TH

Single Change-Point Detection via Energy Distance with Application to Genomic Data

Pith reviewed 2026-05-09 18:34 UTC · model grok-4.3

classification stat.ME · math.ST · stat.TH
keywords change point detection · energy distance · nonparametric test · permutation test · genomic data · binary segmentation · asymptotic normality

The pith

Energy distance yields an asymptotically normal statistic for any fixed split, with the scan maximum calibrated by permutation to control type I error under exchangeability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a nonparametric procedure for single change-point detection in sequences of independent observations by measuring the energy distance between the segments on either side of a candidate split. It proves that the standardized statistic for any fixed split converges to a standard normal distribution under the null of no change. For global detection it takes the maximum over candidate splits as a scan statistic and calibrates the critical value by a permutation test, which gives valid type I error control whenever the observations are exchangeable. Simulations indicate that the procedure remains reliable across a range of error distributions. To locate multiple changes, the single-change method is combined with binary segmentation, and the approach is illustrated on breast-cancer CGH microarray data.
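For orientation, the standard definition of energy distance (due to Székely and Rizzo) and its plug-in estimate at a candidate split can be written as follows. This is a sketch under stated assumptions: the symbol \widehat{\mathcal{E}}_{n,k} and the V-statistic form are chosen here for illustration, and the paper's exact centering and scaling behind the standardized statistic are not reproduced on this page.

```latex
% Energy distance between distributions F and G, where X, X' ~ F and
% Y, Y' ~ G are all independent with finite first moments:
\mathcal{E}(F, G)
  = 2\,\mathbb{E}\lVert X - Y\rVert
  - \mathbb{E}\lVert X - X'\rVert
  - \mathbb{E}\lVert Y - Y'\rVert \;\ge\; 0,
\qquad
\mathcal{E}(F, G) = 0 \iff F = G.

% Plug-in (V-statistic) estimate for a candidate split k of X_1, ..., X_n
% (notation assumed for illustration, not taken from the paper):
\widehat{\mathcal{E}}_{n,k}
  = \frac{2}{k(n-k)} \sum_{i=1}^{k} \sum_{j=k+1}^{n} \lVert X_i - X_j\rVert
  - \frac{1}{k^{2}} \sum_{i=1}^{k} \sum_{j=1}^{k} \lVert X_i - X_j\rVert
  - \frac{1}{(n-k)^{2}} \sum_{i=k+1}^{n} \sum_{j=k+1}^{n} \lVert X_i - X_j\rVert .
```

Because the population quantity vanishes only when the two distributions coincide, a statistic built on it can respond to changes in location, scale, or shape, not just the mean.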

Core claim

The central contribution is a change-point test based on energy distance whose fixed-split version is asymptotically standard normal under the null and whose global version, the maximum over candidate splits, is calibrated by permutation to achieve exact type I error control under exchangeability, while showing in simulations greater robustness across error distributions than competing procedures.

What carries the argument

The standardized energy-distance statistic Z_{n,k} for each candidate split k, together with the scan statistic T_n = max |Z_{n,k}| whose critical values are obtained from a permutation distribution.
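The scan-plus-permutation idea described above can be sketched in Python. This is not the paper's implementation: the explicit standardization behind Z_{n,k} is not shown on this page, so the sketch scans the classical scaled two-sample energy statistic, (k(n-k)/n) times the sample energy distance, in its place. The function names, the trimming rule controlled by `eta`, and the default number of permutations are all assumptions made for illustration.

```python
import numpy as np

def pairwise_dist(x):
    """Euclidean distance matrix; each row of x is one observation."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

def energy_scan(D, eta=0.1):
    """Return (max scaled energy statistic, maximizing split k).

    Split k puts observations 0..k-1 on the left and k..n-1 on the right.
    The trimming rule below is an assumed stand-in for the paper's K_eta.
    """
    n = D.shape[0]
    lo = max(2, int(np.ceil(eta * n)))
    ks = np.arange(lo, n - lo + 1)
    m = n - ks
    C = D.cumsum(axis=0).cumsum(axis=1)       # 2-D prefix sums of distances
    s_ll = C[ks - 1, ks - 1]                  # left-left distance sums
    s_lr = C[ks - 1, n - 1] - s_ll            # left-right distance sums
    s_rr = C[n - 1, n - 1] - s_ll - 2 * s_lr  # right-right distance sums
    e = 2 * s_lr / (ks * m) - s_ll / ks**2 - s_rr / m**2
    stat = ks * m / n * e                     # assumed scaling, not Z_{n,k}
    j = int(np.argmax(stat))
    return float(stat[j]), int(ks[j])

def permutation_pvalue(x, n_perm=199, eta=0.1, seed=0):
    """Permutation p-value for the scan maximum, plus the estimated split."""
    rng = np.random.default_rng(seed)
    D = pairwise_dist(x)
    t_obs, k_hat = energy_scan(D, eta)
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(D.shape[0])
        # Permuting rows and columns together permutes the sequence order.
        exceed += energy_scan(D[np.ix_(p, p)], eta)[0] >= t_obs
    return (1 + exceed) / (n_perm + 1), k_hat, t_obs

# Example: a mean shift after the 30th observation.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 30), rng.normal(5, 1, 30)])
p_value, k_hat, t_obs = permutation_pvalue(x)
print(p_value, k_hat)
```

Recomputing the full scan on every permuted sequence means the maximum over splits is built into the reference distribution, so no separate extreme-value approximation is needed. The `(1 + exceed) / (n_perm + 1)` convention keeps the p-value strictly positive.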

If this is right

  • The procedure applies without distributional assumptions to any sequence of independent observations.
  • Permutation calibration supplies finite-sample type I error control under the stated exchangeability condition.
  • Binary segmentation extends the single-change test to consistent estimation of multiple change points.
  • The asymptotic normality result supplies a basis for power calculations under fixed or local alternatives.
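The binary segmentation step mentioned in the list above can be sketched as a recursion around any single-change test. The sketch below is illustrative only: `binary_segmentation` and its `detector` interface are hypothetical names, and `mean_shift_detector` is a deliberately simple stand-in with a fixed threshold, not the paper's permutation-calibrated energy scan. Any test returning a reject decision and a split location could be passed in instead.

```python
import numpy as np

def mean_shift_detector(seg, min_len=5, threshold=5.0):
    """Toy stand-in for a single-change test (NOT the paper's procedure).

    Returns (reject, k): reject is True when the largest pooled-variance
    t-type statistic over admissible splits exceeds `threshold`; k is the
    length of the left piece at the maximizing split (None if no split).
    """
    n = len(seg)
    best, k_hat = 0.0, None
    for k in range(min_len, n - min_len + 1):
        left, right = seg[:k], seg[k:]
        pooled = ((k - 1) * left.var(ddof=1)
                  + (n - k - 1) * right.var(ddof=1)) / (n - 2)
        se = np.sqrt(pooled * (1 / k + 1 / (n - k)) + 1e-12)
        z = abs(left.mean() - right.mean()) / se
        if z > best:
            best, k_hat = z, k
    return best > threshold, k_hat

def binary_segmentation(x, detector, min_len=5):
    """Recursively split x wherever `detector` reports a change.

    Returns sorted change-point indices; each index is the first position
    of the segment that follows the change.
    """
    x = np.asarray(x, dtype=float)
    found = []
    stack = [(0, len(x))]              # half-open segments [start, end)
    while stack:
        start, end = stack.pop()
        if end - start < 2 * min_len:  # too short to split again
            continue
        reject, k = detector(x[start:end])
        if reject:
            cp = start + k
            found.append(cp)
            stack.append((start, cp))  # search left of the change
            stack.append((cp, end))    # search right of the change
    return sorted(found)

# Example: three segments with means 0, 3 and 6, changes at 40 and 80.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(mu, 0.5, 40) for mu in (0.0, 3.0, 6.0)])
change_points = binary_segmentation(x, mean_shift_detector)
print(change_points)
```

Each recursive call runs a fresh test on a sub-segment, so the per-segment error rate does not by itself control the error rate over the whole segmentation; how the paper handles this is not stated on this page.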

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The robustness across error distributions suggests the method is well suited to genomic copy-number data whose marginal laws are rarely known in advance.
  • The same energy-distance construction could be applied directly to multivariate or functional observations without further modification.

Load-bearing premise

The observations are independent and the entire sequence is exchangeable under the null hypothesis of no change point.

What would settle it

A Monte Carlo experiment in which independent but non-exchangeable data under the null produce permutation p-values whose distribution deviates systematically from uniform would show that the type I error control fails.
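One way such an experiment could be set up is sketched below in Python. Everything specific here is an assumption for illustration: the drifting-scale design is just one hypothetical example of independent but non-exchangeable data with no abrupt change, the statistic is the classical scaled energy statistic rather than the paper's Z_{n,k}, and the sample size, trimming, and replication counts are kept small so the sketch runs quickly.

```python
import numpy as np

def scan_max(D, lo):
    """Max over splits k = lo..n-lo of the scaled sample energy statistic."""
    n = D.shape[0]
    ks = np.arange(lo, n - lo + 1)
    m = n - ks
    C = D.cumsum(axis=0).cumsum(axis=1)       # 2-D prefix sums of distances
    s_ll = C[ks - 1, ks - 1]
    s_lr = C[ks - 1, n - 1] - s_ll
    s_rr = C[n - 1, n - 1] - s_ll - 2 * s_lr
    e = 2 * s_lr / (ks * m) - s_ll / ks**2 - s_rr / m**2
    return float(np.max(ks * m / n * e))

def perm_pvalue(x, rng, n_perm=99, lo=5):
    """Permutation p-value of the scan maximum for a univariate sequence."""
    D = np.abs(x[:, None] - x[None, :])
    t_obs = scan_max(D, lo)
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(len(x))
        hits += scan_max(D[np.ix_(p, p)], lo) >= t_obs
    return (1 + hits) / (n_perm + 1)

def rejection_rate(draw, n_rep=100, alpha=0.05, seed=0):
    """Fraction of replicates with p <= alpha, plus the p-values themselves."""
    rng = np.random.default_rng(seed)
    pvals = np.array([perm_pvalue(draw(rng), rng) for _ in range(n_rep)])
    return float(np.mean(pvals <= alpha)), pvals

n = 40
# Exchangeable baseline: i.i.d. standard normal observations.
iid = lambda rng: rng.normal(0.0, 1.0, n)
# Hypothetical non-exchangeable design: independent, scale drifts smoothly.
drift = lambda rng: rng.normal(0.0, np.linspace(0.5, 3.0, n))

rate_iid, p_iid = rejection_rate(iid)
rate_drift, p_drift = rejection_rate(drift)
print("rejection rate, i.i.d.:", rate_iid)
print("rejection rate, drifting scale:", rate_drift)
```

Comparing the two rejection rates, or the two sets of p-values against a uniform reference, gives a direct read on whether the calibration survives the loss of exchangeability. A serious study would use far more replicates and permutations than this sketch does.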

Figures

Figures reproduced from arXiv: 2605.01062 by Suthakaran Ratnasingam.

Figure 1: Power comparison for the normal distribution under a mean change
Figure 2: Power comparison for the normal distribution under a variance change
Figure 3: Power comparison for the normal distribution under simultaneous changes in mean and variance, with …
Figure 4: Power comparison for the skew normal distribution under a mean change
Figure 5: Power comparison for the skew normal distribution under a variance change
Figure 6: Power comparison for the skew normal distribution under simultaneous changes in mean, variance, and …
Figure 7: Power comparison for the exponential distribution under a variance parameter change
Figure 8: Array CGH profile of 10 chromosomes of breast cancer cell line MDA157
Original abstract

In this paper, we develop and analyze a nonparametric procedure for detecting a single change point in sequences of independent observations using energy distance. The asymptotic properties of the test statistic are derived under both null and alternative hypotheses. Under the null hypothesis, for any fixed candidate split point, the standardized statistic $\mathcal{Z}_{n,k}$ converges to a standard normal limit. For global detection, we use the scan statistic $T_n=\max_{k\in K_\eta}|\mathcal{Z}_{n,k}|$ and calibrate critical values using a permutation test, which yields valid type I error control under exchangeability. The simulation study shows that the proposed method demonstrates much better robustness across various error distributions. To handle multiple change points in practical applications, the method is combined with a binary segmentation approach. The breast cancer cell line (MDA157) from cDNA microarray CGH data is used to illustrate the detection and estimation capabilities of the proposed method for genomic sequences.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript develops a nonparametric procedure for single change-point detection in sequences of independent observations based on the energy distance. Under the null, the standardized statistic Z_{n,k} for any fixed candidate split point k is shown to converge to a standard normal limit. Global detection uses the scan statistic T_n = max_{k in K_η} |Z_{n,k}| with critical values obtained from a permutation test, which is claimed to yield valid type-I error control under exchangeability. Simulations indicate superior robustness across error distributions relative to competitors, and the method is extended to multiple change points via binary segmentation with an illustration on breast cancer CGH microarray data.

Significance. If the central claims hold, the work supplies a distribution-free change-point procedure grounded in U-statistic theory, with exact finite-sample type-I control via permutation and demonstrated robustness in simulations. The permutation calibration for the scan statistic is a clear strength because it sidesteps the need for extreme-value asymptotics on the maximum. The genomic application shows practical relevance. Credit is given for the standard CLT argument for the fixed-split statistic and for avoiding circularity in the calibration device.

minor comments (3)
  1. The abstract states that Z_{n,k} converges to a standard normal but does not display the explicit form of the standardization (i.e., the consistent variance estimator). This should be written out in the methods section with the corresponding equation number.
  2. The simulation study asserts 'much better robustness' across error distributions; reporting power or type-I error with standard errors (or at least the number of Monte Carlo replications) would make the quantitative comparisons more convincing.
  3. The range of the candidate set K_η (the η-trimmed interval of possible split points) is referenced but not defined explicitly; a short paragraph or equation in §2 would remove ambiguity for readers.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our manuscript and the recommendation for minor revision. No specific major comments were provided in the report, so we have no points to address at this time. We appreciate the recognition of the distribution-free nature of the procedure, the exact type-I error control via permutation, and the practical illustration on genomic data. We remain available to incorporate any additional feedback if the referee has further points to raise.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper's central claims rest on standard U-statistic CLT arguments for the fixed-k energy-distance statistic Z_{n,k} converging to N(0,1) under the null (independent observations) and on the permutation test's type-I control for the scan statistic T_n under exchangeability. Both follow from general theory external to the paper's equations and do not reduce by construction to any fitted parameter, self-definition, or self-citation chain within the manuscript. The derivation is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The procedure rests on standard nonparametric assumptions rather than new postulates; no free parameters or invented entities are introduced in the abstract.

axioms (2)
  • domain assumption Observations are independent
    Explicitly stated for the sequences under study.
  • domain assumption Exchangeability under the null hypothesis
    Required for the permutation test to deliver valid type-I error control.


