Single Change-Point Detection via Energy Distance with Application to Genomic Data

Suthakaran Ratnasingam

arXiv: 2605.01062 · v1 · submitted 2026-05-01 · stat.ME · math.ST · stat.TH

Pith reviewed 2026-05-09 18:34 UTC · model grok-4.3

classification: stat.ME · math.ST · stat.TH

keywords: change point detection · energy distance · nonparametric test · permutation test · genomic data · binary segmentation · asymptotic normality

The pith

Energy distance yields an asymptotically normal statistic for any fixed split, with the scan maximum calibrated by permutation to control type I error under exchangeability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a nonparametric procedure for single change-point detection in independent sequences by measuring the energy distance between the segments on either side of a candidate split. It proves that the standardized statistic for any fixed split converges to a standard normal under the null of no change. The global scan statistic, the maximum over candidate splits, has its critical value calibrated by a permutation test, which gives valid type I error control whenever the observations are exchangeable. Simulations indicate that the procedure is more robust than competing methods across a range of error distributions. The single-change method is combined with binary segmentation to locate multiple changes and is illustrated on breast-cancer CGH microarray data.

Core claim

The central contribution is a change-point test based on energy distance. Its fixed-split version is asymptotically standard normal under the null. Its global version, the maximum over candidate splits, is calibrated by permutation to achieve valid type I error control under exchangeability, and in the reported simulations it is more robust to the choice of error distribution than competing procedures.

What carries the argument

The standardized energy-distance statistic Z_{n,k} for each candidate split k, together with the scan statistic T_n = max |Z_{n,k}| whose critical values are obtained from a permutation distribution.
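As a rough illustration of the scan-and-permute machinery, the following Python sketch computes a split-wise energy statistic, maximizes it over a trimmed set of candidate splits, and calibrates the maximum by permutation. The paper's standardization of Z_{n,k} is not reproduced on this page, so the sketch substitutes the classical weighted two-sample energy statistic. The function names, the trimming fraction `eta`, and the exact trimmed range are assumptions made for illustration, not the author's implementation.

```python
import numpy as np

def energy_stat(x, k):
    """Weighted two-sample energy statistic between x[:k] and x[k:] (univariate).

    Assumption: this is the classical V-statistic form, used here as a
    stand-in for the paper's standardized Z_{n,k}.
    """
    a, b = x[:k], x[k:]
    n1, n2 = len(a), len(b)
    between = np.abs(a[:, None] - b[None, :]).mean()
    within_a = np.abs(a[:, None] - a[None, :]).mean()
    within_b = np.abs(b[:, None] - b[None, :]).mean()
    return n1 * n2 / (n1 + n2) * (2.0 * between - within_a - within_b)

def scan_stat(x, eta=0.1):
    """Maximum of the split statistic over a trimmed candidate set (assumed form)."""
    n = len(x)
    lo = max(2, int(np.ceil(eta * n)))
    ks = np.arange(lo, n - lo + 1)
    vals = np.array([energy_stat(x, k) for k in ks])
    j = int(np.argmax(vals))
    return vals[j], int(ks[j])

def permutation_pvalue(x, eta=0.1, n_perm=199, seed=0):
    """Permutation p-value for the scan statistic, plus the maximizing split."""
    rng = np.random.default_rng(seed)
    t_obs, k_hat = scan_stat(x, eta)
    # Under exchangeability, shuffling the sequence leaves the null law unchanged.
    exceed = sum(scan_stat(rng.permutation(x), eta)[0] >= t_obs
                 for _ in range(n_perm))
    return (1 + exceed) / (1 + n_perm), k_hat
```

Adding one to both the numerator and the denominator keeps the p-value strictly positive, which is the usual way to keep a Monte Carlo permutation test valid with a finite number of shuffles.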

If this is right

The procedure applies without distributional assumptions to any sequence of independent observations.
Permutation calibration supplies finite-sample type I error control under the stated exchangeability condition.
Binary segmentation extends the single-change test to the detection of multiple change points.
The asymptotic normality result supplies a basis for power calculations under fixed or local alternatives.
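The binary segmentation step can be sketched as follows. This is a minimal Python illustration under stated assumptions: the single-change test is a permutation-calibrated scan of the classical weighted energy statistic (not the paper's standardized statistic), and the names `binary_segmentation`, `min_seg`, `alpha`, and `n_perm` are hypothetical choices, not taken from the paper.

```python
import numpy as np

def _energy(a, b):
    # Classical weighted two-sample energy statistic (assumed stand-in).
    d = lambda u, v: np.abs(u[:, None] - v[None, :]).mean()
    return len(a) * len(b) / (len(a) + len(b)) * (2 * d(a, b) - d(a, a) - d(b, b))

def _scan(x, min_seg):
    # Maximize the split statistic, keeping at least min_seg points per side.
    ks = range(min_seg, len(x) - min_seg + 1)
    vals = [_energy(x[:k], x[k:]) for k in ks]
    j = int(np.argmax(vals))
    return vals[j], ks[j]

def _pvalue(x, min_seg, n_perm, rng):
    # Permutation p-value of the scan maximum and the maximizing split.
    t_obs, k_hat = _scan(x, min_seg)
    hits = sum(_scan(rng.permutation(x), min_seg)[0] >= t_obs
               for _ in range(n_perm))
    return (1 + hits) / (1 + n_perm), k_hat

def binary_segmentation(x, alpha=0.05, min_seg=5, n_perm=99, seed=0):
    """Recursively split segments while the single-change test rejects."""
    rng = np.random.default_rng(seed)
    found, stack = [], [(0, len(x))]
    while stack:
        lo, hi = stack.pop()
        if hi - lo < 2 * min_seg:
            continue  # segment too short to test
        p, k = _pvalue(x[lo:hi], min_seg, n_perm, rng)
        if p <= alpha:
            found.append(lo + k)
            stack += [(lo, lo + k), (lo + k, hi)]
    return sorted(found)
```

Each recursive call runs a fresh test at level `alpha`, so the overall false-positive rate across segments is not controlled by this sketch; a multiplicity correction would be a separate design decision.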

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The robustness across error distributions suggests the method is well suited to genomic copy-number data whose marginal laws are rarely known in advance.
The same energy-distance construction could be applied directly to multivariate or functional observations without further modification.

Load-bearing premise

The observations are independent and the entire sequence is exchangeable under the null hypothesis of no change point.

What would settle it

A Monte Carlo experiment in which independent but non-exchangeable data under the null produce permutation p-values whose distribution deviates systematically from uniform would show that the type I error control fails.
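An experiment of this kind might be set up along the following lines. The Python sketch estimates how often the permutation test rejects at a nominal level, first for i.i.d. data and then for an independent sequence whose scale drifts. The drifting-scale design is an assumed stand-in; whether such a sequence should count as "no change" is a modeling decision this page does not settle. The test statistic is the classical weighted energy statistic rather than the paper's standardized version, and all function names are hypothetical.

```python
import numpy as np

def _energy(a, b):
    # Classical weighted two-sample energy statistic (assumed stand-in).
    d = lambda u, v: np.abs(u[:, None] - v[None, :]).mean()
    return len(a) * len(b) / (len(a) + len(b)) * (2 * d(a, b) - d(a, a) - d(b, b))

def _scan_max(x, min_seg=5):
    # Scan maximum over splits that leave at least min_seg points per side.
    return max(_energy(x[:k], x[k:]) for k in range(min_seg, len(x) - min_seg + 1))

def perm_pvalue(x, n_perm, rng):
    t_obs = _scan_max(x)
    hits = sum(_scan_max(rng.permutation(x)) >= t_obs for _ in range(n_perm))
    return (1 + hits) / (1 + n_perm)

def rejection_rate(sampler, n=30, reps=100, n_perm=49, alpha=0.1, seed=0):
    """Empirical rejection rate and the raw p-values over `reps` simulated sequences."""
    rng = np.random.default_rng(seed)
    pvals = np.array([perm_pvalue(sampler(rng, n), n_perm, rng) for _ in range(reps)])
    return float(np.mean(pvals <= alpha)), pvals

def iid_normal(rng, n):
    # Exchangeable baseline: the rejection rate should sit near alpha.
    return rng.normal(0.0, 1.0, n)

def drifting_scale(rng, n):
    # Independent but not exchangeable (assumed design): the scale grows with the index.
    return rng.normal(0.0, np.linspace(0.5, 2.0, n))
```

Comparing `rejection_rate(iid_normal)` with `rejection_rate(drifting_scale)` shows whether the empirical rate moves away from `alpha` once exchangeability is broken. The returned p-values can also be inspected for departures from uniformity.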

Figures

Figures reproduced from arXiv: 2605.01062 by Suthakaran Ratnasingam.

**Figure 1.** Power comparison for the normal distribution under a mean change

**Figure 2.** Power comparison for the normal distribution under a variance change

**Figure 3.** Power comparison for the normal distribution under simultaneous changes in mean and variance, with …

**Figure 4.** Power comparison for the skew normal distribution under a mean change

**Figure 5.** Power comparison for the skew normal distribution under a variance change

**Figure 6.** Power comparison for the skew normal distribution under simultaneous changes in mean, variance, and …

**Figure 7.** Power comparison for the exponential distribution under a variance parameter change

**Figure 8.** Array CGH profile of 10 chromosomes of breast cancer cell line MDA157

Original abstract

In this paper, we develop and analyze a nonparametric procedure for detecting a single change point in sequences of independent observations using energy distance. The asymptotic properties of the test statistic are derived under both null and alternative hypotheses. Under the null hypothesis, for any fixed candidate split point, the standardized statistic $\mathcal{Z}_{n,k}$ converges to a standard normal limit. For global detection, we use the scan statistic $T_n=\max_{k\in K_\eta}|\mathcal{Z}_{n,k}|$ and calibrate critical values using a permutation test, which yields valid type I error control under exchangeability. The simulation study shows that the proposed method demonstrates much better robustness across various error distributions. To handle multiple change points in practical applications, the method is combined with a binary segmentation approach. The breast cancer cell line (MDA157) from cDNA microarray CGH data is used to illustrate the detection and estimation capabilities of the proposed method for genomic sequences.
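For orientation, the energy distance underlying the abstract can be written out. The population definition below is the standard one due to Székely and Rizzo. The sample version for a split at k is a common V-statistic form and is an assumption here, since the paper's exact normalization and the standardization leading to $\mathcal{Z}_{n,k}$ are not shown on this page.

```latex
% Population energy distance between independent X and Y with finite first moments;
% X' and Y' denote independent copies of X and Y.
\mathcal{E}(X, Y) = 2\,\mathbb{E}|X - Y| - \mathbb{E}|X - X'| - \mathbb{E}|Y - Y'| \;\ge\; 0,
\qquad \mathcal{E}(X, Y) = 0 \iff X \overset{d}{=} Y.

% Assumed sample analogue for a candidate split k of X_1, \dots, X_n:
\widehat{\mathcal{E}}_{n,k}
  = \frac{2}{k(n-k)} \sum_{i \le k} \sum_{j > k} |X_i - X_j|
  - \frac{1}{k^2} \sum_{i \le k} \sum_{i' \le k} |X_i - X_{i'}|
  - \frac{1}{(n-k)^2} \sum_{j > k} \sum_{j' > k} |X_j - X_{j'}|.
```

Because the distance is zero only when the two distributions coincide, a statistic built on it can respond to changes in mean, variance, or shape alike, which matches the range of alternatives shown in the figures.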

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This paper gives a practical nonparametric single-change detector via energy distance plus permutation calibration that shows good robustness in non-normal simulations and applies it to genomic CGH data.

This paper develops a nonparametric single change-point detection method that relies on the energy distance between the segments on either side of a candidate point. The author standardizes the energy-distance statistic for each possible split, scans for the maximum, and uses a permutation test to set critical values, which gives type I error control under the null when the observations are exchangeable. The asymptotics show normality for any fixed split under independence, and the approach is extended to multiple changes with binary segmentation before being applied to cDNA microarray data from a breast cancer cell line.

The work does well by focusing on robustness: the simulations indicate the method holds up better than competitors when errors are non-normal, which is common in genomic sequences. The permutation approach is a clean way to avoid having to derive the null distribution of the max statistic analytically.

Soft spots are limited. The idea is a synthesis of existing tools: energy distance is a known two-sample measure and permutation scans are standard, so the advance is mainly in the combination and the genomic application rather than in a new theoretical framework. The abstract is light on derivation steps and simulation specifics, so the full paper needs to deliver clear variance formulas and a reproducible protocol to back the robustness claims. One data set is fine for illustration but not enough to establish broad utility.

This is for methodologists in statistics or bioinformatics who want a simple, assumption-light tool for change points. A reader needing to analyze sequences without normality assumptions could adopt the procedure after verifying the code. It deserves a serious referee because the claims are standard and plausible, with practical relevance. I would send it to peer review.

Referee Report

0 major / 3 minor

Summary. The manuscript develops a nonparametric procedure for single change-point detection in sequences of independent observations based on the energy distance. Under the null, the standardized statistic Z_{n,k} for any fixed candidate split point k is shown to converge to a standard normal limit. Global detection uses the scan statistic T_n = max_{k in K_η} |Z_{n,k}| with critical values obtained from a permutation test, which is claimed to yield valid type-I error control under exchangeability. Simulations indicate superior robustness across error distributions relative to competitors, and the method is extended to multiple change points via binary segmentation with an illustration on breast cancer CGH microarray data.

Significance. If the central claims hold, the work supplies a distribution-free change-point procedure grounded in U-statistic theory, with exact finite-sample type-I control via permutation and demonstrated robustness in simulations. The permutation calibration for the scan statistic is a clear strength because it sidesteps the need for extreme-value asymptotics on the maximum. The genomic application shows practical relevance. Credit is given for the standard CLT argument for the fixed-split statistic and for avoiding circularity in the calibration device.

minor comments (3)

The abstract states that Z_{n,k} converges to a standard normal but does not display the explicit form of the standardization (i.e., the consistent variance estimator). This should be written out in the methods section with the corresponding equation number.
The simulation study asserts 'much better robustness' across error distributions; reporting power or type-I error with standard errors (or at least the number of Monte Carlo replications) would make the quantitative comparisons more convincing.
The range of the candidate set K_η (the η-trimmed interval of possible split points) is referenced but not defined explicitly; a short paragraph or equation in §2 would remove ambiguity for readers.
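The second comment asks for Monte Carlo uncertainty to be reported. As a reminder of the scale involved, the standard error of an estimated rejection rate follows from the binomial variance. The symbols $\hat{p}$ (the estimated power or type-I error) and $R$ (the number of replications) are notation introduced here, and the numbers in the worked case are illustrative, not taken from the paper.

```latex
% Standard error of a Monte Carlo rejection rate from R independent replications:
\operatorname{SE}(\hat{p}) = \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{R}}.

% Illustrative case: \hat{p} = 0.05 and R = 1000 give
\operatorname{SE}(\hat{p}) = \sqrt{\frac{0.05 \times 0.95}{1000}} \approx 0.0069.
```

With this many replications, an estimated type-I error between roughly 0.036 and 0.064 would be within two standard errors of the nominal 0.05, which is the kind of context the comment is asking for.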

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our manuscript and the recommendation for minor revision. No specific major comments were provided in the report, so we have no points to address at this time. We appreciate the recognition of the distribution-free nature of the procedure, the exact type-I error control via permutation, and the practical illustration on genomic data. We remain available to incorporate any additional feedback if the referee has further points to raise.

Circularity Check

0 steps flagged

No significant circularity

The paper's central claims rest on standard U-statistic CLT arguments for the fixed-k energy-distance statistic Z_{n,k} converging to N(0,1) under the null (independent observations) and on the permutation test's type-I control for the scan statistic T_n under exchangeability. Both follow from general theory external to the paper's equations and do not reduce by construction to any fitted parameter, self-definition, or self-citation chain within the manuscript. The derivation is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The procedure rests on standard nonparametric assumptions rather than new postulates; no free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption: Observations are independent. Explicitly stated for the sequences under study.
domain assumption: Exchangeability under the null hypothesis. Required for the permutation test to deliver valid type-I error control.

pith-pipeline@v0.9.0 · 5460 in / 1367 out tokens · 39480 ms · 2026-05-09T18:34:38.020658+00:00 · methodology
