We introduce a powerful new scan statistic and an associated test for detecting the presence and pinpointing the location of a change point within the distribution of a data sequence where the data elements take values in a general separable metric space $(\Omega, d)$. These change points mark abrupt shifts in the distribution of the data sequence. Our method hinges on distance profiles, where the distance profile of an element $\omega \in \Omega$ is the distribution of distances from $\omega$ as dictated by the data. Our approach is fully non-parametric and universally applicable to diverse data types, including distributional and network data, as long as distances between the data objects are available. From a practical point of view, it is nearly free of tuning parameters, except for the specification of cut-off intervals near the endpoints where change points are assumed not to occur. Our theoretical results include a precise characterization of the asymptotic distribution of the test statistic under the null hypothesis of no change points and rigorous guarantees on the consistency of the test in the presence of change points under contiguous alternatives, as well as on the consistency of the estimated change point location. Through comprehensive simulation studies encompassing multivariate data, bivariate distributional data and sequences of graph Laplacians, we demonstrate the effectiveness of our approach in terms of both detection power and accuracy of the estimated change point location. We apply our method to real datasets, including U.S. electricity generation compositions and Bluetooth proximity networks, underscoring its practical relevance.
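To make the notion of a distance profile concrete, the following is a minimal sketch of an empirical version, assuming the profile of a data point $x_i$ is estimated by the empirical distribution function of its distances to the other observations, $\hat F_i(t) = \frac{1}{n-1}\sum_{j \neq i} \mathbf{1}\{d(x_i, x_j) \le t\}$. The function name `distance_profiles`, the evaluation grid, and the toy data are illustrative assumptions, and the sketch does not implement the scan statistic itself.

```python
import numpy as np


def distance_profiles(dist, grid):
    """Empirical distance profile of each point, evaluated on `grid`.

    dist: (n, n) symmetric matrix of pairwise distances d(x_i, x_j).
    grid: (m,) radii t at which the profiles are evaluated.
    Returns an (n, m) array whose (i, k) entry is the fraction of the
    other n - 1 points lying within distance grid[k] of point i.
    """
    dist = np.asarray(dist, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = dist.shape[0]
    # Exclude each point's zero distance to itself.
    off_diag = ~np.eye(n, dtype=bool)
    # within[i, j, k] is True when d(x_i, x_j) <= grid[k] and j != i.
    within = (dist[:, :, None] <= grid[None, None, :]) & off_diag[:, :, None]
    return within.sum(axis=1) / (n - 1)


# Toy data (hypothetical): four points on the real line with d(x, y) = |x - y|.
x = np.array([0.0, 1.0, 2.0, 10.0])
dist = np.abs(x[:, None] - x[None, :])
profiles = distance_profiles(dist, grid=np.array([1.0, 2.0, 10.0]))
```

In this toy example the isolated point at 10 has a profile that stays at zero until the largest radius, while the three clustered points accumulate mass at small radii. Only the pairwise distance matrix is needed, which is why the same computation applies to distributional or network data once a metric is chosen.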