Data segmentation, also known as multiple change point analysis, has received considerable attention due to its importance in time series analysis and signal processing, with applications in a variety of fields including the natural and social sciences, medicine, engineering and finance. In the first part of this survey, we review the existing literature on the canonical data segmentation problem, which aims at detecting and localising multiple change points in the mean of a univariate time series. We provide an overview of popular methodologies with a focus on their computational complexity and theoretical properties. In particular, our theoretical discussion centres on the separation rate, which determines which change points are detectable by a given procedure, and the localisation rate, which quantifies the precision of the corresponding change point estimators; we distinguish between whether a homogeneous or a multiscale viewpoint has been adopted in their derivation, and further highlight that the latter viewpoint provides the most general setting for investigating the optimality of data segmentation algorithms. Arguably, the canonical segmentation problem has been the most popular framework for proposing new data segmentation algorithms and studying their efficiency in recent decades. In the second part of this survey, we motivate the importance of attaining an in-depth understanding of the strengths and weaknesses of methodologies for the change point problem in the simpler, univariate setting, as a stepping stone towards the development of methodologies for more complex problems. We illustrate this with a range of examples showcasing the connections between complex distributional changes and those in the mean. We also discuss extensions towards high-dimensional change point problems, where we demonstrate that the challenges arising from high dimensionality are orthogonal to those involved in dealing with multiple change points.