Change point detection is an important part of time series analysis, as the presence of a change point indicates an abrupt and significant change in the data generating process. While many algorithms for change point detection have been proposed, comparatively little attention has been paid to evaluating their performance on real-world time series. Algorithms are typically evaluated on simulated data and a small number of commonly used series with unreliable ground truth. Clearly, this does not provide sufficient insight into the comparative performance of these algorithms. Therefore, instead of developing yet another change point detection method, we consider it vastly more important to properly evaluate existing algorithms on real-world data. To achieve this, we present a data set specifically designed for the evaluation of change point detection algorithms that consists of 37 time series from various application domains. Each series was annotated by five human annotators to provide ground truth on the presence and location of change points. We analyze the consistency of the human annotators and describe evaluation metrics that can be used to measure algorithm performance in the presence of multiple ground truth annotations. Next, we present a benchmark study in which 14 algorithms are evaluated on each of the time series in the data set. Our aim is that this data set will serve as a proving ground in the development of novel change point detection algorithms.
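The abstract does not say which metrics are used, so the following is only an illustrative sketch of how a detection could be scored against several annotators at once. It assumes an F1-style score in which precision is computed against the union of all annotators' change points and recall is averaged over annotators. The function names, the `margin` tolerance of 5 time steps, and the choice to skip annotators who marked no change points are all assumptions made for this example.

```python
def true_positives(truth, detected, margin=5):
    """Count annotated change points that have a detection within `margin` steps.

    Each detection is used for at most one annotated change point.
    """
    remaining = sorted(set(detected))
    matched = 0
    for t in sorted(set(truth)):
        # First still-unused detection close enough to this annotation, if any.
        hit = next((d for d in remaining if abs(d - t) <= margin), None)
        if hit is not None:
            remaining.remove(hit)
            matched += 1
    return matched


def f1_multi_annotator(annotations, detected, margin=5):
    """F1-style score for one detection set against several annotators.

    `annotations` is a list with one list of change point indices per annotator.
    """
    detected = set(detected)
    if not detected:
        return 0.0
    # Precision: a detection counts if any annotator marked a nearby change.
    union = set().union(*annotations)
    precision = true_positives(union, detected, margin) / len(detected)
    # Recall: averaged over annotators; annotators with no change points are
    # skipped here (an assumption made to avoid dividing by zero).
    recalls = [
        true_positives(a, detected, margin) / len(a) for a in annotations if a
    ]
    recall = sum(recalls) / len(recalls) if recalls else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical annotations from three annotators and one algorithm's output.
    annotations = [[100, 200], [102], []]
    detected = [101, 150, 198]
    print(f1_multi_annotator(annotations, detected))
```

Averaging recall over annotators, rather than pooling their annotations, keeps an annotator who marked many change points from dominating the score. Other conventions, such as segmentation-based covering scores, are equally plausible.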