Change point detection is becoming increasingly important as datasets grow in size, since unsupervised detection algorithms can help users process the data. A number of unsupervised algorithms, based on different principles, have been developed to detect change points. One approach defines an optimisation problem and minimises a cost function together with a penalty function. In this approach, the choice of cost function affects the predictions made by the algorithm. Extending existing studies, a new type of cost function based on Tikhonov regularisation is introduced. Another approach uses Bayesian statistics to calculate the posterior probability distribution of a given point being a change point. It uses a priori knowledge of the distance between consecutive change points and a likelihood function that carries information about the segments. The optimisation and Bayesian approaches to offline change point detection are studied and applied to simulated datasets as well as a real-world multi-phase dataset. The two approaches have previously been studied separately; the novelty lies in comparing their predictions in a specific setting consisting of simulated datasets and a real-world example. The study finds that the performance of the change point detection algorithms is affected by the features in the data.
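
To make the optimisation approach concrete, the following is a minimal sketch of penalised cost minimisation by exhaustive dynamic programming (often called optimal partitioning). It is an illustration under stated assumptions, not the method studied here: the segment cost is a plain squared-error cost standing in for the Tikhonov-regularised cost, the penalty is a constant per change point, and the names `segment_cost` and `detect_change_points` are hypothetical.

```python
import numpy as np


def segment_cost(prefix, prefix_sq, start, end):
    """Sum of squared deviations from the mean on signal[start:end].

    Assumption: a simple squared-error cost, used here only as a stand-in
    for whichever cost function is actually chosen.
    """
    n = end - start
    s = prefix[end] - prefix[start]
    sq = prefix_sq[end] - prefix_sq[start]
    return sq - s * s / n


def detect_change_points(signal, penalty):
    """Minimise total segment cost plus a constant penalty per change point."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    # Prefix sums let each segment cost be evaluated in constant time.
    prefix = np.concatenate(([0.0], np.cumsum(signal)))
    prefix_sq = np.concatenate(([0.0], np.cumsum(signal ** 2)))

    best = np.full(n + 1, np.inf)   # best[e]: minimal objective for signal[:e]
    best[0] = -penalty              # the first segment carries no penalty
    last = np.zeros(n + 1, dtype=int)  # start index of the final segment

    for end in range(1, n + 1):
        for start in range(end):
            total = best[start] + segment_cost(prefix, prefix_sq, start, end) + penalty
            if total < best[end]:
                best[end] = total
                last[end] = start

    # Backtrack through the stored segment starts to recover change points.
    change_points = []
    idx = n
    while idx > 0:
        idx = last[idx]
        if idx > 0:
            change_points.append(int(idx))
    return sorted(change_points)
```

Calling `detect_change_points([0.0] * 20 + [5.0] * 20, penalty=1.0)` returns the index at which the mean shifts. Swapping in a different `segment_cost` changes which segmentations are preferred, which is the sense in which the choice of cost function affects the predictions.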