Detecting abrupt changes in data distribution is one of the most significant tasks in streaming data analysis. Although many unsupervised Change-Point Detection (CPD) methods have recently been proposed to identify such changes, they still miss subtle changes, scale poorly, and/or are sensitive to noise points. To meet these challenges, we are the first to treat the CPD problem as a special case of the more general Change-Interval Detection (CID) problem. We then propose a CID method, named iCID, based on the recent Isolation Distributional Kernel (IDK). iCID identifies a change interval when the dissimilarity score between two temporally adjacent, non-homogeneous intervals is high. The data-dependent property and finite feature map of IDK enable iCID to identify various types of change points in data streams efficiently while tolerating noise points. Moreover, the proposed online and offline versions of iCID can optimise their key parameter settings. The effectiveness and efficiency of iCID have been systematically verified on both synthetic and real-world datasets.
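The abstract does not spell out iCID's exact procedure, so the following is only a minimal sketch of the core idea under stated assumptions. It assumes the Isolation Kernel is built from t random subsamples of ψ points each, with each subsample inducing a nearest-neighbour (Voronoi) partitioning of the space. Under that construction, every point maps to a finite binary feature vector, and an interval is represented by the mean of its feature vectors (its kernel mean embedding). The dissimilarity between two adjacent intervals is then taken as 1 minus the cosine similarity of their embeddings. The function names, the fixed non-overlapping windows, and the defaults `psi=16`, `t=100` are hypothetical illustration choices, not the paper's parameter-selection scheme.

```python
import numpy as np


def fit_isolation_partitions(data, psi, t, rng):
    """Draw t random subsets of psi points; each subset defines one
    nearest-neighbour (Voronoi) partitioning of the data space (assumed
    Isolation Kernel construction)."""
    return [data[rng.choice(len(data), size=psi, replace=False)] for _ in range(t)]


def feature_map(points, partitions):
    """Finite binary feature map: for every partitioning, one-hot encode the
    Voronoi cell that contains each point. Rows are scaled to unit norm, so
    phi(x) @ phi(y) is the fraction of partitionings in which x and y share
    a cell."""
    t, psi = len(partitions), partitions[0].shape[0]
    phi = np.zeros((len(points), t * psi))
    rows = np.arange(len(points))
    for i, centres in enumerate(partitions):
        # Squared Euclidean distance from each point to each sampled centre.
        d2 = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        phi[rows, i * psi + d2.argmin(axis=1)] = 1.0
    return phi / np.sqrt(t)


def interval_dissimilarity(phi_a, phi_b):
    """1 - cosine similarity of two intervals' kernel mean embeddings."""
    mean_a, mean_b = phi_a.mean(axis=0), phi_b.mean(axis=0)
    cos = mean_a @ mean_b / (np.linalg.norm(mean_a) * np.linalg.norm(mean_b))
    return 1.0 - cos


def adjacent_interval_scores(stream, window, psi=16, t=100, seed=0):
    """Split the stream into consecutive non-overlapping intervals of length
    `window` (a trailing partial interval is ignored) and score each pair of
    temporally adjacent intervals."""
    x = np.asarray(stream, dtype=float).reshape(len(stream), -1)
    rng = np.random.default_rng(seed)
    phi = feature_map(x, fit_isolation_partitions(x, psi, t, rng))
    n_int = len(x) // window
    blocks = [phi[k * window:(k + 1) * window] for k in range(n_int)]
    return np.array([interval_dissimilarity(blocks[k], blocks[k + 1])
                     for k in range(n_int - 1)])


if __name__ == "__main__":
    # Synthetic stream whose mean jumps from 0 to 5 at index 300.
    gen = np.random.default_rng(1)
    demo = np.concatenate([gen.normal(0, 1, 300), gen.normal(5, 1, 300)])
    scores = adjacent_interval_scores(demo, window=50)
    k = int(np.argmax(scores))
    print(f"highest dissimilarity between intervals {k} and {k + 1}")
```

Because the feature map is finite, each interval's embedding is a fixed-length vector of size tψ. Comparing two adjacent intervals is therefore a single dot product once the embeddings are computed, which is the kind of efficiency the abstract attributes to IDK's finite feature map. This sketch fits the partitions on the whole stream (an offline simplification) and does not model iCID's online variant or its parameter optimisation.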