Change-point analysis plays a significant role in many fields by detecting changes in distribution within a sequence of observations. While a number of algorithms have been proposed for high-dimensional data, kernel-based methods have not been well explored, because false discoveries are difficult to control and performance has been mediocre. In this paper, we propose a new kernel-based framework that makes use of an important pattern of data in high dimensions to boost power. We derive analytic approximations to the significance of the new statistics and propose fast tests based on these asymptotic results, offering easy off-the-shelf tools for large datasets. The new tests show superior performance over a wide range of alternatives when compared with other state-of-the-art methods. We illustrate the new approaches through an analysis of phone-call network data. All proposed methods are implemented in the R package KerSeg.
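For readers unfamiliar with the general idea of a kernel-based change-point scan, the following is a minimal sketch in Python. It is not the statistic proposed in the paper and does not use or mirror the KerSeg API. It assumes a generic Gaussian kernel, a biased squared maximum mean discrepancy (MMD) between the two segments at each candidate split, and a weight t(n - t)/n to offset the larger variance of splits near the ends. All function names, the bandwidth choice, and the simulated data are hypothetical.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def block_mean(gram, rows, cols):
    """Average of the Gram-matrix entries over rows x cols."""
    total = sum(gram[i][j] for i in rows for j in cols)
    return total / (len(rows) * len(cols))


def mmd2_biased(gram, left, right):
    """Biased squared MMD between the samples indexed by left and right."""
    return (block_mean(gram, left, left)
            + block_mean(gram, right, right)
            - 2.0 * block_mean(gram, left, right))


def scan_change_point(data, bandwidth, margin=5):
    """Return (t, statistic) maximizing the weighted MMD over candidate splits.

    A split at t puts observations 0..t-1 on the left and t..n-1 on the right.
    Illustrative only: this is a generic scan, not the paper's statistic.
    """
    n = len(data)
    gram = [[gaussian_kernel(data[i], data[j], bandwidth) for j in range(n)]
            for i in range(n)]
    best_t, best_stat = None, -math.inf
    for t in range(margin, n - margin + 1):
        weight = t * (n - t) / n  # assumption: simple variance-style weighting
        stat = weight * mmd2_biased(gram, range(t), range(t, n))
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t, best_stat


# Simulated example (hypothetical data): a mean shift after observation 30.
rng = random.Random(0)
dim, n, true_t = 5, 60, 30
data = [[rng.gauss(0.0 if i < true_t else 3.0, 1.0) for _ in range(dim)]
        for i in range(n)]
t_hat, stat = scan_change_point(data, bandwidth=math.sqrt(dim))
print(t_hat, round(stat, 3))
```

The sketch only locates the most likely split. Deciding whether the maximum is significant would require a null distribution, for example from permutations or from analytic approximations of the kind the abstract describes.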