Online changepoint detection aims to detect anomalies and changes in real time in high-frequency data streams, sometimes with limited computational resources. This is an important task rooted in many real-world applications, including but not limited to cybersecurity, medicine and astrophysics. While fast and efficient online algorithms have recently been introduced, they rely on parametric assumptions that are often violated in practice. Motivated by data streams from the telecommunications sector, we build a flexible nonparametric approach to detect a change in the distribution of a sequence. Our procedure, NP-FOCuS, builds a sequential likelihood ratio test for a change in the empirical cumulative distribution function of the data, evaluated at a set of points. This is achieved by keeping track of the number of observations above or below those points. Thanks to functional pruning ideas, NP-FOCuS has a computational cost that is log-linear in the number of observations and is suitable for high-frequency data streams. In terms of detection power, NP-FOCuS is seen to outperform current nonparametric online changepoint techniques in a variety of settings. We demonstrate the utility of the procedure on both simulated and real data.
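To make the counting idea concrete, the following is a minimal sketch of a Bernoulli likelihood ratio statistic built from counts of observations at or below a set of points. It is not the NP-FOCuS implementation: it scans every candidate changepoint by brute force instead of using functional pruning, and the function names (`bernoulli_loglik`, `lr_statistic`), the choice of `thresholds`, and the aggregation by summing over points are all assumptions made for illustration.

```python
import numpy as np


def bernoulli_loglik(successes, trials):
    """Maximised Bernoulli log-likelihood, using the convention 0 * log(0) = 0."""
    successes = np.asarray(successes, dtype=float)
    trials = np.asarray(trials, dtype=float)
    failures = trials - successes
    # np.where evaluates both branches, so silence the 0 * log(0) warnings.
    with np.errstate(divide="ignore", invalid="ignore"):
        a = np.where(successes > 0, successes * np.log(successes / trials), 0.0)
        b = np.where(failures > 0, failures * np.log(failures / trials), 0.0)
    return a + b


def lr_statistic(x, thresholds):
    """Brute-force log-likelihood ratio statistic for a single change.

    For each threshold q (hypothetical input, e.g. quantiles of past data),
    the data are reduced to indicators x_i <= q, and the log-likelihood
    ratio of "one change at tau" against "no change" is maximised over tau.
    The per-threshold maxima are summed (an assumed aggregation rule).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 2:
        return 0.0
    tau = np.arange(1, n)  # candidate changepoint locations
    total = 0.0
    for q in thresholds:
        below = np.cumsum(x <= q)  # running count of observations <= q
        pre = bernoulli_loglik(below[tau - 1], tau)
        post = bernoulli_loglik(below[-1] - below[tau - 1], n - tau)
        null = bernoulli_loglik(below[-1], n)
        total += np.max(pre + post - null)
    return float(total)
```

Calling `lr_statistic(x, thresholds)` on the data seen so far and comparing the result with a user-chosen threshold gives a simple detection rule. Recomputing the statistic from scratch at every time step is costly, and that cost is what the functional pruning mentioned above is meant to avoid.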