This article studies the problem of online non-parametric change point detection in multivariate data streams. We approach the problem through the lens of kernel-based two-sample testing and introduce a sequential testing procedure based on random Fourier features, running with logarithmic time complexity per observation and with overall logarithmic space complexity. The algorithm has two advantages compared to the state of the art. First, our approach is genuinely online, and no access to training data known to be from the pre-change distribution is necessary. Second, the algorithm does not require the user to specify a window parameter over which local tests are to be calculated. We prove strong theoretical guarantees on the algorithm's performance, including information-theoretic bounds demonstrating that the detection delay is optimal in the minimax sense. Numerical studies on real and synthetic data show that our algorithm is competitive with respect to the state of the art.
翻译:本文研究了多元数据流中的在线非参数化变点检测问题。我们通过基于核的双样本检验视角切入,提出了一种基于随机傅里叶特征的序列检验方法,该方法在每次观测时具有对数时间复杂度,整体空间复杂度也为对数级别。相较于现有技术,该算法具备两大优势:首先,我们的方法真正实现在线检测,无需访问已知属于变化前分布的训练数据;其次,算法无需用户指定用于计算局部检验的窗口参数。我们证明了该算法性能的严格理论保证,包括信息论界表明检测延迟在极小极大意义下达到最优。在真实数据与合成数据上的数值研究表明,本算法相较于现有技术具有显著竞争力。