Change point detection in high-dimensional data has attracted considerable interest in recent years. Most of the literature either designs methodology for a retrospective analysis, where the whole sample is already available when the statistical inference begins, or considers online detection schemes that control the average time until a false alarm. This paper takes a different point of view and develops monitoring schemes for the online scenario, where high-dimensional data arrive successively and the goal is to detect changes as quickly as possible while controlling the probability of a type I error (a false alarm). We develop a sequential procedure capable of detecting changes in the mean vector of a successively observed high-dimensional time series with spatial and temporal dependence. The statistical properties of the method are analyzed in the case where both the sample size and the dimension tend to infinity. In this scenario, it is shown that the new monitoring scheme has asymptotic level alpha under the null hypothesis of no change and is consistent under the alternative of a change in at least one component of the high-dimensional mean vector. The approach is based on a new type of monitoring scheme for one-dimensional data, which often turns out to be more powerful than the commonly used CUSUM and Page-CUSUM methods; the component-wise statistics are aggregated by the maximum statistic. For the analysis of the asymptotic properties of our monitoring scheme, we prove that the range of a Brownian motion on a given interval is in the domain of attraction of the Gumbel distribution, a result of independent interest in extreme value theory. The finite sample properties of the new methodology are illustrated by means of a simulation study and the analysis of a data example.
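To make the structure of such a scheme concrete, the following is a minimal sketch of component-wise monitoring with maximum aggregation. It is not the procedure proposed in the paper: the abstract does not specify the new one-dimensional detector, so the sketch substitutes the classical CUSUM detector that the abstract names as a benchmark. The function name `monitor_max_cusum`, the fixed `threshold`, and the simulated data are assumptions made for illustration only. In particular, the threshold here is a hand-picked constant rather than a quantile derived from the Gumbel limit, and the per-component scale estimate ignores temporal dependence.

```python
import numpy as np


def monitor_max_cusum(training, stream, threshold):
    """Sequentially monitor a d-dimensional stream for a change in the mean.

    training : (m, d) array, assumed change-free, used to estimate mean and scale.
    stream   : (n, d) array of observations arriving one at a time.
    threshold: hypothetical critical value (an assumption, not from the paper).

    Returns the first monitoring time k (1-based) at which the maximum of the
    component-wise CUSUM detectors exceeds the threshold, or None if no alarm.
    """
    m, d = training.shape
    mu_hat = training.mean(axis=0)             # per-component training mean
    sigma_hat = training.std(axis=0, ddof=1)   # per-component scale (i.i.d. assumption)
    partial_sums = np.zeros(d)
    for k, x in enumerate(stream, start=1):
        partial_sums += x - mu_hat
        # Classical CUSUM detector with weight sqrt(m) * (1 + k/m), per component.
        detectors = np.abs(partial_sums) / (sigma_hat * np.sqrt(m) * (1.0 + k / m))
        # Aggregate the component-wise statistics by their maximum.
        if detectors.max() > threshold:
            return k
    return None


# Simulated example: i.i.d. standard normal data with a mean shift of 2.0
# in a single component, starting at monitoring time 51.
rng = np.random.default_rng(0)
m, d, n_monitor = 200, 50, 200
training = rng.standard_normal((m, d))
stream = rng.standard_normal((n_monitor, d))
stream[50:, 3] += 2.0
alarm_time = monitor_max_cusum(training, stream, threshold=3.5)
print("alarm raised at monitoring time:", alarm_time)
```

Taking the maximum over components means that a change in a single coordinate can trigger an alarm, which matches the alternative described above. For dependent data, the scale estimate would have to be replaced by a long-run variance estimator.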