We consider the problem of detecting distributional changes in a sequence of high-dimensional data. Our proposed methods are nonparametric, suitable for either continuous or discrete data, and are based on weighted cumulative sums of U-statistics derived from $L_p$ norms. We establish the asymptotic distributions of the proposed test statistics as $\min\{N,d\}\to\infty$, where $N$ denotes the sample size and $d$ the dimension, treating the cases of weakly dependent and strongly dependent coordinates separately. We also provide sufficient conditions for the consistency of the proposed test procedures under a general fixed alternative with one change point. We further assess the finite-sample performance of the test procedures through Monte Carlo studies, and conclude with two applications to Twitter data concerning mentions of U.S. Governors and the frequency of tweets containing social justice keywords.
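To make the idea of a weighted cumulative sum of U-statistics built from $L_p$ norms concrete, the following is a minimal illustrative sketch, not the statistic defined in the paper. It assumes an energy-distance-style contrast between the segments before and after each candidate split $k$, a dimension-normalized $L_p$ distance as the U-statistic kernel, and a weight $(t(1-t))^{1-\kappa}$ with $t = k/N$. The function names `lp_distances` and `weighted_cusum`, the parameters `kappa` and `trim`, and the normalization by $d$ are all hypothetical choices made for illustration. No critical values are computed, so this sketch only traces the statistic over $k$ and reports its maximizer.

```python
import numpy as np


def lp_distances(X, p=2.0):
    """Pairwise L_p distances between rows of X, normalized by the dimension d.

    Assumption: D[i, j] = (mean_l |X[i, l] - X[j, l]|^p)^(1/p).
    """
    diff = np.abs(X[:, None, :] - X[None, :, :])  # shape (N, N, d)
    return np.mean(diff ** p, axis=2) ** (1.0 / p)


def weighted_cusum(X, p=2.0, kappa=0.25, trim=2):
    """Weighted CUSUM-type path of two-sample U-statistics (illustrative only).

    For each split k (first k rows versus the remaining N - k rows), compute
        2 * mean between-segment distance
          - mean within-segment distance (before)
          - mean within-segment distance (after),
    and multiply by the weight (t * (1 - t))^(1 - kappa), t = k / N.
    Returns the maximizing split and the full path (entries outside the
    trimmed range are -inf).
    """
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    if trim < 2 or N < 2 * trim:
        raise ValueError("need trim >= 2 and N >= 2 * trim")
    D = lp_distances(X, p)
    stats = np.full(N + 1, -np.inf)
    for k in range(trim, N - trim + 1):
        m = N - k
        # The diagonal of D is zero, so summing the whole block and dividing
        # by k * (k - 1) gives the average over distinct pairs (a U-statistic).
        within_before = D[:k, :k].sum() / (k * (k - 1))
        within_after = D[k:, k:].sum() / (m * (m - 1))
        between = D[:k, k:].mean()
        contrast = 2.0 * between - within_before - within_after
        t = k / N
        stats[k] = (t * (1.0 - t)) ** (1.0 - kappa) * contrast
    k_hat = int(np.argmax(stats))
    return k_hat, stats
```

Calling `weighted_cusum(X)` on an array of shape `(N, d)` returns the candidate change location `k_hat` (the number of observations assigned to the pre-change segment) together with the path of weighted statistics. Turning this into a test would require the limiting distributions established in the paper, which this sketch does not attempt to reproduce.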