Change-point detection is a classical problem in statistics and econometrics. This work focuses on detecting abrupt changes in the data-generating distribution of a sequence of high-dimensional observations, including changes that go beyond the first two moments. Compared with detecting changes in the mean or the covariance structure, this problem has received substantially less attention in the existing literature, especially in the high-dimensional setting. We develop a nonparametric methodology to (i) detect an unknown number of change-points in an independent sequence of high-dimensional observations and (ii) test for the significance of the estimated change-point locations. Our approach rests on nonparametric tests for the homogeneity of two high-dimensional distributions. We construct a single change-point location estimator by defining a cumulative sum process in an embedded Hilbert space. As the key theoretical innovation, we rigorously derive its limiting distribution under the high dimension medium sample size (HDMSS) framework. Subsequently, we combine our statistic with the idea of wild binary segmentation to recursively estimate and test for multiple change-point locations. The superior performance of our methodology compared to existing procedures is illustrated via extensive simulation studies as well as on stock price data observed during the Great Recession in the United States.
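To make the single change-point step concrete, the following is a minimal sketch of a scan over candidate split points using a two-sample homogeneity measure. It assumes the plain Euclidean energy distance as a stand-in for the homogeneity metric and a simple t(n - t)/n weighting; neither is claimed to be the statistic developed in this work. The function names `pairwise_distances` and `scan_single_change_point` and the parameter `min_seg` are hypothetical and introduced only for illustration.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance matrix for the rows of x (shape n x p)."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.sqrt(np.maximum(d2, 0.0))


def scan_single_change_point(x, min_seg=5):
    """Return (t_hat, stat): the split maximizing a weighted energy distance.

    Assumption: Euclidean energy distance is used as the homogeneity
    measure; this is an illustrative substitute, not the paper's statistic.
    A split at t compares observations x[:t] with x[t:].
    """
    n = x.shape[0]
    d = pairwise_distances(x)
    best_t, best_stat = None, -np.inf
    for t in range(min_seg, n - min_seg + 1):
        m = n - t
        # Average distance between the two segments.
        cross = d[:t, t:].mean()
        # Average within-segment distances, excluding the zero diagonal.
        within_left = d[:t, :t].sum() / (t * (t - 1))
        within_right = d[t:, t:].sum() / (m * (m - 1))
        energy = 2.0 * cross - within_left - within_right
        # Weight by t * m / n so splits near the boundary are not favored.
        stat = t * m / n * energy
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t, best_stat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated data: same mean throughout, scale changes after observation 30.
    before = rng.normal(0.0, 1.0, size=(30, 200))
    after = rng.normal(0.0, 2.0, size=(30, 200))
    t_hat, stat = scan_single_change_point(np.vstack([before, after]))
    print("estimated change-point:", t_hat, "statistic:", round(stat, 3))
```

In this sketch the mean is constant and only the scale changes, so a mean-based cumulative sum would have little to detect. Handling several change-points would, in the spirit of wild binary segmentation, repeat such a scan on randomly drawn sub-intervals and recurse on either side of each accepted split; the significance test for each split is not shown here.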