In the regime of change-point detection, a nonparametric framework based on scan statistics utilizing graphs representing similarities among observations is gaining attention due to its flexibility and good performances for high-dimensional and non-Euclidean data sequences, which are ubiquitous in this big data era. However, this graph-based framework encounters problems when there are repeated observations in the sequence, which often happens for discrete data, such as network data. In this work, we extend the graph-based framework to solve this problem by averaging or taking union of all possible optimal graphs resulted from repeated observations. We consider both the single change-point alternative and the changed-interval alternative, and derive analytic formulas to control the type I error for the new methods, making them fast applicable to large datasets. The extended methods are illustrated on an application in detecting changes in a sequence of dynamic networks over time. All proposed methods are implemented in an R package gSeg available on CRAN.
翻译:在变化点探测制度下,一个基于扫描统计的非参数性框架,利用显示不同观测结果相似的图表进行扫描,由于这一框架具有灵活性,而且在这个大数据时代数据序列中,高维数据序列和非欧化数据序列的性能良好,因此日益受到注意。然而,在这种数据序列中反复观测时,以图形为基础的框架遇到了问题,这种序列中经常出现离散数据,例如网络数据。在这项工作中,我们扩大基于图形的框架,通过平均或结合从反复观测中得出的所有可能的最佳图表来解决这一问题。我们考虑了单位变化点替代品和变化间替代数据,并得出了分析公式,以控制新方法的I型错误,使其迅速适用于大型数据集。扩展的方法在用于探测动态网络序列随时间变化时加以说明。所有拟议方法都在CRAN上提供的R 套件 gSeg中实施。