Change Point Analysis of Multivariate Data via Multivariate Rank-based Distribution-free Nonparametric Testing Using Measure Transportation

In this paper, I propose a general algorithm for multiple change point analysis via multivariate, distribution-free nonparametric testing based on ranks defined through measure transportation. Multivariate ranks share an important property with the usual one-dimensional ranks: both are distribution-free. This property makes it possible to construct nonparametric tests that are distribution-free under the null hypothesis. Here I consider rank energy statistics in the context of the multiple change point problem, with the goal of estimating both the number of change points and their locations within a multivariate series of time-ordered observations. Many works in this area assume that the observations follow a parametric model or that there is a single change point; this paper instead examines the multiple change point problem in a general setting in which neither the underlying distributions nor the number of change points is specified. The objective is to identify change points while making as few assumptions as possible. The algorithm described here is based on energy statistics and can detect any type of distributional change, not only changes in mean or variance. I present the theoretical properties of the new algorithm and the conditions under which approximate estimates of the number of change points and their locations can be obtained. The algorithm can be applied to a variety of datasets, including financial and microarray data. It is implemented in the R package recp, which is available on CRAN. A section of this paper is devoted to carrying out this procedure and to using the recp package.
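The procedure summarized above can be illustrated with a minimal Python sketch. It is not the recp implementation and does not reproduce its API; every function name and parameter below (`multivariate_ranks`, `best_split`, `rank_energy_changepoints`, `min_size`, `alpha`, `n_perm`, `seed`) is hypothetical. The sketch assumes: (1) multivariate ranks are obtained by optimally assigning the observations to a fixed reference set of n points on [0, 1]^d under squared Euclidean cost, here drawn uniformly with a fixed seed (a low-discrepancy grid is a common alternative); (2) the rank energy statistic is the V-statistic form of the energy distance between the ranks before and after a candidate split, scaled by t(n - t)/n; and (3) multiple change points are found by a divisive (binary segmentation) search with a permutation test at each step, loosely modeled on the E-Divisive procedure of the ecp R package. Because the ranks of any sample are always a rearrangement of the same reference points, and under an i.i.d. continuous null every rearrangement is equally likely, the resulting test statistic does not depend on the data distribution.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def multivariate_ranks(X, seed=0):
    """Rank map via measure transportation: assign each row of X to one point of a
    fixed reference sample on [0, 1]^d so that the total squared distance is minimal."""
    n, d = X.shape
    # Assumption: reference points are n uniform draws on [0, 1]^d with a fixed seed.
    H = np.random.default_rng(seed).random((n, d))
    _, cols = linear_sum_assignment(cdist(X, H, metric="sqeuclidean"))
    return H[cols]  # row i is the multivariate rank of X[i]


def best_split(D, min_size):
    """Maximise the scaled energy statistic over split points, given the pairwise
    distance matrix D of the (ranked) observations in time order."""
    n = len(D)
    best_t, best_s = None, -np.inf
    for t in range(min_size, n - min_size + 1):
        # V-statistic energy distance between observations [0, t) and [t, n)
        e = 2 * D[:t, t:].mean() - D[:t, :t].mean() - D[t:, t:].mean()
        s = t * (n - t) / n * e
        if s > best_s:
            best_t, best_s = t, s
    return best_t, best_s


def rank_energy_changepoints(X, min_size=5, alpha=0.05, n_perm=99, seed=0):
    """Binary segmentation: split a segment at its best point whenever a permutation
    test on the rank energy statistic rejects homogeneity at level alpha."""
    rng = np.random.default_rng(seed)
    found = []

    def segment(lo, hi):
        if hi - lo < 2 * min_size:
            return
        R = multivariate_ranks(X[lo:hi], seed)  # ranks recomputed within the segment
        D = cdist(R, R)
        tau, observed = best_split(D, min_size)
        exceed = 0
        for _ in range(n_perm):
            p = rng.permutation(hi - lo)  # shuffle time order under the null
            exceed += int(best_split(D[np.ix_(p, p)], min_size)[1] >= observed)
        if (exceed + 1) / (n_perm + 1) <= alpha:
            found.append(lo + tau)
            segment(lo, lo + tau)
            segment(lo + tau, hi)

    segment(0, len(X))
    return sorted(found)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two bivariate normal regimes with a mean shift after observation 50
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
    print(rank_energy_changepoints(X))
```

On this simulated series the search should report a change point at or very near index 50; because each split is accepted by a randomized test at level `alpha`, an occasional spurious extra split is possible. Ranks are recomputed inside each segment so that the test for that segment stays distribution-free, and permuting the rows and columns of the rank distance matrix is equivalent to permuting the observations, since optimal-transport ranks permute along with the sample when the optimal assignment is unique.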
