In this paper, I propose a general algorithm for multiple change point analysis based on multivariate, distribution-free nonparametric testing, using ranks defined by measure transportation. Multivariate ranks defined in this way share an important property with the usual one-dimensional ranks: both are distribution-free. This property allows the construction of nonparametric tests that are distribution-free under the null hypothesis. Here I consider rank energy statistics in the context of the multiple change point problem, and I estimate both the number of change points and their locations within a multivariate series of time-ordered observations. Many works in this area assume that the observations follow a parametric model or that there is a single change point. This paper instead examines the problem in a broad setting in which the observed distributions and the number of change points are unspecified. The objective is to develop techniques for identifying change points under as few assumptions as possible. The algorithm described here is based on energy statistics and is able to detect any distributional change. I present the theoretical properties of the algorithm and the conditions under which the approximate number of change points and their locations can be estimated. The algorithm can be used to analyze various datasets, including financial and microarray data. It has been implemented in the R package recp, which is available on GitHub. A section of this paper is dedicated to the execution of the procedure and to the use of the recp package.
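As a rough illustration of the energy-statistic idea behind the method, the following Python sketch scans a multivariate series for a single split that maximizes a scaled sample energy distance between the two segments. It is a simplified sketch under stated assumptions, not the algorithm of this paper and not the recp interface: it works on the raw observations rather than on ranks defined by measure transportation, it looks for one change point only, and the function names (`energy_statistic`, `best_single_split`), the `min_size` parameter, and the simulated data are all hypothetical.

```python
import math
import random


def mean_pairwise_distance(xs, ys):
    """Average Euclidean distance over all pairs (x, y) with x in xs, y in ys."""
    total = sum(math.dist(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def energy_statistic(xs, ys):
    """Sample energy distance 2 E|X-Y| - E|X-X'| - E|Y-Y'| (V-statistic form)."""
    return (2.0 * mean_pairwise_distance(xs, ys)
            - mean_pairwise_distance(xs, xs)
            - mean_pairwise_distance(ys, ys))


def best_single_split(series, min_size=5):
    """Return (index, value) of the split maximizing the scaled energy statistic.

    Assumption: each segment must hold at least `min_size` observations.
    The statistic is weighted by (n_left * n_right) / n, a common scaling
    for two-sample energy statistics.
    """
    n = len(series)
    best_index, best_value = None, float("-inf")
    for k in range(min_size, n - min_size + 1):
        left, right = series[:k], series[k:]
        value = (len(left) * len(right) / n) * energy_statistic(left, right)
        if value > best_value:
            best_index, best_value = k, value
    return best_index, best_value


# Simulated bivariate series (illustrative only): the mean shifts from
# (0, 0) to (3, 3) after the 40th observation.
rng = random.Random(0)
series = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
          + [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(40)])

split_index, split_value = best_single_split(series)
print("estimated change point index:", split_index)
```

A rank-based version would first replace each observation by its multivariate rank and then apply the same statistic to the ranks. Handling several change points additionally requires a search over multiple splits and a rule for deciding when to stop, neither of which this sketch attempts.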