Change Point Analysis of Multivariate Data: Using Multivariate Rank-based Distribution-free Nonparametric Testing via Measure Transportation with Applications in Tumor Microarrays and Dementia

November 7, 2021

Amanda Ng

From arXiv; 20 pages, 4 figures

In this paper, I propose a general algorithm for multiple change point analysis based on multivariate distribution-free nonparametric testing, using ranks defined by measure transportation. Multivariate ranks share an important property with the usual one-dimensional ranks: both are distribution-free. This property allows for the construction of nonparametric tests that are distribution-free under the null hypothesis. The method has applications in a variety of fields; in this paper I apply the algorithm to a microarray dataset for individuals with bladder tumors, to an ECoG snapshot for a patient with epilepsy, and to trajectories of CASI scores by education level and dementia status. In the CASI application, each change point denotes a shift in the rate of change of the Cognitive Abilities score over the years, indicating the existence of preclinical dementia. The goal is to estimate the number of change points and the location of each within a multivariate series of time-ordered observations. The paper examines the multiple change point question in a broad setting in which the observed distributions and the number of change points are unspecified, rather than assuming, as many works in this area do, that the time series observations follow a parametric model or that there is a single change point. The objective is to create an algorithm for change point detection while making as few assumptions about the dataset as possible. I present the theoretical properties of the new algorithm and the conditions under which the approximate number of change points and their locations can be estimated. The algorithm has also been implemented in the R package recp, which is available on GitHub. A section of the paper is dedicated to the execution of this procedure and the use of the recp package.
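As a rough illustration of the ideas in the abstract, the following Python sketch assigns each observation a multivariate rank by optimally matching the sample to a fixed reference set in the unit cube, then scans for a single change point using those ranks. It is not the paper's algorithm and does not reproduce the recp API: the Halton reference points, the squared-distance cost, the mean-difference scan statistic, and all function names (`halton`, `assign`, `multivariate_ranks`, `scan_change_point`) are assumptions made for this example.

```python
import random


def halton(n, bases=(2, 3)):
    """First n points of a Halton sequence: a fixed reference set in the unit cube."""
    def radical_inverse(i, base):
        x, f = 0.0, 1.0 / base
        while i > 0:
            x += (i % base) * f
            i //= base
            f /= base
        return x
    return [tuple(radical_inverse(i, b) for b in bases) for i in range(1, n + 1)]


def assign(cost):
    """Hungarian algorithm on a square cost matrix.

    Returns col, where col[i] is the column matched to row i in a
    minimum-total-cost perfect matching.
    """
    n = len(cost)
    INF = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (n + 1)   # dual potentials (1-based)
    p, way = [0] * (n + 1), [0] * (n + 1)     # p[j]: row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [INF] * (n + 1), [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # walk back along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    col = [0] * n
    for j in range(1, n + 1):
        col[p[j] - 1] = j - 1
    return col


def multivariate_ranks(data):
    """Rank of each observation = the reference point matched to it by the
    assignment minimizing total squared Euclidean distance (dimension <= 5)."""
    ref = halton(len(data), bases=(2, 3, 5, 7, 11)[:len(data[0])])
    cost = [[sum((a - b) ** 2 for a, b in zip(x, r)) for r in ref] for x in data]
    return [ref[j] for j in assign(cost)]


def scan_change_point(ranks):
    """Return (tau, stat): the split after observation tau that maximizes a
    weighted squared difference between mean ranks of the two segments.
    This statistic is an illustrative stand-in, not the paper's test."""
    n, d = len(ranks), len(ranks[0])
    total = [sum(r[k] for r in ranks) for k in range(d)]
    left = [0.0] * d
    best_tau, best_stat = None, -1.0
    for tau in range(1, n):
        left = [left[k] + ranks[tau - 1][k] for k in range(d)]
        diff2 = sum((left[k] / tau - (total[k] - left[k]) / (n - tau)) ** 2
                    for k in range(d))
        stat = tau * (n - tau) / n * diff2
        if stat > best_stat:
            best_tau, best_stat = tau, stat
    return best_tau, best_stat


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic series: 30 points around (0, 0), then 30 points around (5, 5).
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
    data += [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(30)]
    tau, stat = scan_change_point(multivariate_ranks(data))
    print("estimated change point after observation", tau)
```

Because the ranks are always a rearrangement of the same fixed reference points, any statistic computed from them has a null distribution, under exchangeability of the observations, that does not depend on the distribution of the data. This is the distribution-free property the abstract refers to.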
