Transportation-Based Functional ANOVA and PCA for Covariance Operators

We consider the problem of comparing several samples of stochastic processes with respect to their second-order structure, and describing the main modes of variation in this second-order structure, if present. These tasks can be seen as an Analysis of Variance (ANOVA) and a Principal Component Analysis (PCA) of covariance operators, respectively. They arise naturally in functional data analysis, where several populations are to be contrasted relative to the nature of their dispersion around their means, rather than relative to their means themselves. We contribute a novel approach based on optimal (multi)transport, where each covariance can be identified with a centred Gaussian process of corresponding covariance. By constructing the optimal simultaneous coupling of these Gaussian processes, we contrast the (linear) maps that achieve it with the identity with respect to a norm-induced distance. The resulting test statistic, calibrated by permutation, is seen to distinctly outperform the state-of-the-art, and to furnish considerable power even under local alternatives. This effect is seen to be genuinely functional, and is related to the potential for perfect discrimination in infinite dimensions. In the event of a rejection of the null hypothesis stipulating equality, a geometric interpretation of the transport maps allows us to construct a (tangent space) PCA revealing the main modes of variation. As a necessary step to developing our methodology, we prove results on the existence and boundedness of optimal multitransport maps. These are of independent interest in the theory of transport of Gaussian processes. The transportation ANOVA and PCA are illustrated on a variety of simulated and real examples.
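To make the idea of "contrasting the transport map with the identity" concrete, the following is a minimal finite-dimensional sketch in Python. It is not the multi-sample coupling or the test statistic of the paper. It only uses the well-known closed form for the optimal map between two centred Gaussians, T = A^{-1/2} (A^{1/2} B A^{1/2})^{1/2} A^{-1/2}, which pushes N(0, A) to N(0, B) when A is invertible. The helper names (`sqrtm_psd`, `gaussian_ot_map`, `random_cov`) and the use of the Frobenius norm are assumptions made for illustration.

```python
import numpy as np


def sqrtm_psd(S):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    w, V = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return (V * np.sqrt(w)) @ V.T


def gaussian_ot_map(A, B):
    """Optimal linear map pushing N(0, A) to N(0, B); assumes A is invertible."""
    A_half = sqrtm_psd(A)
    A_half_inv = np.linalg.inv(A_half)
    return A_half_inv @ sqrtm_psd(A_half @ B @ A_half) @ A_half_inv


def random_cov(rng, d):
    """Hypothetical helper: a well-conditioned random covariance matrix."""
    X = rng.standard_normal((d, d))
    return X @ X.T + d * np.eye(d)


rng = np.random.default_rng(0)
d = 4
A, B = random_cov(rng, d), random_cov(rng, d)

T = gaussian_ot_map(A, B)

# T transports the covariance A to B: T A T^T should equal B.
push_forward_error = np.linalg.norm(T @ A @ T.T - B)

# How far the map is from "doing nothing"; zero when A == B.
deviation_from_identity = np.linalg.norm(T - np.eye(d), ord="fro")

# Sanity check: transporting A to itself gives the identity map.
identity_map_error = np.linalg.norm(gaussian_ot_map(A, A) - np.eye(d))
```

When the two covariances coincide, the map is the identity and the deviation vanishes, which is the intuition behind measuring departure from the null hypothesis of equality through the transport maps.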


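Related content

PCA

In statistics, principal component analysis (PCA) is a method for projecting data from a higher-dimensional space onto a lower-dimensional one while retaining as much variance as possible. Given a set of points in two, three, or more dimensions, a "best-fitting" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fitting line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual coordinates of the data are uncorrelated. These basis vectors are called the principal components.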
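The description above can be sketched with a few lines of NumPy. This is a minimal illustration of ordinary (finite-dimensional) PCA via the eigendecomposition of the sample covariance, not the tangent-space PCA of the abstract. The function name `pca` and the synthetic data are assumptions made for the example.

```python
import numpy as np


def pca(X, k):
    """Return the top-k principal directions, their variances, and the scores."""
    Xc = X - X.mean(axis=0)                  # centre the data
    cov = Xc.T @ Xc / (X.shape[0] - 1)       # sample covariance matrix
    w, V = np.linalg.eigh(cov)               # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:k]          # indices of the k largest
    components = V[:, order]                 # orthonormal principal directions
    scores = Xc @ components                 # coordinates in the new basis
    return components, w[order], scores


rng = np.random.default_rng(0)
# Synthetic correlated data: 200 points in 3 dimensions.
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.0, 0.2]])
X = rng.standard_normal((200, 3)) @ mixing

components, variances, scores = pca(X, 2)

# Covariance of the scores: diagonal, with the retained variances on it.
score_cov = np.cov(scores, rowvar=False)
```

The off-diagonal entries of `score_cov` are zero up to rounding, which is the "uncorrelated coordinates" property mentioned above.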
