We consider the problem of comparing several samples of stochastic processes with respect to their second-order structure, and of describing the main modes of variation in this second-order structure, if present. These tasks can be seen as an Analysis of Variance (ANOVA) and a Principal Component Analysis (PCA) of covariance operators, respectively. They arise naturally in functional data analysis, where several populations are to be contrasted relative to the nature of their dispersion around their means, rather than relative to the means themselves. We contribute a novel approach based on optimal (multi)transport, where each covariance is identified with a centred Gaussian process of corresponding covariance. By constructing the optimal simultaneous coupling of these Gaussian processes, we contrast the (linear) maps that achieve it with the identity, with respect to a norm-induced distance. The resulting test statistic, calibrated by permutation, is seen to distinctly outperform the state-of-the-art, and to furnish considerable power even under local alternatives. This effect is seen to be genuinely functional, and is related to the potential for perfect discrimination in infinite dimensions. In the event of a rejection of the null hypothesis of equality, a geometric interpretation of the transport maps allows us to construct a (tangent space) PCA revealing the main modes of variation. As a necessary step in developing our methodology, we prove results on the existence and boundedness of optimal multitransport maps. These are of independent interest in the theory of transport of Gaussian processes. The transportation ANOVA and PCA are illustrated on a variety of simulated and real examples.
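As a rough illustration of the idea, the following Python sketch treats a simplified two-sample, finite-dimensional version of the problem. It is not the method of the paper: it replaces the simultaneous (multi)coupling with a single pairwise map, uses the standard closed-form optimal map between centred Gaussians N(0, A) and N(0, B), namely T = A^{-1/2} (A^{1/2} B A^{1/2})^{1/2} A^{-1/2}, and measures the departure of T from the identity in the Frobenius norm, which is an assumed choice of norm. The function names, the ridge regularisation, and the number of permutations are all hypothetical.

```python
import numpy as np


def sqrtm_psd(M):
    """Symmetric square root of a positive semi-definite matrix."""
    w, V = np.linalg.eigh((M + M.T) / 2)   # symmetrise against round-off
    w = np.clip(w, 0.0, None)              # drop tiny negative eigenvalues
    return (V * np.sqrt(w)) @ V.T


def transport_map(A, B, ridge=1e-8):
    """Linear map T with T (A + ridge*I) T = B (Gaussian optimal map).

    The ridge term is an assumption added for numerical stability;
    with ridge=0 and A invertible this is the usual closed form.
    """
    d = A.shape[0]
    A_half = sqrtm_psd(A + ridge * np.eye(d))
    A_half_inv = np.linalg.inv(A_half)
    return A_half_inv @ sqrtm_psd(A_half @ B @ A_half) @ A_half_inv


def statistic(X, Y):
    """Frobenius distance between the estimated transport map and identity."""
    A = np.cov(X, rowvar=False)            # rows are observations
    B = np.cov(Y, rowvar=False)
    T = transport_map(A, B)
    return np.linalg.norm(T - np.eye(T.shape[0]), "fro")


def permutation_pvalue(X, Y, n_perm=200, seed=0):
    """Calibrate the statistic by permuting sample labels."""
    rng = np.random.default_rng(seed)
    observed = statistic(X, Y)
    pooled = np.vstack([X, Y])
    n = X.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        if statistic(pooled[idx[:n]], pooled[idx[n:]]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)     # add-one correction


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 5))                          # identity covariance
    Y = rng.normal(size=(100, 5)) * np.array([1, 1, 1, 1, 2.0])
    print("statistic:", statistic(X, Y))
    print("p-value:  ", permutation_pvalue(X, Y))
```

In this sketch, discretised curves stand in for functional observations, with each row of `X` and `Y` holding one curve evaluated on a common grid. Extending it to several samples would require replacing the pairwise map with maps from a common barycentre, which is omitted here.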