In classical canonical correlation analysis (CCA), the goal is to find linear transformations of two random vectors into two new random variables that are most strongly correlated. Each such pair of new random variables is a pair of canonical variables, and the correlation within a pair is the corresponding canonical correlation. In this paper, we propose and study two generalizations of this classical method. (1) Instead of two random vectors, we study more complex data structures that appear in important applications. In these structures, there are $L$ features, each described by $p_l$ scalars, $1 \le l \le L$. We observe $n$ such objects over $T$ time points. We derive a suitable analog of CCA for such data. Our approach relies on embeddings into reproducing kernel Hilbert spaces and also covers several related data structures. (2) We develop an analogous approach for multidimensional random processes. Here the experimental units are multivariate continuous, square-integrable functions over a given interval. These functions are modeled as elements of a Hilbert space, and we call the resulting method multiple functional canonical correlation analysis (MFCCA). We justify our approaches by applying them to two data sets and by suitable large-sample theory. We derive consistency rates for the related transformation and correlation estimators, and show that two common assumptions can be relaxed: compactness of the underlying cross-covariance operators and independence of the data.
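For orientation, the classical two-vector CCA that the abstract takes as its starting point can be sketched in a few lines of NumPy. This is only a minimal illustration of the textbook baseline (whitening followed by a singular value decomposition); it is not the RKHS-based or functional (MFCCA) method proposed in the paper. The function name `classical_cca` and the small ridge term `reg` are assumptions introduced here for numerical stability and illustration.

```python
import numpy as np


def classical_cca(X, Y, reg=1e-8):
    """First canonical pair for samples X (n x p) and Y (n x q).

    Hypothetical helper for illustration: returns weight vectors a, b and
    the first sample canonical correlation rho, so that X @ a and Y @ b
    are the first pair of canonical variables.
    """
    # Center both samples.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Sample covariance and cross-covariance matrices; `reg` is an assumed
    # small ridge term that keeps the inverse square roots well defined.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):
        # Inverse square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)

    # The singular values of the whitened cross-covariance matrix are the
    # canonical correlations, in decreasing order.
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    a = Kx @ U[:, 0]
    b = Ky @ Vt[0]
    return a, b, s[0]


if __name__ == "__main__":
    # Synthetic data: X and Y share one latent component z.
    rng = np.random.default_rng(0)
    z = rng.normal(size=500)
    X = np.column_stack([z + 0.5 * rng.normal(size=500), rng.normal(size=500)])
    Y = np.column_stack([rng.normal(size=500), z + 0.5 * rng.normal(size=500)])
    a, b, rho = classical_cca(X, Y)
    print("first canonical correlation:", rho)
```

The returned `rho` should agree, up to the ridge term, with the sample correlation between `X @ a` and `Y @ b`. The generalizations described in the abstract replace the finite-dimensional covariance matrices with (cross-)covariance operators on Hilbert spaces, which this sketch does not attempt.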