Canonical correlation analysis (CCA) is a technique for measuring the association between two multivariate data matrices. A regularized modification of canonical correlation analysis (RCCA), which imposes an $\ell_2$ penalty on the CCA coefficients, is widely used in applications with high-dimensional data. One limitation of such regularization is that it ignores any structure in the data and treats all features equally, which can be ill-suited for some applications. In this paper we introduce several approaches to regularizing CCA that take the underlying data structure into account. In particular, the proposed group regularized canonical correlation analysis (GRCCA) is useful when the variables are correlated in groups. We illustrate some computational strategies that avoid excessive computation with regularized CCA in high dimensions. We demonstrate these methods on our motivating application from neuroscience, as well as on a small simulation example.
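To make the $\ell_2$-penalized formulation concrete, the following is a minimal NumPy sketch of standard RCCA only. It is not the paper's GRCCA and does not use the paper's computational shortcuts. It assumes the usual formulation in which the penalty adds a multiple of the identity to each within-set covariance matrix. The function name `rcca` and the parameters `lam_x`, `lam_y`, and `n_components` are hypothetical names chosen for illustration.

```python
import numpy as np


def rcca(X, Y, lam_x=0.1, lam_y=0.1, n_components=2):
    """Illustrative ridge-regularized CCA (assumed standard formulation).

    Maximizes a^T Sxy b subject to
    a^T (Sxx + lam_x I) a = 1 and b^T (Syy + lam_y I) b = 1.
    """
    # Center both data matrices column-wise.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n, p = X.shape
    q = Y.shape[1]

    # Penalized within-set covariances and the cross-covariance.
    Sxx = X.T @ X / n + lam_x * np.eye(p)
    Syy = Y.T @ Y / n + lam_y * np.eye(q)
    Sxy = X.T @ Y / n

    # Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # Whitened cross-covariance M = Lx^{-1} Sxy Ly^{-T}.
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T

    # Singular values of M are the regularized canonical correlations.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)

    # Map the singular vectors back to coefficient space.
    A = np.linalg.solve(Lx.T, U[:, :n_components])
    B = np.linalg.solve(Ly.T, Vt.T[:, :n_components])
    return A, B, s[:n_components]


if __name__ == "__main__":
    # Synthetic example with more features than samples (p, q > n).
    rng = np.random.default_rng(0)
    n, p, q = 50, 80, 60
    z = rng.standard_normal((n, 1))  # shared latent signal
    X = z @ rng.standard_normal((1, p)) + rng.standard_normal((n, p))
    Y = z @ rng.standard_normal((1, q)) + rng.standard_normal((n, q))
    A, B, corrs = rcca(X, Y, lam_x=0.5, lam_y=0.5)
    print("regularized canonical correlations:", corrs)
```

Because the penalty is a single multiple of the identity per data set, every feature is shrunk in the same way. This is the structure-agnostic behavior that the abstract identifies as a limitation. Note also that this sketch forms the full $p \times p$ and $q \times q$ covariance matrices, which becomes costly when the number of features is very large.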