Generalized correlation analysis (GCA) is concerned with uncovering linear relationships across multiple datasets. It generalizes canonical correlation analysis (CCA), which is designed for two datasets. We study sparse GCA when the data may contain multiple generalized correlation tuples and the loading matrix has only a small number of nonzero rows. This setting includes sparse CCA and sparse PCA of correlation matrices as special cases. We first formulate sparse GCA as generalized eigenvalue problems at both the population and sample levels via a careful choice of normalization constraints. Based on a Lagrangian form of the sample optimization problem, we propose a thresholded gradient descent algorithm for estimating GCA loading vectors and matrices in high dimensions. We derive tight estimation error bounds for estimators generated by the algorithm with proper initialization. We also demonstrate the performance of the algorithm on a number of synthetic datasets.
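The procedure described above can be illustrated with a minimal sketch for the rank-one case (a single loading vector rather than a loading matrix). The sketch assumes the following GCA problem:

- maximize a'Sa subject to a'S0a = 1;
- S is the pooled sample covariance;
- S0 is its block-diagonal part holding the within-dataset covariances.

Each iteration proceeds as follows:

1. Estimate the Lagrange multiplier by the Rayleigh quotient.
2. Take a gradient step on the Lagrangian.
3. Hard-threshold to the s largest entries.
4. Renormalize.

This follows the truncated Rayleigh flow idea and is not necessarily the authors' exact algorithm. The helper names (`block_diag_cov`, `hard_threshold`, `dense_init`, `thresholded_gd`, `synthetic_example`) are hypothetical. The dense initialization is a convenience that only works when n > p and stands in for whatever "proper initialization" the paper uses.

```python
import numpy as np


def block_diag_cov(S, sizes):
    """Keep only the within-dataset blocks S_11, ..., S_KK of S (the assumed S0)."""
    S0 = np.zeros_like(S)
    start = 0
    for p in sizes:
        S0[start:start + p, start:start + p] = S[start:start + p, start:start + p]
        start += p
    return S0


def hard_threshold(a, s):
    """Zero out all but the s largest-magnitude entries of a."""
    out = np.zeros_like(a)
    keep = np.argsort(np.abs(a))[-s:]
    out[keep] = a[keep]
    return out


def dense_init(S, S0):
    """Leading solution of S a = lam S0 a via Cholesky (needs S0 invertible, i.e. n > p)."""
    L = np.linalg.cholesky(S0)           # S0 = L L^T
    Linv = np.linalg.inv(L)
    _, V = np.linalg.eigh(Linv @ S @ Linv.T)
    return Linv.T @ V[:, -1]             # eigh sorts ascending, so take the last column


def thresholded_gd(S, S0, a0, s, step=0.1, n_iter=300):
    """Rank-one thresholded gradient ascent on the Lagrangian a'Sa - rho (a'S0a - 1)."""
    a = hard_threshold(a0, s)
    a = a / np.sqrt(a @ S0 @ a)          # enforce a'S0a = 1
    for _ in range(n_iter):
        rho = a @ S @ a                  # Rayleigh quotient; equals the multiplier at a stationary point
        grad = S @ a - rho * (S0 @ a)    # half the gradient of the Lagrangian in a
        a = hard_threshold(a + step * grad, s)
        a = a / np.sqrt(a @ S0 @ a)
    return a, a @ S @ a


def synthetic_example(n=2000, p_k=20, seed=0):
    """Three datasets sharing one latent factor z; each block loading has 2 nonzeros."""
    rng = np.random.default_rng(seed)
    sizes = [p_k] * 3
    u = np.zeros(3 * p_k)
    u[[0, 1]] = [1.0, 1.0]
    u[[p_k, p_k + 1]] = [1.0, -1.0]
    u[[2 * p_k, 2 * p_k + 1]] = [-1.0, 1.0]
    z = rng.standard_normal(n)
    X = np.outer(z, u) + rng.standard_normal((n, 3 * p_k))   # population covariance: u u^T + I
    S = np.cov(X, rowvar=False)
    S0 = block_diag_cov(S, sizes)
    a_hat, rho = thresholded_gd(S, S0, dense_init(S, S0), s=6)
    return a_hat, rho, u


a_hat, rho, u = synthetic_example()
print("support:", np.flatnonzero(a_hat), "rho:", round(float(rho), 3))
```

In this synthetic setup every block loading u_k has squared norm r = 2. The population leading generalized eigenvector is therefore proportional to u, and the leading generalized eigenvalue is 1 + 2r/(1 + r) = 7/3. This gives a sanity check for the returned support and Rayleigh quotient.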