Recent developments in regularized Canonical Correlation Analysis (CCA) promise powerful methods for high-dimensional, multiview data analysis. However, justifying the structural assumptions behind many popular approaches remains a challenge, and features of realistic biological datasets pose practical difficulties that are seldom discussed. We propose a novel CCA estimator that is rooted in conditional-independence assumptions and built on the Graphical Lasso. Our method enjoys desirable theoretical guarantees and performs well empirically, as we demonstrate through extensive simulations and analyses of real-world biological datasets. Recognizing the difficulty of model selection in high dimensions and the other practical challenges of applying CCA in real-world settings, we also introduce a novel framework for evaluating and interpreting regularized CCA models in the context of Exploratory Data Analysis (EDA), which we hope will empower researchers and pave the way for wider adoption of regularized CCA.
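The abstract does not spell out how the Graphical Lasso enters the estimator. One plausible plug-in reading, which is an assumption here and not necessarily the authors' construction, is to replace each within-view covariance with its Graphical Lasso estimate (whose inverse is sparse, encoding conditional independencies), whiten the sample cross-covariance with those estimates, and take an SVD. The sketch below implements the Graphical Lasso by block coordinate descent (Friedman, Hastie and Tibshirani, 2008) in plain NumPy; the function names `graphical_lasso` and `glasso_cca` and the penalties `alpha_x`, `alpha_y` are hypothetical.

```python
import numpy as np


def soft_threshold(z: float, t: float) -> float:
    """Scalar soft-thresholding operator used in the lasso coordinate updates."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def graphical_lasso(S, alpha, n_iter=100, inner_sweeps=25, tol=1e-6):
    """Block coordinate descent for the Graphical Lasso.

    Returns (W, Theta): the regularized covariance estimate and its inverse,
    the (approximately sparse) precision matrix.
    """
    p = S.shape[0]
    W = S + alpha * np.eye(p)   # diagonal is fixed at S_ii + alpha
    B = np.zeros((p, p))        # column j holds the lasso coefficients for variable j
    for _ in range(n_iter):
        W_old = W.copy()
        for j in range(p):
            idx = np.arange(p) != j
            W11 = W[np.ix_(idx, idx)]
            s12 = S[idx, j]
            beta = B[idx, j]    # boolean indexing returns a copy (warm start)
            # Coordinate descent on: 1/2 b'W11 b - s12'b + alpha * ||b||_1
            for _sweep in range(inner_sweeps):
                for k in range(p - 1):
                    r = s12[k] - W11[k] @ beta + W11[k, k] * beta[k]
                    beta[k] = soft_threshold(r, alpha) / W11[k, k]
            B[idx, j] = beta
            w12 = W11 @ beta    # update the off-diagonal row/column of W
            W[idx, j] = w12
            W[j, idx] = w12
        if np.abs(W - W_old).mean() < tol:
            break
    return W, np.linalg.inv(W)


def inv_sqrt(A):
    """Symmetric inverse square root via an eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    vals = np.clip(vals, 1e-12, None)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def glasso_cca(X, Y, alpha_x=0.1, alpha_y=0.1, k=1):
    """Plug-in CCA using Graphical Lasso estimates of the within-view covariances.

    Returns canonical directions A (p x k), B (q x k) and the top-k canonical
    correlations under the regularized covariance estimates.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx, Syy, Sxy = Xc.T @ Xc / n, Yc.T @ Yc / n, Xc.T @ Yc / n
    Wxx, _ = graphical_lasso(Sxx, alpha_x)
    Wyy, _ = graphical_lasso(Syy, alpha_y)
    Kx, Ky = inv_sqrt(Wxx), inv_sqrt(Wyy)
    # SVD of the whitened cross-covariance gives the canonical pairs
    U, d, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    A = Kx @ U[:, :k]
    B = Ky @ Vt.T[:, :k]
    return A, B, d[:k]
```

For example, with two views driven by one shared latent signal, `glasso_cca(X, Y, alpha_x=0.05, alpha_y=0.05)` should recover a leading canonical pair whose projections `X @ A[:, 0]` and `Y @ B[:, 0]` are strongly correlated. Larger penalties yield sparser precision matrices and shrink the estimated correlations; choosing them is exactly the model-selection problem the abstract flags.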