Canonical correlation analysis (CCA) is a standard tool for studying associations between two data sources; however, it is not designed for data with count or proportion measurement types. In addition, while CCA uncovers common signals, it does not elucidate which signals are unique to each data source. To address these challenges, we propose a new framework for CCA based on exponential families with explicit modeling of both common and source-specific signals. Unlike previous methods based on exponential families, the common signals from our model coincide with canonical variables in Gaussian CCA, and the unique signals are exactly orthogonal. These modeling differences lead to a non-trivial estimation problem, an optimization with orthogonality constraints, for which we develop an iterative algorithm based on a splitting method. Simulations show that the proposed method performs on par with or better than the available alternatives. We apply the method to analyze associations between gene expression and lipid concentrations in a nutrigenomic study, and to analyze associations between two distinct cell-type deconvolution methods in a prostate cancer tumor heterogeneity study.