Canonical correlation analysis (CCA) is a popular statistical technique for exploring the relationship between two datasets. Estimating sparse canonical correlation vectors has emerged in recent years as an important but challenging variant of the CCA problem, with widespread applications. The rate-optimal estimators of sparse canonical correlation vectors that are currently available are expensive to compute. We propose a quasi-Bayesian estimation procedure that achieves the minimax estimation rate and yet is easy to compute by Markov chain Monte Carlo (MCMC). The method builds on [37] and uses a rescaled Rayleigh quotient function as a quasi-log-likelihood. Unlike these authors, however, we adopt a Bayesian framework that combines this quasi-log-likelihood with a spike-and-slab prior, which regularizes the inference and promotes sparsity. We investigate the empirical behavior of the proposed method on both continuous and truncated data and find that it outperforms several state-of-the-art methods. As an application, we use the methodology to maximally correlate clinical variables and proteomic data for a better understanding of COVID-19.
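To make the ingredients concrete, the sketch below combines a Rayleigh-quotient quasi-log-likelihood with a spike-and-slab prior and samples from the resulting quasi-posterior by MCMC. It uses the generalized Rayleigh quotient $w^\top A w / w^\top B w$ with $A = \begin{pmatrix} 0 & \Sigma_{xy} \\ \Sigma_{yx} & 0 \end{pmatrix}$ and $B = \operatorname{diag}(\Sigma_{xx}, \Sigma_{yy})$, whose maximum over $w = (u, v)$ is the first canonical correlation. Several choices here are assumptions and not taken from the paper: the rescaling factor $n/\tau$, the Bernoulli($q$) / $N(0, \sigma^2)$ spike-and-slab parametrization, and the sampler (a birth/death move that draws new entries from the slab, plus a random-walk Metropolis step on active entries). The function names `log_quasi_posterior` and `sample_sparse_cca` are hypothetical. This is a minimal illustration, not the authors' algorithm.

```python
import numpy as np


def log_slab(x, sigma):
    # Log density of the N(0, sigma^2) slab component.
    return -0.5 * (x / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))


def log_quasi_posterior(w, delta, sxx, sxy, syy, n, tau, q, sigma):
    p = sxx.shape[0]
    u, v = w[:p], w[p:]
    den = u @ sxx @ u + v @ syy @ v
    if den <= 0.0:
        return -np.inf  # w = 0 defines no direction
    # Generalized Rayleigh quotient w'Aw / w'Bw, maximized by the first
    # canonical pair; the n / tau rescaling is an assumed choice.
    loglik = (n / tau) * (2.0 * u @ sxy @ v) / den
    k = int(delta.sum())
    # Assumed spike-and-slab prior: Bernoulli(q) inclusion, N(0, sigma^2) slab.
    logprior = k * np.log(q) + (delta.size - k) * np.log1p(-q)
    logprior += np.sum(log_slab(w[delta], sigma))
    return loglik + logprior


def sample_sparse_cca(x, y, n_iter=4000, tau=1.0, q=0.1, sigma=1.0,
                      step=0.1, seed=0):
    """Hypothetical MCMC sampler; returns posterior inclusion frequencies
    (after discarding the first half as burn-in) and the final state w."""
    rng = np.random.default_rng(seed)
    n, p = x.shape
    s = np.cov(np.hstack([x, y]), rowvar=False)
    sxx, sxy, syy = s[:p, :p], s[:p, p:], s[p:, p:]
    d = s.shape[0]
    delta = np.ones(d, dtype=bool)  # start with every coordinate active
    w = np.ones(d)
    lp = log_quasi_posterior(w, delta, sxx, sxy, syy, n, tau, q, sigma)
    burn = n_iter // 2
    incl = np.zeros(d)
    for it in range(n_iter):
        j = rng.integers(d)
        w_new, delta_new = w.copy(), delta.copy()
        correction = 0.0  # log proposal-density ratio for the MH test
        if rng.random() < 0.5:
            # Birth/death: switch coordinate j on (value drawn from the
            # slab) or off (value set to zero).
            delta_new[j] = not delta[j]
            if delta_new[j]:
                w_new[j] = rng.normal(scale=sigma)
                correction = -log_slab(w_new[j], sigma)
            else:
                w_new[j] = 0.0
                correction = log_slab(w[j], sigma)
        elif delta[j]:
            # Random-walk Metropolis step on an active coordinate.
            w_new[j] += step * rng.normal()
        # A random-walk proposal on an inactive coordinate leaves the state unchanged.
        lp_new = log_quasi_posterior(w_new, delta_new, sxx, sxy, syy,
                                     n, tau, q, sigma)
        if np.log(rng.random()) < lp_new - lp + correction:
            w, delta, lp = w_new, delta_new, lp_new
        if it >= burn:
            incl += delta
    return incl / (n_iter - burn), w
```

On simulated data where only the first column of `x` and the first column of `y` share a latent signal, the inclusion frequencies for those two coordinates should be close to one and the remaining coordinates should be mostly switched off. Because the Rayleigh quotient is scale-invariant, only the direction of `w` is identified by the quasi-likelihood; its overall scale is governed by the slab prior.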