Sparse principal component analysis (PCA) is a popular tool for dimensionality reduction of high-dimensional data. Despite its popularity, there is still no theoretically justified Bayesian sparse PCA method that is also computationally scalable. A major challenge is choosing a suitable prior for the loadings matrix, because the principal components are mutually orthogonal. We propose a spike and slab prior that satisfies this orthogonality constraint and show that the resulting posterior enjoys both theoretical and computational advantages. We develop two computational algorithms, PX-CAVI and PX-EM. Both use parameter expansion to handle the orthogonality constraint and to accelerate convergence. We find that the PX-CAVI algorithm performs better empirically than the PX-EM algorithm and two other penalty methods for sparse PCA. We then apply the PX-CAVI algorithm to a lung cancer gene expression dataset. The $\mathsf{R}$ package $\mathsf{VBsparsePCA}$, which implements the algorithm, is available on the Comprehensive R Archive Network (CRAN).