Matrix factorization methods, including factor analysis (FA) and principal components analysis (PCA), are widely used for inferring and summarizing structure in multivariate data. Many matrix factorization methods exist, corresponding to different assumptions on the elements of the underlying matrix factors. For example, many recent methods use a penalty or prior distribution to achieve sparse representations ("sparse FA/PCA"). Here we introduce a general empirical Bayes approach to matrix factorization (EBMF), whose key feature is that it uses the observed data to estimate prior distributions on matrix elements. We derive a correspondingly general variational fitting algorithm, which reduces fitting EBMF to solving a simpler problem, the so-called "normal means" problem. We implement this general algorithm, but focus particular attention on the use of sparsity-inducing priors that are unimodal at 0. This yields a sparse EBMF approach, essentially a version of sparse FA/PCA, that automatically adapts the amount of sparsity to the data. We demonstrate the benefits of our approach through numerical comparisons with competing methods and through an analysis of data from the GTEx (Genotype-Tissue Expression) project on genetic associations across 44 human tissues. In numerical comparisons, EBMF often provides more accurate inferences than other methods. In the GTEx data, EBMF identifies interpretable structure that concords with known relationships among human tissues. Software implementing our approach is available at https://github.com/stephenslab/flashr
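To make the reduction to the normal means problem more concrete, here is a minimal Python sketch of a single-factor (rank-one) fit. It assumes i.i.d. Gaussian residuals with one unknown precision `tau`, and it puts a point-normal prior (a point mass at 0 mixed with a zero-mean normal, one simple sparsity-inducing prior that is unimodal at 0) on both the loadings and the factor. It also assumes the fit starts from the leading singular vectors. The function names `ebnm_point_normal` and `fit_rank_one` are hypothetical illustrations, not the flashr implementation or its API.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import norm


def ebnm_point_normal(x, s):
    """Empirical Bayes normal means with a point-normal prior (illustrative).

    Model: x_i ~ N(theta_i, s_i^2), theta_i ~ pi0 * delta_0 + (1 - pi0) * N(0, sigma2).
    (pi0, sigma2) are estimated by maximizing the marginal likelihood; returns the
    posterior means E[theta_i] and second moments E[theta_i^2].
    """
    def log_components(pi0, sigma2):
        log_null = np.log(pi0) + norm.logpdf(x, 0.0, s)
        log_alt = np.log(1.0 - pi0) + norm.logpdf(x, 0.0, np.sqrt(s**2 + sigma2))
        return log_null, log_alt

    def unpack(params):
        # Map unconstrained parameters to pi0 in (0, 1) and sigma2 > 0.
        return np.clip(expit(params[0]), 1e-10, 1 - 1e-10), np.exp(params[1])

    def neg_loglik(params):
        log_null, log_alt = log_components(*unpack(params))
        return -np.sum(np.logaddexp(log_null, log_alt))

    init = np.array([0.0, np.log(np.mean(x**2) + 1e-8)])
    pi0, sigma2 = unpack(minimize(neg_loglik, init, method="Nelder-Mead").x)

    log_null, log_alt = log_components(pi0, sigma2)
    w = np.exp(log_alt - np.logaddexp(log_null, log_alt))  # P(theta_i != 0 | x_i)
    v = 1.0 / (1.0 / s**2 + 1.0 / sigma2)  # posterior variance given theta_i != 0
    m = v * x / s**2  # posterior mean given theta_i != 0
    return w * m, w * (m**2 + v)


def fit_rank_one(Y, n_iter=50):
    """Hypothetical rank-one EBMF sketch: Y ~ l f^T + E, E_ij ~ N(0, 1/tau)."""
    # Initialize from the leading singular vectors (a choice assumed here).
    u, d, vt = np.linalg.svd(Y, full_matrices=False)
    El, Ef = u[:, 0] * np.sqrt(d[0]), vt[0] * np.sqrt(d[0])
    El2, Ef2 = El**2, Ef**2
    tau = 1.0 / np.var(Y - np.outer(El, Ef))
    for _ in range(n_iter):
        # Given q(f), each loading l_i is a normal means problem with
        # x_i = sum_j Y_ij E[f_j] / sum_j E[f_j^2] and s^2 = 1 / (tau * sum_j E[f_j^2]).
        x_l = Y @ Ef / Ef2.sum()
        s_l = np.full(Y.shape[0], 1.0 / np.sqrt(tau * Ef2.sum()))
        El, El2 = ebnm_point_normal(x_l, s_l)
        # Symmetric update for the factor f given q(l).
        x_f = Y.T @ El / El2.sum()
        s_f = np.full(Y.shape[1], 1.0 / np.sqrt(tau * El2.sum()))
        Ef, Ef2 = ebnm_point_normal(x_f, s_f)
        # Residual precision from expected squared residuals E[(Y_ij - l_i f_j)^2].
        R2 = Y**2 - 2.0 * Y * np.outer(El, Ef) + np.outer(El2, Ef2)
        tau = Y.size / R2.sum()
    return El, Ef, tau
```

Each pass alternates two normal means problems, one for the loadings and one for the factor. Because the point-normal weight `pi0` is re-estimated from the data on every pass, the amount of shrinkage toward zero is learned rather than fixed by a tuning parameter. Handling more than one factor, or other prior families, would need further work beyond this sketch.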