Dimension reduction algorithms aim to discover latent variables that describe underlying structure in high-dimensional data. Methods such as factor analysis and principal component analysis have the downside of offering little interpretability of their inferred latent variables. Sparse factor analysis addresses this issue by imposing sparsity on the factor loadings, allowing each latent variable to relate to only a subset of features and thus increasing interpretability. Sparse factor analysis has been used in a wide range of areas, including genomics, signal processing, and economics. We compare two Bayesian inference techniques for sparse factor analysis, namely Markov chain Monte Carlo (MCMC) and variational inference (VI). VI is computationally faster than MCMC, at the cost of a loss in accuracy. We derive MCMC and VI algorithms and compare them on both simulated and biological data, demonstrating that the higher computational efficiency of VI outweighs the small gain in accuracy obtained with MCMC. Our implementation of the MCMC and VI algorithms for sparse factor analysis is available at https://github.com/ysfoo/sparsefactor.
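For concreteness, the sketch below gives one common formulation of a sparse factor model; the notation and the spike-and-slab prior are illustrative assumptions and may differ from the exact model used in this work.

% Generic sparse factor model (illustrative notation, not necessarily the paper's):
% Y \in \mathbb{R}^{G \times N}: observed data (e.g. G genes across N samples)
% L \in \mathbb{R}^{G \times K}: factor loadings, encouraged to be sparse
% F \in \mathbb{R}^{K \times N}: latent factors, with K \ll G
\[
    Y = L F + E, \qquad E_{gn} \sim \mathcal{N}(0, \sigma_g^2).
\]
% Sparsity can be induced by a spike-and-slab prior on each loading, so that
% many entries of L are exactly zero and each latent factor loads on only a
% subset of features:
\[
    L_{gk} \mid z_{gk} \sim z_{gk}\,\mathcal{N}(0, \tau_k^{-1}) + (1 - z_{gk})\,\delta_0,
    \qquad z_{gk} \sim \mathrm{Bernoulli}(\pi_k).
\]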