We propose a very fast approximate Markov chain Monte Carlo (MCMC) sampling framework that is applicable to a large class of sparse Bayesian inference problems. For several models, the computational cost per iteration is of order $O(ns)$, where $n$ is the sample size and $s$ the underlying sparsity of the model. This cost can be further reduced by data sub-sampling when stochastic gradient Langevin dynamics is employed. The algorithm is an extension of the asynchronous Gibbs sampler of Johnson et al. (2013), but from a statistical perspective it can be viewed as a form of Bayesian iterated sure independence screening (Fan et al. (2009)). We show that in high-dimensional linear regression problems, the Markov chain generated by the proposed algorithm admits an invariant distribution that correctly recovers the main signal with high probability under some statistical assumptions. Furthermore, we show that its mixing time is at most linear in the number of regressors. We illustrate the algorithm with several models.