Bayesian variable selection is a powerful tool for data analysis, as it offers a principled approach that accounts for prior information and uncertainty. However, wider adoption of Bayesian variable selection has been hampered by computational challenges, especially in difficult regimes with a large number of covariates P or non-conjugate likelihoods. To scale to the large-P regime, we introduce an efficient MCMC scheme whose cost per iteration is sublinear in P. In addition, we show how this scheme can be extended to generalized linear models for count data, which are prevalent in biology, ecology, economics, and beyond. In particular, we design efficient algorithms for variable selection in binomial and negative binomial regression, which includes logistic regression as a special case. In experiments, we demonstrate the effectiveness of our methods, including on cancer and maize genomic data.