Bayesian variable selection is a powerful tool for data analysis, as it offers a principled method for variable selection that accounts for prior information and uncertainty. However, wider adoption of Bayesian variable selection has been hampered by computational challenges, especially in difficult regimes with a large number of covariates or non-conjugate likelihoods. Generalized linear models for count data, which are prevalent in biology, ecology, economics, and beyond, represent an important special case. Here we introduce an efficient MCMC scheme for variable selection in binomial and negative binomial regression that exploits Tempered Gibbs Sampling (Zanella and Roberts, 2019) and that includes logistic regression as a special case. In experiments we demonstrate the effectiveness of our approach, including on cancer data with seventeen thousand covariates.