Spike-and-slab and horseshoe regression are arguably the most popular Bayesian variable selection approaches for linear regression models. However, their performance can deteriorate when the data contain outliers or heteroskedasticity, both of which are common in real-world statistics and machine learning applications. In this work, we propose a Bayesian nonparametric approach to linear regression that performs variable selection while accounting for outliers and heteroskedasticity. The proposed model is an instance of a Dirichlet process scale mixture model, with the advantage that the full conditional distributions of all parameters can be derived in closed form, yielding an efficient Gibbs sampler for posterior inference. Moreover, we show how to extend the model to account for heavy-tailed response variables. We evaluate the model against competing algorithms on synthetic and real-world datasets.
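To make the setting concrete, the following is a minimal sketch of the kind of data such a model targets: a sparse linear regression whose noise variance differs across observations because each observation draws its variance from a (truncated) Dirichlet process scale mixture. It is not the model or sampler proposed in this work. The truncation level `K`, the inverse-gamma base measure, the concentration `alpha`, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def stick_breaking_weights(alpha, K, rng):
    """Truncated stick-breaking weights for a DP with concentration alpha."""
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0  # close the stick so the K weights sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def simulate(n=200, p=10, n_active=3, alpha=1.0, K=20, seed=0):
    """Simulate sparse regression data with mixture-distributed noise scales."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))

    # Sparse coefficients: only the first n_active entries are nonzero.
    beta = np.zeros(p)
    beta[:n_active] = rng.normal(scale=2.0, size=n_active)

    # Mixture weights and per-component variances.
    # Assumption: inverse-gamma(2, 1) base measure for the variances.
    w = stick_breaking_weights(alpha, K, rng)
    sigma2 = 1.0 / rng.gamma(shape=2.0, scale=1.0, size=K)

    # Each observation picks a component, hence its own noise variance.
    z = rng.choice(K, size=n, p=w)
    noise_var = sigma2[z]
    y = X @ beta + rng.normal(size=n) * np.sqrt(noise_var)
    return X, y, beta, w, noise_var


if __name__ == "__main__":
    X, y, beta, w, noise_var = simulate()
    print("nonzero coefficients:", np.count_nonzero(beta))
    print("distinct noise variances used:", np.unique(noise_var).size)
```

Observations assigned to a high-variance component behave like outliers relative to the rest, which is the situation in which a fixed-variance spike-and-slab or horseshoe fit can degrade.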