High-dimensional linear models have been extensively studied in the recent literature, but developments in high-dimensional generalized linear models, or GLMs, have been much slower. In this paper, we propose the use of an empirical or data-driven prior specification, leading to an empirical Bayes posterior distribution that can be used for estimation of and inference on the coefficient vector in a high-dimensional GLM, as well as for variable selection. For our proposed method, we prove that the posterior distribution concentrates around the true, sparse coefficient vector at the optimal rate and, furthermore, provide conditions under which the posterior can achieve variable selection consistency. Computation of the proposed empirical Bayes posterior is simple and efficient, and, in terms of variable selection in logistic and Poisson regression, the method is shown to perform well in simulations compared to existing Bayesian and non-Bayesian methods.