The training of high-dimensional regression models on comparably sparse data is an important yet complicated topic, especially when there are many more model parameters than observations in the data. From a Bayesian perspective, inference in such cases can be achieved with the help of shrinkage prior distributions, at least for generalized linear models. However, real-world data usually possess multilevel structures, such as repeated measurements or natural groupings of individuals, which existing shrinkage priors are not built to deal with. We generalize and extend one of these priors, the R2D2 prior by Zhang et al. (2020), to linear multilevel models leading to what we call the R2D2M2 prior. The proposed prior enables both local and global shrinkage of the model parameters. It comes with interpretable hyperparameters, which we show to be intrinsically related to vital properties of the prior, such as rates of concentration around the origin, tail behavior, and amount of shrinkage the prior exerts. We offer guidelines on how to select the prior's hyperparameters by deriving shrinkage factors and measuring the effective number of non-zero model coefficients. Hence, the user can readily evaluate and interpret the amount of shrinkage implied by a specific choice of hyperparameters. Finally, we perform extensive experiments on simulated and real data, showing that our inference procedure for the prior is well calibrated, has desirable global and local regularization properties and enables the reliable and interpretable estimation of much more complex Bayesian multilevel models than was previously possible.
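The R2D2-style construction described above can be sketched generatively: a Beta prior is placed on the proportion of explained variance R², the implied global scale is decomposed across coefficients via a Dirichlet-distributed simplex, and each coefficient is drawn with its local scale. This is a minimal illustrative sketch, not the paper's exact specification; the hyperparameter values are assumptions, and a normal distribution on the coefficients is used here for simplicity (Zhang et al. (2020) use a double-exponential formulation).

```python
import numpy as np

rng = np.random.default_rng(0)

p = 10          # number of regression coefficients
a, b = 1.0, 5.0 # illustrative Beta hyperparameters for the prior on R^2
alpha = np.full(p, 0.5)  # illustrative Dirichlet concentration vector
sigma = 1.0     # residual standard deviation (assumed known here)

# Prior on the proportion of explained variance; small a/b concentrates
# R^2 near 0 and thus induces strong global shrinkage.
R2 = rng.beta(a, b)

# Global scale implied by R^2: tau^2 = R^2 / (1 - R^2).
tau2 = R2 / (1.0 - R2)

# Dirichlet decomposition distributes the explained variance across the
# p coefficients; small alpha entries yield sparse, spiky allocations.
phi = rng.dirichlet(alpha)

# Coefficients drawn with local scales sigma^2 * tau^2 * phi_j.
beta = rng.normal(loc=0.0, scale=sigma * np.sqrt(tau2 * phi))

print(R2, phi.sum(), beta.shape)
```

Shrinkage is controlled globally through the Beta hyperparameters (via the induced prior on R²) and locally through the Dirichlet concentrations, which is the two-level regularization behavior the abstract refers to.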