Several proposals extend classical generalized additive models (GAMs) to high-dimensional data ($p \gg n$) using group sparse regularization. However, sparse regularization can induce excess shrinkage when estimating smoothing functions, which damages predictive performance. Moreover, most of these GAMs take an "all-in-all-out" approach to functional selection, which makes it difficult to determine whether nonlinear effects are necessary. While some Bayesian models address these shortcomings, fitting them with Markov chain Monte Carlo algorithms introduces a new challenge: scalability. We therefore propose Bayesian hierarchical generalized additive models as a solution. The models use the smoothing penalty to achieve proper shrinkage of the curve interpolation and to separate the linear and nonlinear spaces of each smoothing function. A novel spike-and-slab spline prior is proposed to select components of smoothing functions. Two scalable and deterministic algorithms, EM-Coordinate Descent and EM-Iterative Weighted Least Squares, are developed for different utilities. Simulation studies and metabolomics data analyses demonstrate improved predictive or computational performance against state-of-the-art models: mgcv, COSSO, and sparse Bayesian GAM. The software implementation of the proposed models is freely available as the R package BHAM.
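To illustrate how an EM algorithm can handle a spike-and-slab prior deterministically, the following is a minimal sketch of a generic E-step. It assumes a two-component mixture of double-exponential (Laplace) densities, as used in spike-and-slab lasso style models; the abstract does not specify the form of the spike-and-slab spline prior, so the densities, the function names, and the parameters `theta`, `s_spike`, and `s_slab` are all hypothetical choices for illustration and are not taken from the BHAM package.

```python
import math


def laplace_pdf(beta, scale):
    """Density at beta of a double-exponential (Laplace) distribution centred at 0."""
    return math.exp(-abs(beta) / scale) / (2.0 * scale)


def inclusion_probability(beta, theta, s_spike, s_slab):
    """E-step: posterior probability that beta belongs to the slab component.

    theta is the assumed prior probability of the slab; s_spike < s_slab are
    the assumed scales of the two mixture components.
    (A production version would work in log space to avoid underflow.)
    """
    slab = theta * laplace_pdf(beta, s_slab)
    spike = (1.0 - theta) * laplace_pdf(beta, s_spike)
    return slab / (slab + spike)


def effective_scale(p_slab, s_spike, s_slab):
    """Scale whose inverse equals the expected inverse scale under the E-step weights."""
    return 1.0 / ((1.0 - p_slab) / s_spike + p_slab / s_slab)


if __name__ == "__main__":
    # A coefficient near zero is attributed to the spike, a large one to the slab.
    for beta in (0.0, 0.1, 2.0):
        p = inclusion_probability(beta, theta=0.5, s_spike=0.04, s_slab=1.0)
        print(beta, p, effective_scale(p, s_spike=0.04, s_slab=1.0))
```

In an EM scheme of this kind, the effective scale would then be passed as a coefficient-specific penalty to the M-step (for example a coordinate descent or weighted least squares update), and the two steps alternate until convergence.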