Motivated by a real-world application in cardiology, we develop an algorithm to perform Bayesian bi-level variable selection in a generalized linear model, for datasets that may be large both in the number of individuals and in the number of predictors. Our algorithm relies on three ingredients: the waste-free SMC (Sequential Monte Carlo) methodology of Dau and Chopin (2022); a new proposal mechanism that handles the constraints specific to bi-level selection (which forbid selecting an individual predictor if its group is not selected); and the ALA (approximate Laplace approximation) approach of Rossell et al. (2021). Our numerical study shows that the algorithm may offer reliable performance on large datasets within a few minutes, on both simulated data and real data from the aforementioned cardiology application.
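The bi-level constraint (a predictor may be selected only if its group is selected) restricts which models a sampler may visit. The sketch below illustrates one way to generate neighbouring models that always respect it. It is a hypothetical move written for illustration, not the proposal mechanism of this paper. The names `is_valid`, `propose` and `group_of` are assumptions. The sketch does not compute the proposal probabilities that a Metropolis-Hastings step inside an SMC sampler would need.

```python
import random
from typing import List, Tuple

# A model is a pair (group indicators, predictor indicators).
State = Tuple[List[bool], List[bool]]


def is_valid(state: State, group_of: List[int]) -> bool:
    """Bi-level constraint: a predictor may be selected only if its group is selected."""
    groups, preds = state
    return all(groups[group_of[j]] for j, on in enumerate(preds) if on)


def propose(state: State, group_of: List[int], rng: random.Random) -> State:
    """Return a neighbouring state that still satisfies the bi-level constraint.

    Hypothetical move (not the authors' proposal): with probability 1/2 flip
    one group indicator, otherwise flip one predictor indicator, then repair
    the other level so the constraint holds. The input state is not mutated.
    """
    groups, preds = list(state[0]), list(state[1])
    if rng.random() < 0.5:
        g = rng.randrange(len(groups))
        groups[g] = not groups[g]
        if not groups[g]:
            # Deselecting a group also deselects all of its predictors.
            for j, gj in enumerate(group_of):
                if gj == g:
                    preds[j] = False
    else:
        j = rng.randrange(len(preds))
        preds[j] = not preds[j]
        if preds[j]:
            # Selecting a predictor forces its group to be selected.
            groups[group_of[j]] = True
    return groups, preds


# Usage: 6 predictors in 3 groups, starting from the empty model.
rng = random.Random(0)
group_of = [0, 0, 1, 1, 1, 2]
state: State = ([False, False, False], [False] * 6)
for _ in range(1000):
    state = propose(state, group_of, rng)
    assert is_valid(state, group_of)
```

Repairing the other level after each flip keeps every proposed model feasible, so no likelihood (or ALA) evaluation is spent on models that violate the constraint.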