Linear mixed models (LMMs) are instrumental for regression analysis with structured dependence, such as grouped, clustered, or multilevel data. However, selecting among the covariates while accounting for this structured dependence remains a challenge. We introduce a Bayesian decision analysis for subset selection with LMMs. Using a Mahalanobis loss function that incorporates the structured dependence, we derive optimal linear actions for any subset of covariates and under any Bayesian LMM. Crucially, these actions inherit shrinkage or regularization and uncertainty quantification from the underlying Bayesian LMM. Rather than selecting a single "best" subset, which is often unstable and limited in its information content, we collect the acceptable family of subsets that nearly match the predictive ability of the "best" subset. The acceptable family is summarized by its smallest member and key variable importance metrics. Customized subset search and out-of-sample approximation algorithms are provided for scalable computing. These tools are applied to simulated data and a longitudinal physical activity dataset, and in both cases demonstrate excellent prediction, estimation, and selection ability.
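A minimal sketch of the two ingredients named above, under simplifying assumptions that are not the authors' method: the Mahalanobis weight matrix is treated as fixed and known (here the LMM's marginal covariance), the full-model GLS fit stands in for the posterior predictive mean, losses are in-sample rather than out-of-sample, and every subset is enumerated exhaustively (feasible only for small p). With a fixed weight matrix, minimizing the expected Mahalanobis loss depends only on the target's mean, so the optimal linear action for a subset reduces to generalized least squares (GLS) of that mean on the subset's columns. The function names `optimal_action`, `mahalanobis_loss`, `acceptable_family` and the margin `eps` are hypothetical.

```python
import itertools

import numpy as np


def optimal_action(X, target, Sigma_inv, subset):
    """Minimize (target - X_S d)^T Sigma_inv (target - X_S d) over d.

    With a fixed weight matrix Sigma_inv this is generalized least squares
    (GLS) of `target` on the columns of X listed in `subset`.
    """
    Xs = X[:, list(subset)]
    A = Xs.T @ Sigma_inv @ Xs
    b = Xs.T @ Sigma_inv @ target
    return np.linalg.solve(A, b)


def mahalanobis_loss(y, X, Sigma_inv, subset, delta):
    """Mahalanobis distance between y and the subset's linear prediction."""
    r = y - X[:, list(subset)] @ delta
    return float(r @ Sigma_inv @ r)


def acceptable_family(y, y_hat, X, Sigma_inv, eps=0.1):
    """Score every nonempty subset exhaustively (small p only).

    Keeps subsets whose loss is within a factor (1 + eps) of the best loss
    and returns them sorted by size. `eps` is a hypothetical margin.
    """
    p = X.shape[1]
    losses = {}
    for k in range(1, p + 1):
        for subset in itertools.combinations(range(p), k):
            delta = optimal_action(X, y_hat, Sigma_inv, subset)
            losses[subset] = mahalanobis_loss(y, X, Sigma_inv, subset, delta)
    best = min(losses.values())
    family = [s for s, loss in losses.items() if loss <= (1 + eps) * best]
    return sorted(family, key=len)


# Simulated grouped data: 30 groups of 10 with a random intercept per group.
rng = np.random.default_rng(0)
n_groups, m, p = 30, 10, 4
n = n_groups * m
sigma2, tau2 = 1.0, 0.5
X = rng.standard_normal((n, p))
beta = np.array([2.0, -1.5, 0.0, 0.0])  # only covariates 0 and 1 matter
groups = np.repeat(np.arange(n_groups), m)
Z = (groups[:, None] == np.arange(n_groups)[None, :]).astype(float)
u = rng.normal(0.0, np.sqrt(tau2), n_groups)  # random intercepts
y = X @ beta + Z @ u + rng.normal(0.0, np.sqrt(sigma2), n)

# Marginal covariance of y implied by the LMM: sigma2 * I + tau2 * Z Z^T.
Sigma = sigma2 * np.eye(n) + tau2 * Z @ Z.T
Sigma_inv = np.linalg.inv(Sigma)

# Stand-in for the posterior predictive mean: the full-model GLS fit.
y_hat = X @ optimal_action(X, y, Sigma_inv, range(p))

family = acceptable_family(y, y_hat, X, Sigma_inv)
print("acceptable family:", family)
print("smallest acceptable subset:", family[0])
```

Because the full model attains the minimum in-sample loss, the acceptable family here collects the sparser subsets whose loss is only slightly larger; its smallest member is the natural summary. The paper's own procedure relies on posterior quantities and out-of-sample approximations rather than this in-sample shortcut.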