Subset selection for linear mixed models

Linear mixed models (LMMs) are instrumental for regression analysis with structured dependence, such as grouped, clustered, or multilevel data. However, selection among the covariates, while accounting for this structured dependence, remains a challenge. We introduce a Bayesian decision analysis for subset selection with LMMs. Using a Mahalanobis loss function that incorporates the structured dependence, we derive optimal linear coefficients for (i) any given subset of variables and (ii) all subsets of variables that satisfy a cardinality constraint. Crucially, these estimates inherit shrinkage or regularization and uncertainty quantification from the underlying Bayesian model, and apply to any well-specified Bayesian LMM. More broadly, our decision analysis strategy deemphasizes the role of a single "best" subset, which is often unstable and limited in its information content, and instead favors a collection of near-optimal subsets. This collection is summarized by key member subsets and variable-specific importance metrics. Customized subset search and out-of-sample approximation algorithms are provided for more scalable computing. These tools are applied to simulated data and a longitudinal physical activity dataset, and demonstrate excellent prediction, estimation, and selection ability.
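To make the idea of subset-specific coefficients under a Mahalanobis loss concrete, the following is a minimal sketch and not the paper's implementation. It assumes the loss has the quadratic form (y - X_S d)' Omega (y - X_S d), averaged over posterior draws of a predictive vector y and a precision matrix Omega; under that assumption the minimizer is d = (X_S' E[Omega] X_S)^{-1} X_S' E[Omega y]. The function names, the exhaustive enumeration over subsets (a stand-in for the customized search mentioned above), and the relative tolerance `tol` used to define "near-optimal" are all hypothetical choices made for illustration.

```python
import itertools
import numpy as np


def subset_coefficients(X, omega_draws, y_draws, subset):
    """Coefficients minimizing the draw-averaged Mahalanobis loss for one subset.

    X:           (n, p) design matrix
    omega_draws: (D, n, n) posterior draws of the precision matrix (assumed input)
    y_draws:     (D, n) posterior predictive draws (assumed input)
    subset:      tuple of column indices of X
    """
    Xs = X[:, list(subset)]
    omega_bar = omega_draws.mean(axis=0)  # Monte Carlo estimate of E[Omega]
    # Monte Carlo estimate of E[Omega y], averaging Omega_d @ y_d over draws.
    omega_y_bar = np.einsum("dij,dj->i", omega_draws, y_draws) / len(y_draws)
    return np.linalg.solve(Xs.T @ omega_bar @ Xs, Xs.T @ omega_y_bar)


def expected_loss(X, omega_draws, y_draws, subset, delta):
    """Draw-averaged quadratic form (y - X_S delta)' Omega (y - X_S delta)."""
    resid = y_draws - X[:, list(subset)] @ delta  # shape (D, n)
    return float(np.mean(np.einsum("di,dij,dj->d", resid, omega_draws, resid)))


def near_optimal_subsets(X, omega_draws, y_draws, max_size, tol=0.05):
    """Enumerate subsets up to max_size; keep those within (1 + tol) of the best loss.

    Exhaustive enumeration is only feasible for small p; it is used here
    purely for illustration.
    """
    results = []
    for k in range(1, max_size + 1):
        for subset in itertools.combinations(range(X.shape[1]), k):
            delta = subset_coefficients(X, omega_draws, y_draws, subset)
            loss = expected_loss(X, omega_draws, y_draws, subset, delta)
            results.append((subset, loss))
    best = min(loss for _, loss in results)
    return [(s, loss) for s, loss in results if loss <= (1 + tol) * best]
```

In this sketch, the draws `omega_draws` and `y_draws` would come from whatever Bayesian LMM has already been fit, so any shrinkage in that model carries through to the subset coefficients. Returning every subset within the tolerance, and not only the minimizer, mirrors the emphasis on a collection of near-optimal subsets over a single "best" one.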
