Multiple imputation is a widely used technique for handling missing data in large observational studies. Variable selection on multiply-imputed datasets is problematic, however: if selection is conducted on each imputed dataset separately, different sets of important variables may be obtained. MI-LASSO, one of the most popular solutions to this problem, treats the coefficients of the same variable across all imputed datasets as a group and exploits Group-LASSO to yield a consistent variable selection across all the multiply-imputed datasets. In this paper, we extend the MI-LASSO model to a Bayesian framework and use five different Bayesian MI-LASSO models to perform variable selection on multiply-imputed data. Three of these models are based on shrinkage priors and two on discrete mixture priors. We conduct a simulation study investigating the practical characteristics of each model across various settings. We further demonstrate these methods via a case study using the multiply-imputed data from the University of Michigan Dioxin Exposure Study. The Python package BMIselect is hosted on GitHub under an Apache-2.0 license: https://github.com/zjg540066169/Bmiselect.
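To make the grouping idea concrete, the following is a minimal sketch of the frequentist group-LASSO formulation that MI-LASSO builds on; it is not the Bayesian models of this paper and does not use the BMIselect API. The function name `mi_group_lasso`, the proximal-gradient solver, the penalty value, and the crude stochastic imputation used to generate toy data are all assumptions made for illustration.

```python
import numpy as np


def mi_group_lasso(X_list, y_list, lam, n_iter=500):
    """Hypothetical helper: group lasso across D imputed datasets.

    Minimizes sum_d ||y_d - X_d b_d||^2 / (2 n) + lam * sum_j ||B[:, j]||_2
    by proximal gradient descent. Returns B of shape (D, p); column j holds
    the coefficients of variable j in every imputed dataset (one group).
    """
    D = len(X_list)
    p = X_list[0].shape[1]
    B = np.zeros((D, p))
    # Step size from the Lipschitz constant of the smooth (least-squares) part.
    lipschitz = max(np.linalg.norm(X, 2) ** 2 / X.shape[0] for X in X_list)
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        # Gradient step on each dataset's own least-squares loss.
        for d in range(D):
            X, y = X_list[d], y_list[d]
            B[d] -= step * (X.T @ (X @ B[d] - y)) / X.shape[0]
        # Group soft-thresholding: shrink each variable's column as a whole,
        # so a variable is dropped from all datasets or kept in all of them.
        norms = np.linalg.norm(B, axis=0)
        B *= np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
    return B


# Toy data (assumed): two true signals, three noise variables, 20% missing.
rng = np.random.default_rng(0)
n, p, D = 400, 5, 5
beta = np.array([2.0, -1.5, 0.0, 0.0, 0.0])
X_full = rng.standard_normal((n, p))
y = X_full @ beta + rng.standard_normal(n)
mask = rng.random((n, p)) < 0.2
X_obs = np.where(mask, np.nan, X_full)

# Crude stochastic imputation (assumed): draw missing entries from each
# column's observed marginal distribution, independently per imputation.
mu, sd = np.nanmean(X_obs, axis=0), np.nanstd(X_obs, axis=0)
X_list = [np.where(mask, mu + sd * rng.standard_normal((n, p)), X_obs)
          for _ in range(D)]
y_list = [y] * D

B = mi_group_lasso(X_list, y_list, lam=1.0)
selected = np.linalg.norm(B, axis=0) > 0  # one decision per variable
print("selected variables:", np.flatnonzero(selected))
```

Because the penalty acts on each column of `B` as a whole, the selection decision is shared by all imputed datasets, which is the consistency property described above. In practice the penalty level would be tuned, for example by an information criterion, rather than fixed as here.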