Multiple imputation (MI) is the state-of-the-art approach for handling missing data arising from nonresponse in sample surveys. Multiple imputation by chained equations (MICE) is the most widely used MI method, but it lacks a theoretical foundation and is computationally intensive. Recently, MI methods based on deep learning models have been developed, with encouraging results in small studies. However, there has been limited research systematically evaluating their performance against MICE in realistic settings, particularly in large-scale surveys. This paper provides a general framework for comparing MI methods using simulations based on real survey data and several performance metrics. We conduct extensive simulation studies based on American Community Survey data to compare the repeated sampling properties of four machine-learning-based MI methods: MICE with classification trees, MICE with random forests, generative adversarial imputation networks, and multiple imputation using denoising autoencoders. We find that the deep-learning-based MI methods outperform MICE in computational time; however, MICE with classification trees consistently outperforms the deep learning MI methods in terms of bias, mean squared error, and coverage under a range of realistic settings.
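The repeated-sampling evaluation summarized above can be sketched as a short simulation: draw repeated samples from a fixed population, impose nonresponse, impute m times, pool the m estimates with Rubin's combining rules, and score the pooled estimates by bias, mean squared error, and confidence-interval coverage. This is a minimal illustration under our own assumptions, not the paper's code. A simulated normal population stands in for the ACS extract, nonresponse is assumed MCAR, and a simple approximate Bayesian bootstrap imputer is a hypothetical stand-in for MICE with trees or forests, GAIN, or MIDA. Intervals use a normal approximation rather than Rubin's t reference distribution. The function names `rubin_combine`, `evaluate`, `abb_impute`, and `simulate` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def rubin_combine(estimates, variances):
    """Pool m completed-data estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = fmean(estimates)          # pooled point estimate
    u_bar = fmean(variances)          # average within-imputation variance
    b = variance(estimates)           # between-imputation variance
    total = u_bar + (1 + 1 / m) * b   # total variance of q_bar
    return q_bar, total


def evaluate(pooled, truth, level=0.95):
    """Bias, MSE and interval coverage of pooled estimates over repeated samples."""
    # Assumption: normal quantile in place of Rubin's t reference distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    errors = [q - truth for q, _ in pooled]
    bias = fmean(errors)
    mse = fmean(e * e for e in errors)
    covered = [abs(q - truth) <= z * math.sqrt(t) for q, t in pooled]
    return bias, mse, sum(covered) / len(covered)


def abb_impute(observed, n_missing, rng):
    """Hypothetical stand-in imputer: approximate Bayesian bootstrap.

    Resampling the donor pool first propagates parameter uncertainty,
    then the missing values are drawn from the resampled donors.
    """
    donors = rng.choices(observed, k=len(observed))
    return rng.choices(donors, k=n_missing)


def simulate(population, n=200, m=5, reps=500, miss_rate=0.3, seed=1):
    """Repeated sampling from a fixed population; the estimand is the population mean."""
    rng = random.Random(seed)
    truth = fmean(population)
    pooled = []
    for _ in range(reps):
        sample = rng.sample(population, n)
        # Assumption: nonresponse is missing completely at random (MCAR).
        observed = [y for y in sample if rng.random() >= miss_rate]
        n_mis = n - len(observed)
        ests, vars_ = [], []
        for _ in range(m):
            completed = observed + abb_impute(observed, n_mis, rng)
            ests.append(fmean(completed))
            # Variance of the sample mean; ignores the finite population correction.
            vars_.append(variance(completed) / n)
        pooled.append(rubin_combine(ests, vars_))
    return evaluate(pooled, truth)


# Usage: a simulated population stands in for real survey data.
pop_rng = random.Random(0)
population = [pop_rng.gauss(50, 10) for _ in range(100_000)]
bias, mse, coverage = simulate(population)
print(f"bias={bias:.3f}  mse={mse:.3f}  coverage={coverage:.3f}")
```

Swapping `abb_impute` for a different imputation routine, while holding the population, sampling design, and nonresponse mechanism fixed, is what makes the three metrics comparable across methods in this kind of design.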