High-dimensional incomplete data arise in a wide range of systems. Because most data mining techniques and machine learning algorithms require complete observations, data imputation is vital for downstream analysis. In this work, we introduce an imputation approach, called EMFlow, that performs imputation in a latent space via an online version of the Expectation-Maximization (EM) algorithm and connects the latent space to the data space via a normalizing flow (NF). Inference in EMFlow is iterative and alternates between updating the parameters of the online EM and those of the NF. Extensive experimental results on multivariate and image datasets show that the proposed EMFlow outperforms competing methods in terms of both imputation quality and convergence speed.
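To make the latent-space step concrete, the following is a minimal sketch of Gaussian conditional-mean imputation followed by an online (step-size-weighted) parameter update. It is an illustration under stated assumptions, not the EMFlow implementation: the normalizing flow is omitted (the latent space is taken to be the data space), the names `impute_gaussian`, `online_em_step`, and `rho` are hypothetical, and the covariance update is simplified in that it leaves out the conditional-covariance correction a full EM step would include.

```python
import numpy as np

def impute_gaussian(z, mask, mu, sigma):
    """Fill missing entries of z with their conditional mean under N(mu, sigma).

    mask is a boolean vector: True marks an observed entry, False a missing one.
    """
    z_hat = z.copy()
    o, m = mask, ~mask
    if m.any() and o.any():
        # E[z_m | z_o] = mu_m + S_mo S_oo^{-1} (z_o - mu_o)
        s_oo = sigma[np.ix_(o, o)]
        s_mo = sigma[np.ix_(m, o)]
        z_hat[m] = mu[m] + s_mo @ np.linalg.solve(s_oo, z[o] - mu[o])
    elif m.any():
        # Nothing observed: fall back to the marginal mean.
        z_hat[m] = mu[m]
    return z_hat

def online_em_step(batch, masks, mu, sigma, rho):
    """One simplified online update on a mini-batch.

    Imputes each row with the current parameters, then blends the batch
    mean and covariance into (mu, sigma) with step size rho in (0, 1).
    Assumption: the conditional-covariance term of exact EM is omitted.
    """
    imputed = np.stack(
        [impute_gaussian(z, k, mu, sigma) for z, k in zip(batch, masks)]
    )
    batch_mu = imputed.mean(axis=0)
    centered = imputed - batch_mu
    batch_sigma = centered.T @ centered / len(imputed)
    new_mu = (1.0 - rho) * mu + rho * batch_mu
    new_sigma = (1.0 - rho) * sigma + rho * batch_sigma
    return imputed, new_mu, new_sigma
```

In a full pipeline of the kind the abstract describes, a step like this would alternate with an update of the flow parameters, and the imputed latent vectors would be mapped back to the data space through the inverse flow.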