This paper introduces a novel iterative method for missing data imputation that sequentially reduces the mutual information between the data and the corresponding missingness mask. Inspired by GAN-based approaches, which train generators to make missingness patterns less predictable, our method targets this reduction in mutual information explicitly. Specifically, our algorithm iteratively minimizes the KL divergence between the joint distribution of the imputed data and the missingness mask, and the product of their marginals from the previous iteration. We show that the optimal imputation under this framework can be obtained by solving an ODE whose velocity field minimizes a rectified flow training objective. We further show that some existing imputation techniques can be interpreted as approximate special cases of our mutual-information-reducing framework. Comprehensive experiments on synthetic and real-world datasets validate the efficacy of the proposed approach and demonstrate its superior imputation performance. Our implementation is available at https://github.com/yujhml/MIRI-Imputation.
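As a reading aid, here is a minimal LaTeX sketch of the objective the abstract describes. The notation is assumed, not taken from the paper: $X^{(t)}$ denotes the imputed data at iteration $t$, $M$ the missingness mask, and $p$ the relevant distributions; the second display is the standard rectified flow training objective from the literature, and how the paper couples $(X_0, X_1)$ is not specified by the abstract.

% Assumed notation (not fixed by the abstract): X^{(t)} is the imputed data at
% iteration t, M the missingness mask, p_{X,M} the joint law, p_X and p_M marginals.
% Each iteration minimizes the KL divergence between the joint distribution of the
% imputed data and the mask and the product of the previous iteration's marginals:
\[
  X^{(t+1)} \;=\; \operatorname*{arg\,min}_{X}\;
  D_{\mathrm{KL}}\!\left( p_{X,M} \;\middle\|\; p_{X^{(t)}} \otimes p_{M} \right).
\]
% Because D_KL(p_{X,M} || p_{X^{(t)}} \otimes p_M) = I(X; M) + D_KL(p_X || p_{X^{(t)}}),
% and evaluating at X = X^{(t)} gives exactly I(X^{(t)}; M), the mutual information
% I(X^{(t)}; M) is non-increasing across iterations; its fixed point is independence
% of the imputed data from the mask.
%
% The velocity field v of the imputation ODE is trained with the standard rectified
% flow objective over a coupling (X_0, X_1) of source and target samples:
\[
  \min_{v}\; \mathbb{E}_{s \sim \mathrm{Unif}[0,1]}
  \big\| (X_1 - X_0) - v\big(s X_1 + (1-s) X_0,\; s\big) \big\|^{2},
\]
% after which imputations follow the ODE dZ_s = v(Z_s, s)\,ds from Z_0 = X_0.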