Missing data imputation is a fundamental problem in data analysis, and many studies have sought to improve imputation performance by exploring model structures and learning procedures. Data augmentation, however, despite being simple and effective, has received little attention in this area. In this paper, we propose a novel data augmentation method called Missingness Augmentation (MisA) for generative imputation models. At each epoch, our approach dynamically produces incomplete samples from the generator's output, constrains these augmented samples with a simple reconstruction loss, and combines this loss with the original loss to form the final optimization objective. As a general augmentation technique, MisA can be easily integrated into existing generative imputation frameworks, offering a lightweight way to enhance their performance. Experimental results demonstrate that MisA significantly improves the performance of many recently proposed generative imputation models on a variety of tabular and image datasets. The code is available at \url{https://github.com/WYu-Feng/Missingness-Augmentation}.
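The training objective described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation (the linked repository holds that). The assumptions are: the generator is any callable mapping (incomplete sample, mask) to a full prediction; masks are drawn completely at random; the model's "original loss" is stood in for by a mean-squared error on observed entries; the reconstruction loss is taken as an MSE on the entries hidden by the freshly drawn mask; and the two losses are combined with a hypothetical weight `alpha`. The helper names `sample_mask`, `observed_mse`, and `misa_loss` are hypothetical.

```python
import numpy as np


def sample_mask(shape, miss_rate, rng):
    """Hypothetical helper: draw an MCAR mask (1.0 = observed, 0.0 = missing)."""
    return (rng.random(shape) >= miss_rate).astype(float)


def observed_mse(x, m, g_out):
    """Stand-in for the model's original loss (assumption): MSE on observed entries."""
    return np.sum(m * (g_out - x) ** 2) / max(m.sum(), 1.0)


def misa_loss(x, m, generator, original_loss_fn, miss_rate, alpha, rng):
    """Sketch of one MisA-style objective evaluation (hypothetical helper).

    x: data array (missing entries may hold any value; they are masked out)
    m: mask, 1.0 where x is observed
    generator: callable (incomplete_x, mask) -> full prediction
    """
    # 1) Ordinary imputation pass and the model's original loss.
    g_out = generator(x * m, m)
    base = original_loss_fn(x, m, g_out)

    # Completed sample: keep observed values, fill the rest from the generator.
    x_hat = m * x + (1.0 - m) * g_out

    # 2) Augmentation: re-mask the completed sample with a fresh random mask.
    #    With autodiff, x_hat would likely be detached as a target (assumption).
    m_aug = sample_mask(x.shape, miss_rate, rng)
    g_aug = generator(x_hat * m_aug, m_aug)

    # 3) Reconstruction loss on entries hidden by the new mask (assumption).
    hidden = 1.0 - m_aug
    recon = np.sum(hidden * (g_aug - x_hat) ** 2) / max(hidden.sum(), 1.0)

    # 4) Final objective: original loss plus weighted reconstruction loss.
    return base + alpha * recon, base, recon


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(32, 4))
    m = sample_mask(x.shape, 0.3, rng)
    weight = rng.normal(size=(4, 4)) * 0.1  # toy linear "generator"
    toy_gen = lambda x_in, m_in: x_in @ weight
    total, base, recon = misa_loss(x, m, toy_gen, observed_mse, 0.3, 1.0, rng)
    print(f"total={total:.4f} base={base:.4f} recon={recon:.4f}")
```

Because the augmentation only calls the generator a second time and adds one loss term, it fits into an existing training loop without changing the model architecture, which matches the framing of MisA as a general plug-in technique.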