表格数据中缺失价值估算的传播模型 (Diffusion models for missing value imputation in tabular data)

Missing value imputation in machine learning is the task of estimating the missing values in the dataset accurately using available information. In this task, several deep generative modeling methods have been proposed and demonstrated their usefulness, e.g., generative adversarial imputation networks. Recently, diffusion models have gained popularity because of their effectiveness in the generative modeling task in images, texts, audio, etc. To our knowledge, less attention has been paid to the investigation of the effectiveness of diffusion models for missing value imputation in tabular data. Based on recent development of diffusion models for time-series data imputation, we propose a diffusion model approach called "Conditional Score-based Diffusion Models for Tabular data" (CSDI_T). To effectively handle categorical variables and numerical variables simultaneously, we investigate three techniques: one-hot encoding, analog bits encoding, and feature tokenization. Experimental results on benchmark datasets demonstrated the effectiveness of CSDI_T compared with well-known existing methods, and also emphasized the importance of the categorical embedding techniques.

翻译：机器学习中缺失的估算值是利用现有信息准确估计数据集缺失值的任务。在这项任务中,提出了几种深层次的基因模型方法,并展示了这些方法的有用性,例如:基因对抗估算网络。最近,传播模型因其在图像、文本、音频等的基因模型任务中的有效性而越来越受欢迎。据我们所知,较少注意对表格数据中缺失值估算的传播模型的有效性的调查。根据最近开发的时间序列数据估算传播模型,我们提出了一种称为“基于条件的分数数据滴入模型”的推广模型方法(CDI_T)。为了同时有效地处理绝对变量和数字变量,我们研究了三种技术:一热编码、模拟比特编码和特征符号化。基准数据集的实验结果表明,CNI_T与众所周知的现有方法相比是有效的,我们还强调了绝对嵌入技术的重要性。

相关内容

MoDELS

关注 43

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

专知会员服务

59+阅读 · 2020年1月25日