TabDDPM: Modelling Tabular Data with Diffusion Models

Denoising diffusion probabilistic models are becoming the leading paradigm of generative modeling for many important data modalities. Although most prevalent in the computer vision community, diffusion models have also recently gained attention in other domains, including speech, NLP, and graph-like data. In this work, we investigate whether the framework of diffusion models can be advantageous for general tabular problems, where datapoints are typically represented by vectors of heterogeneous features. The inherent heterogeneity of tabular data makes accurate modeling quite challenging, since individual features can be of completely different nature: some can be continuous and others discrete. To address such data types, we introduce TabDDPM, a diffusion model that can be universally applied to any tabular dataset and handles any type of feature. We extensively evaluate TabDDPM on a wide set of benchmarks and demonstrate its superiority over existing GAN/VAE alternatives, which is consistent with the advantage of diffusion models in other fields. Additionally, we show that TabDDPM is suitable for privacy-oriented setups, where the original datapoints cannot be publicly shared.
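The abstract does not spell out how mixed feature types are handled. As a minimal sketch, assuming continuous features are corrupted with Gaussian noise and one-hot categorical features with multinomial (uniform-mixing) noise, the forward noising of a single row could look like the following. The function names, the `alpha_bar` parameter (the cumulative signal-retention factor at a given timestep), and the row layout are hypothetical choices made for illustration.

```python
import math
import random


def noise_numeric(x0, alpha_bar, rng):
    """Gaussian forward step: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    eps = rng.gauss(0.0, 1.0)
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps


def noise_categorical(one_hot, alpha_bar, rng):
    """Multinomial forward step: keep the category with probability alpha_bar,
    otherwise resample it uniformly over all K categories."""
    k = len(one_hot)
    probs = [alpha_bar * v + (1.0 - alpha_bar) / k for v in one_hot]
    idx = rng.choices(range(k), weights=probs, k=1)[0]
    return [1 if i == idx else 0 for i in range(k)]


def noise_row(numeric, categorical, alpha_bar, seed=0):
    """Noise one tabular row: a list of floats plus a list of one-hot vectors.
    (Assumed layout; each feature type gets its own noising process.)"""
    rng = random.Random(seed)
    noisy_num = [noise_numeric(x, alpha_bar, rng) for x in numeric]
    noisy_cat = [noise_categorical(c, alpha_bar, rng) for c in categorical]
    return noisy_num, noisy_cat


if __name__ == "__main__":
    # Two continuous features and one 3-way categorical feature.
    print(noise_row([0.5, -1.2], [[0, 1, 0]], alpha_bar=0.3))
```

With `alpha_bar` close to 1 the row is left almost intact, and as `alpha_bar` approaches 0 the continuous features approach standard Gaussian noise while the categorical features approach a uniform draw. A generative model of this kind would then be trained to reverse that corruption.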

