We consider the task of self-supervised representation learning (SSL) for tabular data: tabular-SSL. Typical contrastive-learning-based SSL methods require instance-wise data augmentations, which are difficult to design for unstructured tabular data. Existing tabular-SSL methods design such augmentations in a relatively ad hoc fashion and can fail to capture the underlying data manifold. Instead of an augmentation-based approach to tabular-SSL, we propose a new reconstruction-based method, called Masked Encoding for Tabular Data (MET), that does not require augmentations. MET is based on the popular MAE approach for vision-SSL [He et al., 2021] and uses two key ideas: (i) since each coordinate in a tabular dataset has a distinct meaning, we use a separate representation for each coordinate, and (ii) we use an adversarial reconstruction loss in addition to the standard reconstruction loss. Empirical results on five diverse tabular datasets show that MET achieves a new state of the art (SOTA) on all of these datasets and improves by up to 9% over current SOTA methods. We shed more light on how MET works via experiments on carefully designed simple datasets.
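As a rough illustration of idea (i) and of the masked-reconstruction setup, the following NumPy sketch builds one token per coordinate (the value concatenated with an embedding specific to that column), masks a random subset of coordinates per row, and defines a plain mean-squared reconstruction loss. The function names, the embedding width, and the mask ratio are assumptions made for illustration and are not taken from the paper. The encoder, the decoder, and the adversarial loss term of idea (ii) are omitted.

```python
import numpy as np

def make_tokens(x, coord_emb):
    """x: (n, d) values; coord_emb: (d, e), one embedding per coordinate.
    Returns (n, d, e + 1): each token is [value, embedding of its column]."""
    n, d = x.shape
    emb = np.broadcast_to(coord_emb, (n, d, coord_emb.shape[1]))
    return np.concatenate([x[:, :, None], emb], axis=2)

def random_mask(n, d, mask_ratio, rng):
    """Boolean (n, d) array, True = masked; same number masked in every row."""
    n_mask = int(round(mask_ratio * d))
    order = np.argsort(rng.random((n, d)), axis=1)  # random permutation per row
    mask = np.zeros((n, d), dtype=bool)
    np.put_along_axis(mask, order[:, :n_mask], True, axis=1)
    return mask

def reconstruction_loss(x, x_hat):
    """Mean squared error over all coordinates (standard term only)."""
    return float(np.mean((x - x_hat) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))                 # 4 rows, 6 coordinates
coord_emb = rng.normal(size=(6, 3))         # hypothetical; learned in practice
tokens = make_tokens(x, coord_emb)          # shape (4, 6, 4)
mask = random_mask(4, 6, mask_ratio=0.5, rng=rng)
# Only the unmasked tokens would be passed to an encoder.
visible = tokens[~mask].reshape(4, -1, tokens.shape[2])
```

Because every token carries its column's embedding, a model that sees only `visible` can still tell which coordinates it was given, which is the role positional information plays in MAE-style models.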