Although deep neural networks have achieved enormous success in various fields (e.g., computer vision) with supervised learning, they still trail the performance of gradient-boosted decision trees (GBDTs) on tabular data. Delving into this task, we find that judicious handling of feature interactions and feature representations is crucial to the effectiveness of neural networks on tabular data. We develop a novel neural network called ExcelFormer, which alternates between two attention modules that shrewdly manipulate feature interactions and feature representation updates, respectively. A bespoke training methodology is introduced jointly to improve model performance. Specifically, by initializing their parameters with minuscule values, these attention modules are attenuated at the start of training, and the effects of feature interactions and representation updates grow progressively to optimal levels as training proceeds, guided by our proposed regularization schemes, Feat-Mix and Hidden-Mix. Experiments on 28 public tabular datasets show that our ExcelFormer approach outperforms extensively tuned GBDTs, an unprecedented advance for deep neural networks on supervised tabular learning.
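To make the two training ingredients named above concrete, the PyTorch sketch below illustrates (a) a mixup-style regularizer applied along an axis of the data, in the spirit of Feat-Mix (input-feature axis) and Hidden-Mix (hidden-dimension axis), and (b) a minuscule-value initialization that attenuates a module at the start of training. This is a minimal sketch, not the paper's exact formulation: the function names `feat_mix` and `hidden_mix`, the Beta(α, α) sampling of the mask size, and the 1e-4 initialization range are all illustrative assumptions.

```python
import torch
import torch.nn as nn


def feat_mix(x, y, alpha=0.5):
    """Mixup along the input-feature axis (illustrative sketch of Feat-Mix).

    A random subset of features is taken from a shuffled partner sample;
    labels are mixed in proportion to the fraction of features kept.
    Assumes y holds float (soft) labels, e.g. one-hot class probabilities.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))          # partner sample for each row
    n_feats = x.size(1)
    n_keep = max(1, int(round(lam * n_feats)))
    mask = torch.zeros(n_feats, dtype=torch.bool)
    mask[torch.randperm(n_feats)[:n_keep]] = True
    x_mixed = torch.where(mask, x, x[perm])   # keep masked feats, swap the rest
    lam_eff = n_keep / n_feats                # effective mixing ratio for labels
    y_mixed = lam_eff * y + (1 - lam_eff) * y[perm]
    return x_mixed, y_mixed


def hidden_mix(h, y, alpha=0.5):
    """Same scheme applied along the hidden (embedding) axis of an
    intermediate representation; an assumed analogue of Hidden-Mix."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(h.size(0))
    d = h.size(-1)
    n_keep = max(1, int(round(lam * d)))
    mask = torch.zeros(d, dtype=torch.bool)
    mask[torch.randperm(d)[:n_keep]] = True
    h_mixed = torch.where(mask, h, h[perm])
    lam_eff = n_keep / d
    return h_mixed, lam_eff * y + (1 - lam_eff) * y[perm]


# Minuscule initialization (assumed scale) so an attention module starts as
# a near no-op and its effect grows gradually during training.
proj = nn.Linear(64, 64)
nn.init.uniform_(proj.weight, a=-1e-4, b=1e-4)
nn.init.zeros_(proj.bias)

# Usage example with synthetic data.
x = torch.rand(32, 10)   # batch of 32 samples, 10 numerical features
y = torch.rand(32, 2)    # soft two-class labels
x_mix, y_mix = feat_mix(x, y)
```

The design intuition, as the abstract describes it, is that the attenuated modules and the mix-based regularizers jointly let the strength of feature interactions and representation updates ramp up progressively rather than being fixed from the first step.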