Although neural networks have achieved enormous breakthroughs in various supervised-learning domains (e.g., computer vision), they have thus far trailed the performance of GBDTs on tabular data. Delving into this issue, we identify proper handling of feature interactions and feature embedding as crucial to the success of neural networks on tabular data. We develop a novel neural network, ExcelFormer, which alternates two attention modules that respectively perform careful feature interactions and feature-embedding updates. A bespoke training methodology is jointly introduced to boost model performance. By initializing parameters with minuscule values, these attention modules are attenuated at the start of training; the effects of feature interactions and embedding updates then progressively grow to optimal levels as training proceeds, guided by the proposed regularization approaches Swap-Mix and Hidden-Mix. Experiments on 25 public tabular datasets show that ExcelFormer outperforms extensively tuned GBDTs, an unprecedented achievement for neural networks in supervised tabular learning. The code is available at https://github.com/WhatAShot/ExcelFormer.
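To make the Hidden-Mix idea concrete, here is a minimal NumPy sketch of an element-wise embedding-mixing regularizer of this kind. The function name `hidden_mix`, the `mix_prob` parameter, and the exact label-mixing rule are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def hidden_mix(h, y, mix_prob=0.5, rng=None):
    """Sketch of a Hidden-Mix-style regularizer: swap individual
    hidden-embedding entries between each sample and a randomly chosen
    partner in the batch, mixing labels by the fraction of entries kept."""
    rng = np.random.default_rng(rng)
    perm = rng.permutation(len(h))           # partner sample for each row
    mask = rng.random(h.shape) < mix_prob    # entries kept from the sample itself
    h_mixed = np.where(mask, h, h[perm])     # remaining entries come from partner
    lam = mask.mean(axis=1, keepdims=True)   # per-sample fraction kept
    y_mixed = lam * y + (1 - lam) * y[perm]  # soft labels mixed by the same ratio
    return h_mixed, y_mixed

# usage: mix a batch of 4 samples with 3-dimensional hidden embeddings
h = np.arange(12, dtype=float).reshape(4, 3)
y = np.array([[0.0], [1.0], [0.0], [1.0]])
h_mixed, y_mixed = hidden_mix(h, y, mix_prob=0.5, rng=0)
```

Swap-Mix would apply the analogous swap at the raw-feature level rather than on hidden embeddings; in both cases the mixing strength can be annealed over training so the regularization effect grows gradually.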