A key element of AutoML systems is setting the types of models that will be used for each type of task. For classification and regression problems with tabular data, the use of tree ensemble models (like XGBoost) is usually recommended. However, several deep learning models for tabular data have recently been proposed, claiming to outperform XGBoost for some use cases. In this paper, we explore whether these deep models should be a recommended option for tabular data, by rigorously comparing the new deep models to XGBoost on a variety of datasets. In addition to systematically comparing their accuracy, we consider the tuning and computation they require. Our study shows that XGBoost outperforms these deep models across the datasets, including datasets used in the papers that proposed the deep models. We also demonstrate that XGBoost requires much less tuning. On the positive side, we show that an ensemble of the deep models and XGBoost performs better on these datasets than XGBoost alone.