For tabular datasets, we explore data and model distillation, as well as data denoising. These techniques improve both gradient-boosting models and a specialized DNN architecture. While gradient boosting is known to outperform DNNs on tabular data, we close the gap for datasets with 100K+ rows and give DNNs an advantage on small datasets. We extend these results with input-data distillation and optimized ensembling so that DNN performance matches or exceeds that of gradient boosting. As a theoretical justification of our practical method, we prove its equivalence to classical cross-entropy knowledge distillation. We also qualitatively explain why DNN ensembles outperform XGBoost on small datasets. For an industrial end-to-end real-time ML platform serving 4M production inferences per second, we develop a model-training workflow based on data sampling that distills ensembles of models into a single gradient-boosting model, which is preferred for high-performance real-time inference, without loss of accuracy. Empirical evaluation shows that the proposed combination of methods consistently improves model accuracy over prior best models across several production applications deployed worldwide.
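To make the ensemble-to-GBDT distillation idea concrete, the following is a minimal sketch, not the authors' production pipeline: a small ensemble of DNN "teachers" is trained, their averaged class probabilities serve as soft labels, and a single gradient-boosting "student" is fit to those soft labels. The synthetic dataset, scikit-learn's `MLPClassifier` and `GradientBoostingRegressor` (used as a stand-in for the production XGBoost model), and all hyperparameters are illustrative assumptions.

```python
# Sketch: distill a DNN ensemble into one gradient-boosting model via soft labels.
# All model choices and hyperparameters below are illustrative, not the paper's.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1) Train an ensemble of small DNNs (the "teacher").
teachers = [
    MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300, random_state=s)
    .fit(X_train, y_train)
    for s in range(5)
]

# 2) Average the teachers' predicted probabilities to obtain soft labels.
soft_labels = np.mean([t.predict_proba(X_train)[:, 1] for t in teachers], axis=0)

# 3) Distill: fit a single gradient-boosting "student" to the soft labels
#    (regression on probabilities; a stand-in for the production GBDT).
student = GradientBoostingRegressor(n_estimators=300, max_depth=4, random_state=0)
student.fit(X_train, soft_labels)

# 4) Compare hard-label accuracy of the teacher ensemble vs. the distilled student.
teacher_pred = np.mean([t.predict_proba(X_test)[:, 1] for t in teachers], axis=0) > 0.5
student_pred = student.predict(X_test) > 0.5
print("teacher ensemble acc:", np.mean(teacher_pred == y_test))
print("distilled student acc:", np.mean(student_pred == y_test))
```

In this framing the student serves the ensemble's averaged predictions from a single model, which is the property the abstract highlights for high-throughput real-time inference.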