Nearly all practical neural models for classification are trained using the cross-entropy loss. Yet this ubiquitous choice is supported by little theoretical or empirical evidence. Recent work (Hui & Belkin, 2020) suggests that training with the (rescaled) square loss is often superior in terms of classification accuracy. In this paper we propose the "squentropy" loss, which is the sum of two terms: the cross-entropy loss and the average square loss over the incorrect classes. We provide an extensive set of experiments on multi-class classification problems showing that the squentropy loss outperforms both the pure cross-entropy and rescaled square losses in terms of classification accuracy. We also demonstrate that it provides significantly better model calibration than either of these alternative losses and, furthermore, exhibits less variance with respect to random initialization. Additionally, in contrast to the square loss, models can typically be trained with squentropy using exactly the same optimization parameters, including the learning rate, as with the standard cross-entropy loss, making it a true "plug-and-play" replacement. Finally, unlike the rescaled square loss, multi-class squentropy contains no parameters that need to be adjusted.
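For concreteness, below is a minimal PyTorch sketch of the loss as described above. It assumes the square term is computed on the raw logits, with the incorrect classes having a one-hot target value of 0, so "average square loss over the incorrect classes" becomes the mean of the squared incorrect-class logits; the function name and signature are illustrative, not from the paper.

```python
import torch
import torch.nn.functional as F


def squentropy_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Sketch of squentropy: cross-entropy plus the average square loss
    over the incorrect classes (assumed here to be the squared logits,
    since their one-hot target value is 0).

    logits:  (batch, num_classes) raw model outputs
    targets: (batch,) integer class labels
    """
    num_classes = logits.shape[1]
    ce = F.cross_entropy(logits, targets)
    # Zero out the true-class logit, then average the squared
    # incorrect-class logits over the C - 1 incorrect classes.
    true_class_mask = F.one_hot(targets, num_classes).bool()
    sq = logits.masked_fill(true_class_mask, 0.0).pow(2).sum(dim=1) / (num_classes - 1)
    return ce + sq.mean()
```

Note that the square term adds no tunable hyperparameters, consistent with the abstract's claim that squentropy can reuse the optimization settings of standard cross-entropy training.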