All machine learning algorithms use a loss, cost, utility, or reward function to encode the learning objective and oversee the learning process. This function that supervises learning is a frequently unrecognized hyperparameter: it determines how incorrect outputs are penalized and can be tuned to improve performance. This paper shows that the training speed and final accuracy of neural networks can depend significantly on the loss function used for training. In particular, derivative values can differ substantially across loss functions, leading to markedly different performance after gradient-descent-based Backpropagation (BP) training. This paper explores the effect on performance of new loss functions that are more liberal or more strict than the popular Cross-entropy loss in penalizing incorrect outputs. Eight new loss functions are proposed, and a comparison of performance across loss functions is presented. The new loss functions introduced in this paper are shown to outperform Cross-entropy loss on computer vision and NLP benchmarks.
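To illustrate the point about derivative values, the following minimal sketch (not the paper's code, and not one of its proposed losses) compares the gradient of Cross-entropy loss with that of a squared-error loss with respect to the predicted probability of the correct class; the two disagree sharply when the prediction is badly wrong, which changes the size of BP updates.

```python
# Minimal illustration (assumed example, not from the paper): gradient of two
# standard losses w.r.t. the predicted probability p of the true class.
# Both losses are minimized at p = 1, but they penalize wrong outputs with
# very different derivative magnitudes when p is close to 0.
import numpy as np

p = np.array([0.01, 0.1, 0.5, 0.9, 0.99])  # predicted prob. of the true class

# Cross-entropy with a one-hot target: L = -log(p), so dL/dp = -1/p
ce_grad = -1.0 / p

# Squared error against the one-hot target: L = (p - 1)^2, so dL/dp = 2(p - 1)
se_grad = 2.0 * (p - 1.0)

for prob, g_ce, g_se in zip(p, ce_grad, se_grad):
    print(f"p={prob:4.2f}  dCE/dp={g_ce:8.2f}  dSE/dp={g_se:6.2f}")
```

At p = 0.01 the Cross-entropy derivative is -100 while the squared-error derivative is roughly -2, so the "strictness" of the loss toward incorrect outputs directly shapes the gradients seen during training.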