The loss function is arguably among the most important hyperparameters of a neural network. Many loss functions have been designed to date, which makes choosing the right one nontrivial. Yet related work rarely offers an elaborate justification for its choice of loss function; we see this as an indication of a dogmatic mindset in the deep learning community that lacks empirical foundation. In this work, we consider deep neural networks in a supervised classification setting and analyze the impact the choice of loss function has on the training result. While certain loss functions perform suboptimally, our work empirically shows that under-represented losses such as the Kullback-Leibler (KL) divergence can significantly outperform state-of-the-art choices, highlighting the need to treat the loss function as a tuned hyperparameter rather than a fixed choice.
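As a minimal sketch of this recommendation (not part of the paper), the snippet below treats the loss as one more searchable hyperparameter in a PyTorch setup; the candidate dictionary `LOSS_CANDIDATES`, the helper `kl_loss`, and the toy batch are hypothetical illustrations. Note that `F.kl_div` expects log-probabilities as input and a probability distribution as target, so logits and hard labels are converted first.

```python
import torch
import torch.nn.functional as F

# Hedged sketch (names are illustrative, not from the paper): treat the
# loss function as one more hyperparameter to search over.

def kl_loss(logits, targets):
    """KL divergence between one-hot targets and the predicted distribution.

    F.kl_div expects log-probabilities as input and probabilities as target,
    so we convert logits and integer labels accordingly.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    target_dist = F.one_hot(targets, num_classes=logits.size(-1)).float()
    return F.kl_div(log_probs, target_dist, reduction="batchmean")

# Candidate losses, each mapping (logits, integer labels) -> scalar loss.
LOSS_CANDIDATES = {
    "cross_entropy": F.cross_entropy,  # the common default choice
    "kl_divergence": kl_loss,          # an under-represented alternative
}

# Toy check that every candidate runs on the same batch; a hyperparameter
# search would select among these alongside learning rate, batch size, etc.
logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
for name, loss_fn in LOSS_CANDIDATES.items():
    print(name, loss_fn(logits, labels).item())
```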