Choosing a suitable loss function is essential when learning by empirical risk minimisation. In many practical cases, the datasets used to train a classifier may contain incorrect labels, which motivates the use of loss functions that are inherently robust to label noise. In this paper, we study the Fisher-Rao loss function, which emerges from the Fisher-Rao distance on the statistical manifold of discrete distributions. We derive an upper bound on the performance degradation in the presence of label noise, and analyse the learning speed of this loss. Compared with other commonly used losses, we argue that the Fisher-Rao loss provides a natural trade-off between robustness and training dynamics. Numerical experiments with synthetic and MNIST datasets illustrate this behaviour.
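For concreteness, the sketch below illustrates the kind of loss the abstract refers to, assuming the standard closed form of the Fisher-Rao distance on the probability simplex, \(2\arccos\bigl(\sum_i \sqrt{p_i q_i}\bigr)\), which against a one-hot label \(e_y\) reduces to \(2\arccos(\sqrt{p_y})\); the exact definition and normalisation used in the paper may differ.

```python
import numpy as np

def fisher_rao_loss(probs, label, eps=1e-12):
    """Fisher-Rao loss between a predicted categorical distribution and a one-hot label.

    On the simplex of discrete distributions, the Fisher-Rao distance between p and q
    is 2 * arccos(sum_i sqrt(p_i * q_i)); with a one-hot label e_y this reduces to
    2 * arccos(sqrt(p_y)). Illustrative sketch only, not the paper's exact formulation.
    """
    p_y = np.clip(probs[label], eps, 1.0)   # probability assigned to the true class
    return 2.0 * np.arccos(np.sqrt(p_y))    # decreases to 0 as p_y -> 1

# Example: softmax output over 3 classes, true class index 1
probs = np.array([0.2, 0.7, 0.1])
print(fisher_rao_loss(probs, label=1))  # ~1.16; bounded above by pi, unlike cross-entropy
```

The loss is bounded (at most \(\pi\)), which is one intuition for why it can degrade more gracefully than the unbounded cross-entropy when some training labels are wrong.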