Deep learning has outperformed other machine learning algorithms in a variety of tasks, and as a result, it is widely used. However, like other machine learning algorithms, deep learning, and convolutional neural networks (CNNs) in particular, performs worse when the training data contain label noise. Therefore, it is important to develop algorithms that help deep networks train on noisy data and still generalize to noise-free test sets. In this paper, we propose a training strategy that is robust against label noise, called RAFNI, that can be used with any CNN. During training, this algorithm filters and relabels instances of the training set based on the predictions, and their associated probabilities, made by the backbone neural network. In this way, RAFNI improves the generalization ability of the CNN on its own. RAFNI consists of three mechanisms: two that filter instances and one that relabels instances. In addition, it does not assume that the noise rate is known, nor does it need to be estimated. We evaluated our algorithm using several data sets of different sizes and characteristics. We also compared it with state-of-the-art models on the CIFAR10 and CIFAR100 benchmarks under different types and rates of label noise and found that RAFNI achieves better results in most cases.
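The filter-and-relabel idea described above can be illustrated with a minimal sketch. Note that this is an assumed, simplified version of such a mechanism, not the actual RAFNI algorithm: the function name, the specific rules, and the threshold values (`relabel_thr`, `filter_thr`) are all illustrative choices, and the real method derives its decisions from the training process itself rather than fixed thresholds.

```python
import numpy as np

def filter_and_relabel(probs, labels, relabel_thr=0.95, filter_thr=0.05):
    """Hypothetical per-epoch filter/relabel pass (illustrative only).

    probs:  (n, k) softmax outputs of the backbone CNN for the
            n training instances over k classes.
    labels: (n,) current integer labels of those instances.

    Returns (keep_mask, new_labels): which instances to keep for the
    next epoch, and their possibly corrected labels.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pred = probs.argmax(axis=1)                     # predicted class
    pred_prob = probs.max(axis=1)                   # its probability
    label_prob = probs[np.arange(len(labels)), labels]  # prob. of current label

    # Relabel: the network is highly confident in a class that
    # disagrees with the instance's current label.
    new_labels = labels.copy()
    relabel = (pred != labels) & (pred_prob >= relabel_thr)
    new_labels[relabel] = pred[relabel]

    # Filter: the network assigns almost no probability to the current
    # label and is not confident enough to relabel the instance.
    keep = ~((label_prob <= filter_thr) & ~relabel)
    return keep, new_labels

# Example: three instances, three classes.
probs = [[0.98, 0.01, 0.01],   # labeled 1, confidently predicted 0 -> relabel
         [0.40, 0.30, 0.30],   # labeled 0, plausible label       -> keep as-is
         [0.50, 0.48, 0.02]]   # labeled 2, label nearly ruled out -> filter
labels = [1, 0, 2]
keep, new_labels = filter_and_relabel(probs, labels)
```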