The success of deep learning depends on large-scale, well-curated training data, yet data in real-world applications are commonly long-tailed and noisy. Many methods have been proposed to handle either long-tailed data or noisy data, but few address data that are both long-tailed and noisy. To fill this gap, we propose a robust method for learning from long-tailed noisy data that combines sample selection with a balanced loss. Specifically, we use sample selection to separate the noisy training data into a clean labeled set and an unlabeled set, and then train the deep neural network in a semi-supervised manner with a balanced loss based on model bias. Extensive experiments on benchmarks demonstrate that our method outperforms existing state-of-the-art methods.
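The two ingredients named in the abstract, sample selection and a bias-aware balanced loss, can be illustrated with a minimal sketch. The abstract does not specify the selection criterion or the exact form of the loss, so everything below is an assumption made for illustration: class-wise small-loss selection stands in for the sample selection step, the model's mean predicted probability per class stands in for "model bias", and a logit-adjustment-style cross-entropy stands in for the balanced loss. All function names and the `keep_ratio` parameter are hypothetical.

```python
import math
from collections import defaultdict


def select_clean_per_class(losses, labels, keep_ratio=0.5):
    """Assumed criterion: keep the lowest-loss fraction of samples in each class.

    Selecting per class (rather than globally) keeps tail classes represented.
    Returns (clean_indices, unlabeled_indices); the second group would have its
    labels discarded and be treated as unlabeled data.
    """
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    clean = set()
    for idxs in by_class.values():
        ranked = sorted(idxs, key=lambda i: losses[i])
        k = max(1, math.ceil(keep_ratio * len(ranked)))
        clean.update(ranked[:k])

    unlabeled = [i for i in range(len(labels)) if i not in clean]
    return sorted(clean), unlabeled


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_bias(prob_rows):
    """Assumed proxy for model bias: mean predicted probability per class."""
    n = len(prob_rows)
    return [sum(row[j] for row in prob_rows) / n for j in range(len(prob_rows[0]))]


def balanced_cross_entropy(logits, label, bias, eps=1e-12):
    """Cross-entropy on logits shifted by log(bias), in the logit-adjustment style.

    Classes the model already favors get a head start in the softmax, so the
    loss for under-predicted (typically tail) classes is larger.
    """
    adjusted = [z + math.log(b + eps) for z, b in zip(logits, bias)]
    return -math.log(softmax(adjusted)[label] + eps)


# Toy usage: six samples, class 0 is the head class, class 1 the tail class.
losses = [0.1, 2.0, 0.3, 0.2, 1.5, 0.05]
labels = [0, 0, 0, 0, 1, 1]
clean_idx, unlabeled_idx = select_clean_per_class(losses, labels, keep_ratio=0.5)

bias = estimate_bias([[0.9, 0.1], [0.9, 0.1]])
head_loss = balanced_cross_entropy([0.0, 0.0], 0, bias)
tail_loss = balanced_cross_entropy([0.0, 0.0], 1, bias)
```

In a full pipeline, the clean indices would feed the supervised term of a semi-supervised objective and the remaining samples the unsupervised term; that training loop is omitted here because the abstract gives no details about it.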