Deep Neural Networks (DNNs) have been shown to be susceptible to memorization or overfitting in the presence of noisily labelled data. Several algorithms have been proposed for the problem of robust learning under such noisy data. A prominent class of algorithms relies on sample selection strategies in which, essentially, a fraction of samples with loss values below a certain threshold are selected for training. These algorithms are sensitive to such thresholds, and it is difficult to fix or learn appropriate threshold values. Often, these algorithms also require information such as label noise rates, which are typically unavailable in practice. In this paper, we propose an adaptive sample selection strategy that relies only on batch statistics of a given mini-batch to provide robustness against label noise. The algorithm has no additional hyperparameters for sample selection, needs no information on noise rates, and does not require access to separate data with clean labels. We empirically demonstrate the effectiveness of our algorithm on benchmark datasets.
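To make the idea concrete, the following is a minimal sketch of batch-statistics-based sample selection in PyTorch. It assumes, purely for illustration, that the adaptive threshold is the mean of the per-sample losses in the current mini-batch; the function name and this particular choice of statistic are assumptions, not necessarily the paper's exact selection rule.

```python
import torch
import torch.nn.functional as F

def select_low_loss_samples(logits, targets):
    """Return a boolean mask over the mini-batch selecting samples
    whose per-sample loss falls below an adaptive, batch-derived
    threshold (here, the batch mean loss; an illustrative choice)."""
    # Per-sample cross-entropy losses (no reduction over the batch).
    losses = F.cross_entropy(logits, targets, reduction="none")
    # Threshold computed only from statistics of this mini-batch:
    # no fixed hyperparameter and no knowledge of noise rates.
    threshold = losses.mean()
    return losses <= threshold

# Hypothetical usage inside a training step (model, inputs, targets
# assumed defined elsewhere):
#
#   logits = model(inputs)
#   mask = select_low_loss_samples(logits, targets)
#   loss = F.cross_entropy(logits[mask], targets[mask])
#   loss.backward()
```

Because the threshold is recomputed from each mini-batch, the selected fraction adapts over training rather than being fixed in advance, which is what removes the need for a tuned threshold or an assumed noise rate in this sketch.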