Real-world datasets inevitably contain many mislabeled samples. Because deep neural networks (DNNs) have an enormous capacity to memorize noisy labels, a robust training scheme is required to prevent labeling errors from degrading the generalization performance of DNNs. Current state-of-the-art methods adopt a co-training scheme that trains dual networks using samples with small losses. In practice, however, training two networks simultaneously strains computing resources. In this study, we propose a simple yet effective robust training scheme that trains only a single network. During training, the proposed method generates a temporal self-ensemble by sampling intermediate network parameters from the weight trajectory formed by stochastic gradient descent (SGD) optimization. The loss sum evaluated over these self-ensembles is used to identify incorrectly labeled samples. In parallel, our method generates multi-view predictions by transforming an input into various forms and uses their agreement as another cue for identifying incorrectly labeled samples. Combining these two metrics, we present the proposed {\it self-ensemble-based robust training} (SRT) method, which filters samples with noisy labels to reduce their influence on training. Experiments on widely used public datasets demonstrate that the proposed method achieves state-of-the-art performance in some categories without training dual networks.
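To make the two filtering signals concrete, the following is a minimal PyTorch sketch of the ideas summarized above: weight snapshots taken along the SGD trajectory serve as a temporal self-ensemble whose per-sample loss sum flags likely-mislabeled data, and agreement across augmented views of the same input provides a second flag. All function names, the snapshot budget, and the agreement threshold are illustrative assumptions, not the authors' released implementation.

\begin{verbatim}
# Sketch only: snapshot(), ensemble_loss_sum(), and
# multi_view_agreement() are hypothetical helper names.
import copy
import torch
import torch.nn.functional as F

def snapshot(model, snapshots, max_snapshots=5):
    """Store a copy of the current weights as one temporal
    self-ensemble member sampled from the SGD trajectory."""
    snapshots.append(copy.deepcopy(model.state_dict()))
    if len(snapshots) > max_snapshots:
        snapshots.pop(0)

def ensemble_loss_sum(model, snapshots, x, y):
    """Sum per-sample cross-entropy over all stored snapshots.
    Samples with large sums are treated as likely mislabeled."""
    total = torch.zeros(x.size(0))
    live_state = copy.deepcopy(model.state_dict())
    with torch.no_grad():
        for state in snapshots:
            model.load_state_dict(state)
            logits = model(x)
            total += F.cross_entropy(logits, y,
                                     reduction="none").cpu()
    model.load_state_dict(live_state)  # restore live weights
    return total

def multi_view_agreement(model, views, threshold=0.5):
    """Fraction of augmented views whose predicted class matches
    the majority vote; low agreement flags a suspicious sample.
    `views` is a list of V augmented batches of shape (B, ...)."""
    with torch.no_grad():
        preds = torch.stack([model(v).argmax(dim=1)
                             for v in views])      # (V, B)
    majority = preds.mode(dim=0).values            # (B,)
    agreement = (preds == majority).float().mean(dim=0)
    return agreement >= threshold
\end{verbatim}

In a training loop, a sketch under these assumptions would snapshot the weights every few epochs, then keep for the supervised loss only those samples that both rank low in \texttt{ensemble\_loss\_sum} and pass \texttt{multi\_view\_agreement}; unlike co-training, all of this uses a single network.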