Deep neural networks (DNNs) suffer on noisy-labeled data because they are prone to overfitting the label noise. To mitigate this risk, in this paper we propose a novel DNN training method with sample selection based on adaptive k-set selection, which selects k (< n) clean sample candidates from the n noisy training samples at each epoch. A key advantage of the method is that its selection performance is theoretically guaranteed: roughly speaking, the regret of the proposed method, defined as the difference between the actual selection and the best selection, is provably bounded, even though the best selection is unknown until all epochs have finished. Experimental results on multiple noisy-labeled datasets demonstrate that our sample selection strategy works effectively in DNN training; the proposed method achieved the best or second-best performance among state-of-the-art methods while requiring a significantly lower computational cost. The code is available at https://github.com/songheony/TAkS.
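To make the epoch-wise selection step concrete, below is a minimal sketch of a training loop that keeps only a k-set of candidate clean samples each epoch. This is not the paper's exact algorithm: the selection rule shown is the simple small-loss heuristic (take the k samples with the smallest per-sample loss), and `per_sample_loss`, `subset`, and `fit` are hypothetical interfaces used only for illustration.

```python
# Illustrative sketch of epoch-wise k-set sample selection for noisy labels.
# NOTE: simplified stand-in, not the TAkS algorithm itself; the selection rule
# here is the common small-loss heuristic, and the model/dataset methods are
# assumed interfaces, not a real library API.
import numpy as np


def select_k_set(per_sample_losses: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k samples treated as clean candidates."""
    # argpartition finds the k smallest losses in O(n) time without a full sort.
    return np.argpartition(per_sample_losses, k)[:k]


def train_with_k_set(model, dataset, n_epochs: int, k: int):
    """Hypothetical training loop: each epoch, update the model on the k-set only."""
    for epoch in range(n_epochs):
        losses = model.per_sample_loss(dataset)   # shape (n,); assumed interface
        clean_idx = select_k_set(losses, k)       # k (< n) candidate indices
        model.fit(dataset.subset(clean_idx))      # train on candidates only
    return model
```

In this simplified view, the choice of k trades off coverage of the training set against the risk of admitting mislabeled samples; the paper's contribution is a selection strategy whose regret relative to the best (but unknown) selection is bounded over the epochs.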