Noisy labels can impair the performance of deep neural networks. To tackle this problem, in this paper we propose a new method for filtering label noise. Unlike most existing methods, which rely on the posterior probability of a noisy classifier, we focus on the much richer spatial behavior of data in the latent representational space. By leveraging the high-order topological information of data, we are able to collect most of the clean data and train a high-quality model. Theoretically, we prove that this topological approach is guaranteed to collect the clean data with high probability. Empirical results show that our method outperforms the state of the art and is robust to a broad spectrum of noise types and levels.
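To make the idea concrete, the sketch below shows one simple way that latent-space connectivity can be used to filter noisy labels: for each (noisy) class, build a k-NN graph over the latent features and keep the largest connected component as the presumed-clean subset. This is a minimal illustration under assumed design choices (feature source, k, and the connected-component criterion), not the paper's exact construction or guarantees.

```python
# Hedged sketch: select a "clean" subset per class as the largest connected
# component of a k-NN graph over latent features. The feature extractor,
# the value of k, and the component criterion are illustrative assumptions.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import connected_components


def select_clean_indices(features: np.ndarray,
                         noisy_labels: np.ndarray,
                         k: int = 10) -> np.ndarray:
    """Return indices judged clean via per-class connectivity in latent space."""
    clean_idx = []
    for c in np.unique(noisy_labels):
        idx = np.flatnonzero(noisy_labels == c)
        if len(idx) <= k:  # too few points to build a k-NN graph; keep them all
            clean_idx.extend(idx)
            continue
        # Symmetric k-NN connectivity graph over this class's latent features.
        graph = kneighbors_graph(features[idx], n_neighbors=k,
                                 mode="connectivity", include_self=False)
        graph = graph.maximum(graph.T)
        # Keep the largest connected component: correctly labeled points tend
        # to form one dense cluster, while mislabeled points sit in small,
        # isolated fragments of the graph.
        _, comp_labels = connected_components(graph, directed=False)
        largest = np.bincount(comp_labels).argmax()
        clean_idx.extend(idx[comp_labels == largest])
    return np.sort(np.asarray(clean_idx))


# Usage (hypothetical names): take `latent_features` from the penultimate
# layer of a classifier trained on the noisy data, select a clean subset,
# then retrain on that subset.
# clean = select_clean_indices(latent_features, labels, k=10)
```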