Label noise degrades the performance of deep learning algorithms because deep neural networks easily overfit label errors. Let X and Y denote the instance and the clean label, respectively. When Y is a cause of X, which is how many datasets such as SVHN and CIFAR have been constructed, the distributions P(X) and P(Y|X) are entangled. This means that unsupervised instances help in learning the classifier and thus reduce the side effects of label noise. However, it remains unclear how to exploit this causal information to handle the label-noise problem. In this paper, by leveraging a structural causal model, we propose a novel generative approach for instance-dependent label-noise learning. In particular, we show that properly modeling the instances contributes to the identifiability of the label-noise transition matrix and thus leads to a better classifier. Empirically, our method outperforms all state-of-the-art methods on both synthetic and real-world label-noise datasets.
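To make the role of the label-noise transition matrix concrete, the following minimal sketch shows the standard relation between the clean and noisy class posteriors for a single instance. It does not reproduce the generative approach proposed here. The function name `noisy_posterior`, the three-class setting, and all numeric values are hypothetical and chosen only for illustration; the sketch assumes that entry (i, j) of the instance-dependent matrix T(x) is the probability of observing noisy label j when the clean label is i.

```python
def noisy_posterior(clean_posterior, transition):
    """Return P(noisy label | x) from P(clean label | x) and T(x).

    Assumption: transition[i][j] = P(noisy = j | clean = i, x),
    so every row of the matrix sums to 1.
    """
    num_classes = len(clean_posterior)
    return [
        sum(clean_posterior[i] * transition[i][j] for i in range(num_classes))
        for j in range(num_classes)
    ]


# Hypothetical clean posterior P(Y | X = x) for one instance x.
clean = [0.7, 0.2, 0.1]

# Hypothetical instance-dependent transition matrix T(x).
T_x = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
]

# Posterior over the observed (noisy) labels for the same instance.
noisy = noisy_posterior(clean, T_x)
```

Only the noisy posterior can be estimated directly from noisy data, and many pairs of clean posterior and T(x) produce the same noisy posterior. This is why the identifiability of the transition matrix matters for recovering a good classifier.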