Large-scale image classification datasets often contain noisy labels. We take a principled probabilistic approach to modelling input-dependent, also known as heteroscedastic, label noise in these datasets. We place a multivariate Normal distributed latent variable on the final hidden layer of a neural network classifier. The covariance matrix of this latent variable models the aleatoric uncertainty due to label noise. We demonstrate that the learned covariance structure captures known sources of label noise between semantically similar and co-occurring classes. Compared to standard neural network training and other baselines, we show significantly improved accuracy on Imagenet ILSVRC 2012 with 79.3% (+2.6%), Imagenet-21k with 47.0% (+1.1%) and JFT with 64.7% (+1.6%). We set a new state-of-the-art result on WebVision 1.0 with 76.6% top-1 accuracy. These datasets range from over 1M to over 300M training examples and from 1k to more than 21k classes. Our method is simple to use, and we provide an implementation that is a drop-in replacement for the final fully-connected layer in a deep classifier.
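Since the method is described as a drop-in replacement for the final fully-connected layer, a minimal sketch of such a heteroscedastic classification head may help make the idea concrete. This is an illustrative PyTorch implementation, not the authors' released code: the low-rank-plus-diagonal covariance parameterisation, the class name HeteroscedasticHead, and the hyperparameters (rank, number of Monte Carlo samples, temperature) are assumptions chosen for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HeteroscedasticHead(nn.Module):
    """Hypothetical sketch of a heteroscedastic classification head.

    Places a multivariate Normal latent variable on the logits, with an
    input-dependent low-rank-plus-diagonal covariance
    Sigma(x) = V(x) V(x)^T + diag(d(x)^2), and averages tempered softmaxes
    over Monte Carlo samples. Rank, sample count, and temperature are
    illustrative defaults, not the paper's exact configuration.
    """

    def __init__(self, in_features, num_classes, rank=6,
                 num_samples=16, temperature=1.0):
        super().__init__()
        self.mean = nn.Linear(in_features, num_classes)              # mu(x)
        self.low_rank = nn.Linear(in_features, num_classes * rank)   # V(x)
        self.diag = nn.Linear(in_features, num_classes)              # d(x)
        self.num_classes = num_classes
        self.rank = rank
        self.num_samples = num_samples
        self.temperature = temperature

    def forward(self, x):
        B, C, R, S = x.shape[0], self.num_classes, self.rank, self.num_samples
        mu = self.mean(x)                                  # (B, C)
        V = self.low_rank(x).view(B, C, R)                 # (B, C, R)
        d = F.softplus(self.diag(x))                       # (B, C), positive
        # Reparameterised samples u ~ N(mu, V V^T + diag(d^2)):
        # u = mu + V eps_lr + d * eps_d with standard Normal noise.
        eps_lr = torch.randn(S, B, R, 1, device=x.device)
        eps_d = torch.randn(S, B, C, device=x.device)
        u = mu + (V @ eps_lr).squeeze(-1) + d * eps_d      # (S, B, C)
        # Predictive distribution: MC average of tempered softmaxes.
        return F.softmax(u / self.temperature, dim=-1).mean(dim=0)
```

In this sketch, training would plug the MC-averaged probabilities into a standard negative log-likelihood, e.g. F.nll_loss(torch.log(probs + 1e-12), target), and the same forward pass yields the predictive distribution at test time. The low-rank-plus-diagonal form keeps the per-example covariance tractable when the number of classes is large (up to 21k+ in the datasets above), since the full covariance matrix is never materialised.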