Label noise in multiclass classification is a major obstacle to deploying learning systems. The widely used class-conditional noise (CCN) assumption holds that the noisy label is independent of the input features given the true label; label noise in real-world datasets, however, can be aleatory and heavily dependent on individual instances. In this work, we investigate the instance-dependent noise (IDN) model and propose an efficient approximation of IDN that captures instance-specific label corruption. Concretely, noting that most columns of the IDN transition matrix have only limited influence on class-posterior estimation, we propose a variational approximation that uses a single scalar confidence parameter. Because the mapping from an instance to its confidence value can vary significantly even between two adjacent instances, we use an instance embedding that assigns a trainable parameter to each instance. The resulting instance-confidence embedding (ICE) method not only performs well under label noise but can also effectively detect ambiguous or mislabeled instances. We validate its utility on various image and text classification tasks.
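The abstract does not state the exact form of the single-scalar approximation, so the following is only a minimal sketch of the general idea under stated assumptions, not the ICE authors' method. It assumes a symmetric per-instance transition matrix T_i = c_i I + (1 - c_i)/K 11^T. Under this assumption, the probability of the observed noisy label is c_i p(ỹ_i | x_i) + (1 - c_i)/K. The confidence is c_i = sigmoid(s_i), where s_i is one trainable scalar per instance (the "instance embedding"). The classifier's clean-label predictions are held fixed for simplicity. All function names and the toy data are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def noisy_label_prob(p_clean: list[float], label: int, c: float) -> float:
    """P(noisy label = label | x), ASSUMING T = c*I + (1 - c)/K * 11^T."""
    k = len(p_clean)
    return c * p_clean[label] + (1.0 - c) / k


def nll_and_grad_s(p_clean: list[float], label: int, s: float) -> tuple[float, float]:
    """Negative log-likelihood of the observed label and its gradient w.r.t. s."""
    c = sigmoid(s)
    k = len(p_clean)
    q = noisy_label_prob(p_clean, label, c)
    loss = -math.log(q)
    dloss_dc = -(p_clean[label] - 1.0 / k) / q  # sign depends on p_label vs. 1/K
    dc_ds = c * (1.0 - c)                       # derivative of the sigmoid
    return loss, dloss_dc * dc_ds


def fit_confidences(preds: list[list[float]], labels: list[int],
                    steps: int = 200, lr: float = 1.0) -> list[float]:
    """Gradient descent on one scalar per instance; the classifier stays frozen."""
    s = [0.0] * len(preds)  # hypothetical per-instance embedding, c_i = 0.5 at start
    for _ in range(steps):
        for i, (p, y) in enumerate(zip(preds, labels)):
            _, g = nll_and_grad_s(p, y, s[i])
            s[i] -= lr * g
    return [sigmoid(v) for v in s]


# Toy example: frozen class posteriors for 3 instances and K = 3 classes.
preds = [[0.90, 0.05, 0.05],
         [0.10, 0.80, 0.10],
         [0.85, 0.10, 0.05]]
labels = [0, 1, 1]  # the third observed label disagrees with the classifier
conf = fit_confidences(preds, labels)
print([round(c, 3) for c in conf])
```

In this sketch, the gradient drives c_i up when the classifier assigns the observed label more than chance probability 1/K and drives it down otherwise. A low fitted confidence therefore flags a possibly ambiguous or mislabeled instance. A free parameter per instance, rather than a smooth function of x, lets two nearby inputs receive very different confidences.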