Noisy labels are unavoidable yet troublesome in deep learning because models can easily overfit them. Label noise comes in several types, such as symmetric, asymmetric and instance-dependent noise (IDN), with IDN being the only type that depends on image information. This dependence makes IDN a critical type of label noise to study, given that labelling mistakes are caused in large part by insufficient or ambiguous information about the visual classes present in images. To address IDN effectively, we present a new graphical modelling approach called InstanceGM, which combines discriminative and generative models. The main contributions of InstanceGM are: i) the use of the continuous Bernoulli distribution to train the generative model, offering significant training advantages, and ii) the exploration of a state-of-the-art noisy-label discriminative classifier to generate clean labels from instance-dependent noisy-label samples. InstanceGM is competitive with current noisy-label learning approaches, particularly on IDN benchmarks using synthetic and real-world datasets, where our method shows better accuracy than the competitors in most experiments.
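For readers unfamiliar with the continuous Bernoulli distribution named in contribution i), the sketch below evaluates its log-density for a pixel intensity x in [0, 1]. It uses the standard textbook form p(x | λ) = C(λ) λ^x (1 − λ)^(1 − x) with C(λ) = 2 artanh(1 − 2λ) / (1 − 2λ), and C(1/2) = 2. The abstract does not state the exact loss used by InstanceGM, so the function names (`cb_log_norm_const`, `cb_log_prob`) and the switching threshold `eps` are illustrative assumptions rather than the authors' implementation.

```python
import math


def cb_log_norm_const(lam, eps=1e-3):
    """Log of the normalizing constant C(lam), for lam in (0, 1).

    Hypothetical helper: `eps` is an assumed threshold for switching to a
    series expansion where the closed form approaches 0/0 at lam = 0.5.
    """
    u = 1.0 - 2.0 * lam
    if abs(lam - 0.5) < eps:
        # artanh(u)/u = 1 + u^2/3 + u^4/5 + ..., so C = 2 * (1 + u^2/3 + ...)
        return math.log(2.0) + math.log1p(u * u / 3.0 + u ** 4 / 5.0)
    return math.log(2.0 * math.atanh(u) / u)


def cb_log_prob(x, lam):
    """Log-density of the continuous Bernoulli at x in [0, 1]."""
    # The last two terms are the usual Bernoulli (cross-entropy) log-likelihood;
    # the first term is the correction that makes the density integrate to 1.
    return (cb_log_norm_const(lam)
            + x * math.log(lam)
            + (1.0 - x) * math.log(1.0 - lam))


if __name__ == "__main__":
    # Midpoint-rule check that the density integrates to about 1 over [0, 1].
    n = 20000
    area = sum(math.exp(cb_log_prob((i + 0.5) / n, 0.2)) for i in range(n)) / n
    print(f"integral of density for lam=0.2: {area:.6f}")
```

The only difference from the ordinary Bernoulli log-likelihood is the `cb_log_norm_const` term, which is what makes the expression a proper density when the targets are real-valued intensities rather than binary values.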