Learning with noisy labels (LNL) is challenging as the model tends to memorize noisy labels, which can lead to overfitting. Many LNL methods detect clean samples by maximizing the similarity between samples in each category, which does not make any assumptions about likely noise sources. However, we often have some knowledge about the potential source(s) of noisy labels. For example, an image mislabeled as a cheetah is more likely a leopard than a hippopotamus due to their visual similarity. Thus, we introduce a new task called Learning with Noisy Labels and noise source distribution Knowledge (LNL+K), which assumes we have some knowledge about likely source(s) of label noise that we can take advantage of. By making this presumption, methods are better equipped to distinguish hard negatives between categories from label noise. In addition, this enables us to explore datasets where the noise may represent the majority of samples, a setting that breaks a critical premise of most methods developed for the LNL task. We explore several baseline LNL+K approaches that integrate noise source knowledge into state-of-the-art LNL methods across three diverse datasets and three types of noise, where we report a 5-15% boost in performance compared with the unadapted methods. Critically, we find that LNL methods do not generalize well in every setting, highlighting the importance of directly exploring our LNL+K task.
翻译:暂无翻译