Most studies on learning from noisy labels rely on unrealistic models of i.i.d. label noise, such as class-conditional transition matrices. More recent work on instance-dependent noise models is more realistic, but assumes a single generative process for label noise across the entire dataset. We propose a more principled model of label noise that generalizes instance-dependent noise to multiple labelers, based on the observation that modern datasets are typically annotated using distributed crowdsourcing methods. Under our labeler-dependent model, label noise manifests under two modalities: natural errors made by good-faith labelers, and adversarial labels provided by malicious actors. We present two adversarial attack vectors that more accurately reflect the label noise that may be encountered in real-world settings, and demonstrate that under our multimodal noisy-label model, state-of-the-art approaches for learning from noisy labels are defeated by adversarial label attacks. Finally, we propose a multi-stage, labeler-aware, model-agnostic framework that reliably filters noisy labels by leveraging knowledge about which data partitions were labeled by which labeler, and show that our proposed framework remains robust even in the presence of extreme adversarial label noise.
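To make the labeler-dependent noise model concrete, the following is a minimal sketch of how such noise might be simulated; it is not the paper's implementation, and the labeler profiles, error rates, and `target_map` are illustrative assumptions. Each example is assigned to one labeler: good-faith labelers flip labels uniformly at random at their own error rate (natural error), while an adversarial labeler deterministically maps each true class to an attack target (malicious labeling).

```python
import numpy as np

def labeler_dependent_noise(labels, n_classes, labeler_ids, labeler_profiles, seed=0):
    """Corrupt clean labels under a labeler-dependent noise model (illustrative sketch).

    labels:           array of clean class indices
    labeler_ids:      array assigning each example to a labeler
    labeler_profiles: per-labeler noise description (hypothetical format)
    """
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    for i, (y, lid) in enumerate(zip(labels, labeler_ids)):
        profile = labeler_profiles[lid]
        if profile["type"] == "good_faith":
            # Natural error: with probability error_rate, replace the label
            # with a uniformly random incorrect class.
            if rng.random() < profile["error_rate"]:
                noisy[i] = rng.choice([c for c in range(n_classes) if c != y])
        else:
            # Adversarial labeler: systematically map the true class to a
            # fixed attack target.
            noisy[i] = profile["target_map"][int(y)]
    return noisy

# Example usage with three labelers, the third adversarial.
labels = np.array([0, 1, 2, 1, 0, 2])
labeler_ids = np.array([0, 0, 1, 1, 2, 2])
profiles = {
    0: {"type": "good_faith", "error_rate": 0.1},
    1: {"type": "good_faith", "error_rate": 0.3},
    2: {"type": "adversarial", "target_map": {0: 1, 1: 2, 2: 0}},
}
noisy = labeler_dependent_noise(labels, n_classes=3, labeler_ids=labeler_ids,
                                labeler_profiles=profiles)
```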
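As a hypothetical illustration of how a labeler-aware filter could exploit knowledge of which partitions came from which labeler (this is not the authors' multi-stage algorithm), one simple strategy is to hold out each labeler's partition, train a reference model on the remaining partitions, and discard partitions whose labels disagree too often with that model's predictions. In the sketch below, `LogisticRegression` stands in for an arbitrary base learner, and `disagreement_threshold` is an assumed hyperparameter.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def filter_partitions(X, noisy_y, labeler_ids, disagreement_threshold=0.4):
    """Keep only labeler partitions whose labels are largely consistent
    with a model trained on all other partitions (illustrative sketch)."""
    kept = []
    for lid in np.unique(labeler_ids):
        holdout = labeler_ids == lid
        # Fit a reference model on every other labeler's data.
        clf = LogisticRegression(max_iter=1000).fit(X[~holdout], noisy_y[~holdout])
        # Fraction of the held-out partition whose labels the model rejects.
        disagreement = np.mean(clf.predict(X[holdout]) != noisy_y[holdout])
        if disagreement <= disagreement_threshold:
            kept.append(lid)
    keep_mask = np.isin(labeler_ids, kept)
    return X[keep_mask], noisy_y[keep_mask]
```

A heavily corrupted adversarial partition produces high disagreement with the consensus of the remaining labelers and is dropped wholesale, which is one way partition-level labeler information can be more robust than filtering individual examples.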