Poisoning attacks are a category of adversarial machine learning threats in which an adversary attempts to subvert the outcome of a machine learning system by injecting crafted data into the training data set, thereby increasing the model's test error. The adversary can tamper with the data feature space, the data labels, or both, each leading to a different attack strategy with different strengths. Various detection approaches have emerged recently, each focusing on a single attack strategy. The Achilles' heel of many of these detection approaches is their dependence on access to a clean, untampered data set. In this paper, we propose CAE, a Classification Auto-Encoder based detector against diverse poisoned data. CAE can detect all forms of poisoning attacks using a combination of reconstruction and classification errors, without any prior knowledge of the attack strategy. We show that an enhanced version of CAE (called CAE+) does not need a clean data set to train the defense model. Our experimental results on three real datasets, MNIST, Fashion-MNIST, and CIFAR, demonstrate that our proposed method maintains its functionality with up to 30% contaminated data and helps the defended SVM classifier regain its best accuracy.
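To make the idea of combining reconstruction and classification errors concrete, the following is a minimal sketch (not the paper's exact architecture) of a classification auto-encoder: a shared encoder feeds both a decoder (reconstruction error) and a classification head (classification error), and the two per-sample errors are combined into a suspicion score. All layer sizes, loss weights, and the thresholding rule shown here are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassificationAutoEncoder(nn.Module):
    """Encoder shared by a decoder and a classifier head (illustrative sizes)."""
    def __init__(self, input_dim=784, latent_dim=32, num_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim), nn.Sigmoid(),
        )
        self.classifier = nn.Linear(latent_dim, num_classes)

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), self.classifier(z)

def suspicion_scores(model, x, y, alpha=1.0, beta=1.0):
    """Combine per-sample reconstruction and classification errors.

    alpha and beta are assumed weighting hyperparameters, not values
    taken from the paper.
    """
    with torch.no_grad():
        x_hat, logits = model(x)
        rec_err = F.mse_loss(x_hat, x, reduction="none").mean(dim=1)
        cls_err = F.cross_entropy(logits, y, reduction="none")
    return alpha * rec_err + beta * cls_err

# Usage sketch: flag samples whose combined error is unusually large,
# e.g. more than two standard deviations above the batch mean.
# model = ClassificationAutoEncoder()
# scores = suspicion_scores(model, x_batch, y_batch)
# flagged = scores > scores.mean() + 2 * scores.std()
```

The intuition is that a sample whose label has been flipped or whose features have been perturbed tends to incur a high classification error, a high reconstruction error, or both, so the combined score separates poisoned from clean points regardless of which attack strategy was used.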