Semi-supervised machine learning models learn from a (small) set of labeled training examples and a (large) set of unlabeled training examples. State-of-the-art models can reach within a few percentage points of fully-supervised training while requiring 100× less labeled data. We study a new class of vulnerabilities: poisoning attacks that modify the unlabeled dataset. To be useful, unlabeled datasets are given strictly less review than labeled datasets, and adversaries can therefore poison them easily. By inserting maliciously-crafted unlabeled examples totaling just 0.1% of the dataset size, we can manipulate a model trained on this poisoned dataset to misclassify arbitrary examples at test time (as any desired label). Our attacks are highly effective across datasets and semi-supervised learning methods. We find that more accurate methods (and thus those more likely to be used) are significantly more vulnerable to poisoning attacks, so better training methods are unlikely to prevent this attack. To counter this, we explore the space of defenses and propose two methods that mitigate our attack.
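As a rough illustration of the threat model (not the paper's exact crafting procedure), the sketch below shows how an attacker who controls only the unlabeled pool might inject a small budget of crafted points. The interpolation-based crafting, function names, and dataset sizes are illustrative assumptions; the only quantities taken from the abstract are that the poisons are unlabeled and total roughly 0.1% of the dataset.

```python
# Minimal sketch of unlabeled-dataset poisoning, assuming an interpolation-style
# crafting strategy (an assumption for illustration, not the paper's method).
import numpy as np

def craft_unlabeled_poisons(x_target, x_source, n_poisons):
    """Interpolate n_poisons points between x_source (an example the model already
    assigns the attacker's desired label) and x_target (the example the attacker
    wants misclassified), so that pseudo-labels can propagate along the path."""
    alphas = np.linspace(0.0, 1.0, n_poisons)
    return np.stack([(1 - a) * x_source + a * x_target for a in alphas])

def poison_unlabeled_set(unlabeled, x_target, x_source, budget=0.001):
    """Append poisons totaling `budget` (here 0.1%) of the dataset size.
    The attacker never touches the labeled set."""
    n_poisons = max(1, int(budget * len(unlabeled)))
    poisons = craft_unlabeled_poisons(x_target, x_source, n_poisons)
    return np.concatenate([unlabeled, poisons], axis=0)

# Hypothetical usage with flattened 32x32x3 images as feature vectors.
unlabeled = np.random.rand(50_000, 32 * 32 * 3).astype(np.float32)
x_source = np.random.rand(32 * 32 * 3).astype(np.float32)   # desired-class example
x_target = np.random.rand(32 * 32 * 3).astype(np.float32)   # example to misclassify
poisoned = poison_unlabeled_set(unlabeled, x_target, x_source)
print(poisoned.shape)  # (50050, 3072): 50 poisons, i.e. 0.1% of the unlabeled pool
```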