In a poisoning attack, an adversary who controls a small fraction of the training data selects that data so as to induce a corrupted model that misbehaves in the adversary's favor. We consider poisoning attacks against convex machine learning models and propose an efficient attack designed to induce a specified target model. Unlike previous model-targeted poisoning attacks, ours comes with provable convergence to {\it any} attainable target classifier: the distance from the induced classifier to the target classifier is inversely proportional to the square root of the number of poisoning points. We also provide a lower bound on the minimum number of poisoning points needed to achieve a given target classifier. Our method uses online convex optimization, so it finds poisoning points incrementally. This provides more flexibility than previous attacks, which require an a priori assumption about the number of poisoning points. Ours is the first model-targeted poisoning attack with provable convergence for convex models, and in our experiments it matches or exceeds state-of-the-art attacks in attack success rate and distance to the target model.
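To make the incremental procedure concrete, the sketch below illustrates one way a model-targeted poisoning loop of this flavor could look. It is a minimal illustration, not the paper's reference implementation: logistic regression stands in for a generic convex learner, the continuous search for the next poisoning point is replaced by a scan over a finite candidate pool, and all names (\texttt{model\_targeted\_poisoning}, the candidate pool, the stopping tolerance) are assumptions introduced here for exposition.

\begin{verbatim}
# Minimal sketch of an incremental, model-targeted poisoning loop.
# Assumptions (not from the paper): logistic regression as the convex
# learner, a finite candidate pool instead of continuous maximization,
# and an illustrative stopping tolerance `tol`.
import numpy as np
from sklearn.linear_model import LogisticRegression

def logistic_loss(theta, b, X, y):
    """Per-example logistic loss for labels y in {-1, +1}."""
    margins = y * (X @ theta + b)
    return np.log1p(np.exp(-margins))

def model_targeted_poisoning(X_clean, y_clean, theta_star, b_star,
                             cand_X, cand_y, max_points=200, tol=1e-3):
    X_poison, y_poison = [], []
    for _ in range(max_points):
        # Train the intermediate model on clean + poisoning data so far.
        X_train = np.vstack([X_clean] + X_poison) if X_poison else X_clean
        y_train = (np.concatenate([y_clean] + y_poison)
                   if y_poison else y_clean)
        clf = LogisticRegression(C=1.0).fit(X_train, y_train)
        theta_t, b_t = clf.coef_.ravel(), clf.intercept_[0]

        # Pick the candidate with the largest loss gap between the
        # current model and the target model, and add it as poison.
        gap = (logistic_loss(theta_t, b_t, cand_X, cand_y)
               - logistic_loss(theta_star, b_star, cand_X, cand_y))
        i = int(np.argmax(gap))
        if gap[i] < tol:  # no candidate moves the model closer: stop
            break
        X_poison.append(cand_X[i:i+1])
        y_poison.append(cand_y[i:i+1])
    if not X_poison:
        return None, None
    return np.vstack(X_poison), np.concatenate(y_poison)
\end{verbatim}

Because points are chosen one at a time, the attacker can stop as soon as the induced model is close enough to the target, rather than committing to a poisoning budget in advance.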