Machine learning models trained on data from the outside world can be corrupted by data poisoning attacks that inject malicious points into the models' training sets. A common defense against these attacks is data sanitization: filter out anomalous training points before training the model. In this paper, we develop three attacks that can bypass a broad range of common data sanitization defenses, including anomaly detectors based on nearest neighbors, training loss, and singular value decomposition. By adding just 3% poisoned data, our attacks successfully increase test error on the Enron spam detection dataset from 3% to 24% and on the IMDB sentiment classification dataset from 12% to 29%. In contrast, existing attacks, which do not explicitly account for these data sanitization defenses, are defeated by them. Our attacks are based on two ideas: (i) we coordinate our attacks to place poisoned points near one another, and (ii) we formulate each attack as a constrained optimization problem, with constraints designed to ensure that the poisoned points evade detection. As this optimization involves solving an expensive bilevel problem, our three attacks correspond to different ways of approximating this problem, based on influence functions, minimax duality, and the Karush-Kuhn-Tucker (KKT) conditions. Our results underscore the need to develop more robust defenses against data poisoning attacks.
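A minimal sketch of the constrained bilevel formulation the abstract alludes to; the notation here (clean training set $\mathcal{D}_c$, poisoned set $\mathcal{D}_p$, sanitizer-induced feasible set $\mathcal{F}$, poison budget $\epsilon n$, loss $L$) is chosen for illustration and is not fixed by the abstract:

\[
\max_{\mathcal{D}_p \subseteq \mathcal{F},\; |\mathcal{D}_p| \le \epsilon n} \; L\!\left(\hat{\theta};\, \mathcal{D}_{\mathrm{test}}\right)
\quad \text{s.t.} \quad
\hat{\theta} \in \arg\min_{\theta} \; L\!\left(\theta;\, \mathcal{D}_c \cup \mathcal{D}_p\right),
\]

where the constraint $\mathcal{D}_p \subseteq \mathcal{F}$ encodes that every poisoned point must survive the data sanitization defense. The three attacks then differ in how they approximate the expensive inner $\arg\min$: via influence functions, minimax duality, or the KKT conditions of the inner training problem.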