Data poisoning is an adversarial attack on training data in which an attacker manipulates a fraction of the data to degrade the performance of a machine learning model. Applications that rely on external data sources for training are therefore at significantly higher risk. Several known defensive mechanisms can help mitigate the threat from such attacks. For example, data sanitization is a popular defense in which the learner rejects data points that lie sufficiently far from the set of training instances. Prior work on defenses against data poisoning has focused primarily on the offline setting, in which all the data is assumed to be available for analysis. Defensive measures for online learning, where data points arrive sequentially, have not garnered similar interest. In this work, we propose a defense mechanism that minimizes the degradation caused by poisoned training data on a learner's model in an online setup. Our proposed method uses influence functions, a classic technique from robust statistics. We further supplement it with existing data sanitization methods to filter out some of the poisoned data points. We study the effectiveness of our defense mechanism on multiple datasets and across multiple attack strategies against an online learner.
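As a rough illustration of the data sanitization idea described above, the following sketch rejects incoming points that fall too far from the running centroid of previously accepted points. It is not the method proposed in this work: the class name `OnlineSanitizer`, the fixed `radius` threshold, the `warmup` period, and the use of Euclidean distance to a single centroid are all assumptions chosen for brevity, and the influence-function component is not shown.

```python
import math


class OnlineSanitizer:
    """Hypothetical distance-based filter for a stream of feature vectors."""

    def __init__(self, radius, warmup=5):
        self.radius = radius    # assumed fixed rejection threshold
        self.warmup = warmup    # first `warmup` points are accepted unconditionally
        self.count = 0          # number of accepted points so far
        self.centroid = None    # running mean of accepted points

    def _distance(self, x):
        # Euclidean distance from x to the current centroid
        return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, self.centroid)))

    def accept(self, x):
        """Return True (and update the centroid) if x is kept, else False."""
        past_warmup = self.centroid is not None and self.count >= self.warmup
        if past_warmup and self._distance(x) > self.radius:
            return False  # too far from accepted data: treat as suspicious
        if self.centroid is None:
            self.centroid = list(x)
        else:
            # incremental mean update over accepted points only
            n = self.count + 1
            self.centroid = [ci + (xi - ci) / n for xi, ci in zip(x, self.centroid)]
        self.count += 1
        return True


def sanitize_stream(stream, radius, warmup=5):
    """Filter a sequence of points, keeping only those the sanitizer accepts."""
    sanitizer = OnlineSanitizer(radius, warmup)
    return [x for x in stream if sanitizer.accept(x)]


# Toy stream: five points near the unit square, one far outlier, one inlier.
stream = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10), (0.4, 0.6)]
kept = sanitize_stream(stream, radius=2.0)
```

In this sketch the warm-up points are assumed to be clean; an attacker who controls the early part of the stream could shift the centroid, which is one reason a distance filter alone may be insufficient in the online setting.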