Data Poisoning (DP) is an effective attack that causes trained classifiers to misclassify their inputs. DP attacks significantly degrade a classifier's accuracy by covertly injecting attack samples into the training set. Broadly applicable to different classifier structures and without strong assumptions about the attacker, an {\it unsupervised} Bayesian Information Criterion (BIC)-based mixture model defense against ``error generic'' DP attacks is herein proposed that: 1) addresses the most challenging {\it embedded} DP scenario wherein, if DP is present, the poisoned samples are an {\it a priori} unknown subset of the training set, with no clean validation set available; 2) applies a mixture model both to well-fit potentially multi-modal class distributions and to capture poisoned samples within a small subset of the mixture components; 3) jointly identifies poisoned components and samples by minimizing the BIC cost defined over the whole training set, with the identified poisoned data removed prior to classifier training. Our experimental results, for various classifier structures and benchmark datasets, demonstrate the effectiveness and universality of our defense under strong DP attacks, as well as its superiority over existing defenses.
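For reference, the generic form of the Bayesian Information Criterion that underlies the proposed cost is shown below; this is only the standard criterion, not the paper's full training-set-wide objective, whose per-class mixture likelihoods and joint treatment of poisoned components are specified in the main text:
\[
\mathrm{BIC} = -2\ln \hat{L} + k \ln n,
\]
where $\hat{L}$ is the maximized likelihood of the fitted mixture model, $k$ is the number of free model parameters, and $n$ is the number of training samples. Minimizing BIC thus trades off goodness of fit against model complexity, which is what allows a small subset of mixture components capturing poisoned samples to be identified without a clean validation set.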