Data Poisoning (DP) is an effective attack that causes trained classifiers to misclassify their inputs. DP attacks significantly degrade a classifier's accuracy by covertly injecting attack samples into the training set. We herein propose a novel Bayesian Information Criterion (BIC)-based mixture model defense against DP attacks that is broadly applicable to different classifier structures and makes no strong assumptions about the attacker. The defense 1) applies a mixture model both to fit potentially multi-modal class distributions well and to capture adversarial samples within a small subset of mixture components; and 2) jointly identifies poisoned components and samples by minimizing the BIC cost over all classes, removing the identified poisoned data prior to classifier training. Our experimental results, for various classifier structures, demonstrate the effectiveness and universality of our defense under strong DP attacks, as well as its superiority over competing approaches.
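The defense relies on BIC to trade off how well a mixture fits a class against how many components it uses. The following is a minimal sketch of that ingredient only, assuming univariate Gaussian components fitted by EM with a deterministic quantile initialization and a variance floor; it illustrates generic BIC-based mixture order selection for a single class, not the authors' joint procedure for identifying poisoned components across all classes. All function names (`fit_gmm_1d`, `bic`, `select_order`) are hypothetical.

```python
import math


def _normal_logpdf(x, mean, var):
    # log N(x | mean, var) for a univariate Gaussian
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def _logsumexp(values):
    # Numerically stable log(sum(exp(v)))
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def _mixture_loglik(xs, weights, means, variances):
    # Total log-likelihood of the data under the mixture
    return sum(
        _logsumexp([math.log(w) + _normal_logpdf(x, mu, v)
                    for w, mu, v in zip(weights, means, variances)])
        for x in xs
    )


def fit_gmm_1d(data, k, iters=200, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM (hypothetical helper).

    Returns (weights, means, variances, log-likelihood).
    """
    xs = sorted(data)
    n = len(xs)
    mean_all = sum(xs) / n
    var_all = max(sum((x - mean_all) ** 2 for x in xs) / n, var_floor)
    # Assumed init: means at evenly spaced quantiles, shared variance, equal weights
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [var_all] * k
    weights = [1.0 / k] * k
    prev = -math.inf
    for _ in range(iters):
        # E-step: responsibilities P(component j | x_i)
        resp = []
        for x in xs:
            logs = [math.log(weights[j]) + _normal_logpdf(x, means[j], variances[j])
                    for j in range(k)]
            total = _logsumexp(logs)
            resp.append([math.exp(l - total) for l in logs])
        # M-step: re-estimate weights, means and (floored) variances
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)  # guard against empty components
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, var_floor)
        cur = _mixture_loglik(xs, weights, means, variances)
        if abs(cur - prev) < 1e-8:
            break
        prev = cur
    return weights, means, variances, _mixture_loglik(xs, weights, means, variances)


def bic(loglik, k, n):
    # Free parameters of a 1-D GMM: k means + k variances + (k - 1) mixing weights
    num_params = 3 * k - 1
    return num_params * math.log(n) - 2.0 * loglik


def select_order(data, max_k):
    # Pick the number of components with the smallest BIC cost
    scores = {k: bic(fit_gmm_1d(data, k)[3], k, len(data)) for k in range(1, max_k + 1)}
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

For example, a class whose samples form two well-separated clusters should be assigned two components, because a third component adds a penalty of `3 * ln(n)` without improving the fit enough:

```python
data = [i * 0.1 for i in range(-10, 11)] + [10.0 + i * 0.1 for i in range(-10, 11)]
best_k, scores = select_order(data, 3)
```

In the defense described above, the same penalized-likelihood principle is applied jointly over all classes to decide which components, and hence which samples, to treat as poisoned.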