Despite the widespread use of Knowledge Graph Embeddings (KGE), little is known about the security vulnerabilities that might disrupt their intended behaviour. We study data poisoning attacks against KGE models for link prediction. These attacks craft adversarial additions or deletions at training time to cause model failure at test time. To select adversarial deletions, we propose to use the model-agnostic instance attribution methods from Interpretable Machine Learning, which identify the training instances that are most influential to a neural model's predictions on test instances. We use these influential triples as adversarial deletions. We further propose a heuristic method to replace one of the two entities in each influential triple to generate adversarial additions. Our experiments show that the proposed strategies outperform the state-of-the-art data poisoning attacks on KGE models and improve the MRR degradation due to the attacks by up to 62% over the baselines.
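To make the attack recipe concrete, below is a minimal, illustrative sketch, not the authors' implementation: it assumes a toy DistMult scorer and uses a simple gradient-dot-product similarity as a stand-in for the instance attribution methods described above, then builds an adversarial addition by replacing one entity of the most influential training triple. All names and the toy data are hypothetical.

```python
# Minimal sketch of the poisoning recipe under stated assumptions:
# (1) score each training triple's influence on a target test triple via
#     gradient similarity (a simplified attribution proxy), pick the top one
#     as the adversarial deletion;
# (2) create an adversarial addition by replacing one entity of that triple.
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_relations, dim = 5, 2, 8

# Toy DistMult embeddings (assumed pre-trained; random here for illustration).
E = rng.normal(size=(n_entities, dim))
R = rng.normal(size=(n_relations, dim))

def score(h, r, t):
    """DistMult triple score: <e_h, w_r, e_t>."""
    return float(np.sum(E[h] * R[r] * E[t]))

def grad_wrt_params(h, r, t):
    """Gradient of the triple score w.r.t. the embeddings it touches,
    flattened into a single parameter vector."""
    g = np.zeros(E.size + R.size)
    g[h * dim:(h + 1) * dim] += R[r] * E[t]              # d score / d e_h
    g[t * dim:(t + 1) * dim] += R[r] * E[h]              # d score / d e_t
    off = E.size
    g[off + r * dim:off + (r + 1) * dim] += E[h] * E[t]  # d score / d w_r
    return g

# Target test triple whose prediction the attacker wants to degrade.
target = (0, 0, 1)
print("target score before attack:", round(score(*target), 3))

# Training triples from which the adversarial deletion is selected.
train = [(0, 0, 2), (2, 1, 3), (0, 1, 4), (3, 0, 1)]

# Attribution proxy: similarity between the target triple's gradient and each
# training triple's gradient; the most similar triple is the deletion.
g_target = grad_wrt_params(*target)
influence = [np.dot(g_target, grad_wrt_params(*tr)) for tr in train]
adv_deletion = train[int(np.argmax(influence))]

# Adversarial addition: keep the relation and one entity of the influential
# triple, replace the other entity with one not already forming that triple.
h, r, t = adv_deletion
candidates = [e for e in range(n_entities) if e != t and (h, r, e) not in train]
adv_addition = (h, r, candidates[int(rng.integers(len(candidates)))])

print("adversarial deletion:", adv_deletion)
print("adversarial addition:", adv_addition)
```

In the paper's setting the attribution step would use the proposed instance attribution methods (e.g. influence-style scores) over a trained KGE model rather than this raw gradient dot product, and the replacement entity would be chosen by the proposed heuristic rather than at random; the sketch only fixes the overall deletion-then-addition structure of the attack.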