Recent neural relation extraction approaches, though achieving promising improvements on benchmark datasets, have been shown to be vulnerable to adversarial attacks. Thus far, efforts have mostly focused on generating adversarial samples or defending against adversarial attacks, but little is known about how normal and adversarial samples differ. In this work, we take the first step toward leveraging a salience-based method to analyze those adversarial samples. We observe that salient tokens correlate directly with adversarial perturbations. We further find that the adversarial perturbations are either tokens that do not appear in the training set or superficial cues associated with relation labels. To some extent, our approach unveils the characteristics of adversarial samples. We release an open-source testbed, "DiagnoseAdv".
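As a minimal sketch of the kind of salience-based analysis the abstract refers to, the snippet below scores each input token by the gradient norm of the predicted relation's logit with respect to that token's embedding, one common gradient-based salience method. The toy model, vocabulary, and sentence are illustrative assumptions, not the paper's actual architecture or data.

```python
# Gradient-based token salience: a hedged, self-contained sketch.
# The ToyRelationClassifier and VOCAB below are hypothetical stand-ins
# for a real relation extraction model and tokenizer.
import torch
import torch.nn as nn

torch.manual_seed(0)

VOCAB = {"<pad>": 0, "steve": 1, "founded": 2, "apple": 3}

class ToyRelationClassifier(nn.Module):
    """Embedding + mean pooling + linear head standing in for a real RE model."""
    def __init__(self, vocab_size=len(VOCAB), dim=16, num_relations=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, num_relations)

    def forward(self, embeddings):
        # Take embeddings (not token ids) so we can differentiate w.r.t. them.
        return self.head(embeddings.mean(dim=1))

def token_salience(model, token_ids, target_relation):
    """Salience of each token = L2 norm of the gradient of the target
    relation's logit with respect to that token's embedding."""
    emb = model.embed(token_ids).detach().requires_grad_(True)
    logit = model(emb)[0, target_relation]
    logit.backward()
    return emb.grad.norm(dim=-1).squeeze(0)  # one score per token

model = ToyRelationClassifier()
tokens = ["steve", "founded", "apple"]
ids = torch.tensor([[VOCAB[t] for t in tokens]])
scores = token_salience(model, ids, target_relation=1)
for tok, s in zip(tokens, scores.tolist()):
    print(f"{tok:>8s}  salience={s:.4f}")
```

Comparing the highest-salience positions against the positions an attack actually perturbed is one straightforward way to quantify the correlation between salient tokens and adversarial perturbations that the abstract reports.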