Neural text ranking models have seen significant advances and are increasingly deployed in practice. Unfortunately, they also inherit the adversarial vulnerabilities of general neural models, which have been detected but remain underexplored in prior studies. Moreover, these inherited vulnerabilities might be exploited by blackhat SEO to defeat better-protected search engines. In this study, we propose an imitation adversarial attack on black-box neural passage ranking models. We first show that the target passage ranking model can be made transparent and imitable by enumerating critical queries/candidates and then training a ranking imitation model. Leveraging the imitation model, we can elaborately manipulate the ranking results and transfer this manipulation attack to the target ranking model. To this end, we propose a novel gradient-based attack method, driven by a pairwise objective function, that generates adversarial triggers which cause premeditated disorder in the rankings with very few tokens. To camouflage the triggers, we add a next-sentence-prediction loss and a language-model fluency constraint to the objective function. Experimental results on passage ranking demonstrate the effectiveness of the ranking imitation model and the adversarial triggers against various state-of-the-art neural ranking models. Furthermore, mitigation analyses and a human evaluation show that the camouflage remains effective against potential mitigation approaches. To encourage further research on this novel and important problem, we make our experimental data and code publicly available.
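As a concrete illustration of the attack the abstract describes, below is a minimal, self-contained sketch of a gradient-based, pairwise-objective trigger search. Everything in it is an illustrative assumption rather than the paper's actual setup: the mean-pooled dot-product scorer stands in for the trained ranking imitation model, the token ids are random, and the vocabulary size, trigger length, and margin are placeholders. The camouflage terms (next-sentence-prediction loss and language-model fluency) would be folded into the same objective in the full method; this sketch optimizes the ranking term alone.

```python
# A minimal sketch of a HotFlip-style, pairwise-objective trigger search.
# The scorer, vocabulary, trigger length, and margin below are illustrative
# assumptions standing in for the trained ranking imitation model; this is
# not the paper's implementation.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB, DIM, TRIGGER_LEN = 1000, 64, 3

# Toy stand-in for the ranking imitation model: score a (query, passage)
# pair as the dot product of mean-pooled token embeddings.
emb = torch.nn.Embedding(VOCAB, DIM)

def score(query_ids: torch.Tensor, passage_embs: torch.Tensor) -> torch.Tensor:
    return emb(query_ids).mean(dim=0) @ passage_embs.mean(dim=0)

query = torch.randint(0, VOCAB, (8,))
target_passage = torch.randint(0, VOCAB, (30,))    # passage to be promoted
anchor_passage = torch.randint(0, VOCAB, (30,))    # passage currently ranked above it
trigger = torch.randint(0, VOCAB, (TRIGGER_LEN,))  # tokens prepended to the target

for step in range(10):
    trig_embs = emb(trigger).detach().requires_grad_(True)
    adv_embs = torch.cat([trig_embs, emb(target_passage).detach()])
    # Pairwise hinge objective: the trigger-equipped passage should outscore
    # the anchor by a margin; zero loss means the ranking is already flipped.
    loss = F.relu(1.0 + score(query, emb(anchor_passage).detach())
                      - score(query, adv_embs))
    if loss.item() == 0.0:
        break
    loss.backward()
    with torch.no_grad():
        # First-order substitution: replacing slot i's token changes the loss
        # by roughly (e_cand - e_cur) . grad_i, so the best candidate per slot
        # maximizes -e_cand . grad_i (the e_cur term is a per-slot constant
        # and does not affect the argmax).
        gain = -emb.weight @ trig_embs.grad.T   # shape: (VOCAB, TRIGGER_LEN)
        trigger = gain.argmax(dim=0)

print("adversarial trigger token ids:", trigger.tolist())
```

In a full attack, each candidate flip would additionally be scored against the camouflage terms before being committed, so that the chosen tokens also read fluently in context, and the triggers found on the imitation model would then be transferred to the black-box target ranker.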