Neural ranking models (NRMs) have achieved remarkable success in recent years, especially with pre-trained language models. However, deep neural models are notorious for their vulnerability to adversarial examples. Given our increased reliance on neural information retrieval models, adversarial attacks may become a new type of web spamming technique. It is therefore important to study potential adversarial attacks to identify vulnerabilities of NRMs before they are deployed. In this paper, we introduce the Word Substitution Ranking Attack (WSRA) task against NRMs, which aims to promote a target document in rankings by adding adversarial perturbations to its text. We focus on the decision-based black-box attack setting, where attackers cannot directly access the model information and can only query the target model to obtain the rank positions of a partial retrieved list. This attack setting is realistic for real-world search engines. We propose a novel Pseudo Relevance-based ADversarial ranking Attack method (PRADA), which learns a surrogate model based on Pseudo Relevance Feedback (PRF) to generate gradients for finding adversarial perturbations. Experiments on two web search benchmark datasets show that PRADA outperforms existing attack strategies and successfully fools the NRM with small, indiscernible text perturbations.
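To make the attack idea concrete, below is a minimal Python sketch (not the authors' implementation) of gradient-guided word substitution against a surrogate ranking model. It assumes a cross-encoder surrogate has already been trained (in PRADA, from pseudo relevance feedback gathered by querying the target model); the model name, the `synonyms` dictionary, and all helper names are illustrative placeholders.

```python
# Minimal sketch of gradient-guided word substitution -- NOT the authors'
# implementation. Assumes a trained cross-encoder surrogate; "bert-base-uncased"
# and the `synonyms` source are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
surrogate = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1)  # stand-in for the trained surrogate
surrogate.eval()

def relevance(query: str, doc: str) -> float:
    """Surrogate relevance score for a (query, document) pair."""
    inputs = tokenizer(query, doc, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return surrogate(**inputs).logits.item()

def word_importance(query: str, doc: str) -> dict:
    """Score each document word by the gradient norm of the relevance score
    w.r.t. its input embeddings; a larger norm means more ranking influence."""
    inputs = tokenizer(query, doc, return_tensors="pt", truncation=True)
    embeds = surrogate.get_input_embeddings()(inputs["input_ids"]).detach()
    embeds.requires_grad_(True)
    out = surrogate(inputs_embeds=embeds,
                    attention_mask=inputs["attention_mask"],
                    token_type_ids=inputs.get("token_type_ids"))
    out.logits.squeeze().backward()
    norms = embeds.grad.norm(dim=-1).squeeze(0)
    scores = {}
    for pos in range(norms.size(0)):
        if inputs.sequence_ids(0)[pos] == 1:  # keep document tokens only
            wid = inputs.word_ids(0)[pos]     # word index within the document
            scores[wid] = max(scores.get(wid, 0.0), norms[pos].item())
    return scores

def attack(query: str, doc: str, synonyms: dict, n_subs: int = 5) -> str:
    """Greedily replace the most influential words with whichever candidate
    substitute most increases the surrogate score (a proxy for the target)."""
    words = doc.split()  # sketch assumes whitespace-only tokenisation
    importance = word_importance(query, doc)
    for wid in sorted(importance, key=importance.get, reverse=True)[:n_subs]:
        if wid is None or wid >= len(words):
            continue  # skip tokens that do not align with whitespace words
        best, best_score = words[wid], relevance(query, " ".join(words))
        for cand in synonyms.get(words[wid], []):
            words[wid] = cand
            score = relevance(query, " ".join(words))
            if score > best_score:
                best, best_score = cand, score
        words[wid] = best
    return " ".join(words)
```

The key design point is that the attacker never sees the target model's scores: in the decision-based setting, only rank positions returned by queries are observable, so the surrogate trained on pseudo relevance feedback supplies the gradients that a white-box attack would normally take from the target model itself.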