Contextual ranking models based on BERT are now well established for a wide range of passage and document ranking tasks. However, the robustness of BERT-based ranking models under adversarial inputs is under-explored. In this paper, we argue that BERT-rankers are not immune to adversarial attacks targeting documents retrieved for a query. First, we propose algorithms for adversarial perturbation of both highly relevant and non-relevant documents using gradient-based optimization methods. The aim of our algorithms is to add or replace a small number of tokens in a highly relevant or non-relevant document to cause a large rank demotion or promotion. Our experiments show that a small number of tokens already suffices to change a document's rank substantially. Moreover, we find that BERT-rankers rely heavily on the start/head of a document for relevance prediction, making the initial part of the document more susceptible to adversarial attacks. More interestingly, we find a small set of recurring adversarial words that, when added to documents, reliably demote relevant documents or promote non-relevant ones. Finally, our adversarial tokens also show particular topic preferences within and across datasets, exposing potential biases from BERT pre-training or the downstream datasets.
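To make the gradient-guided token-replacement idea concrete, here is a minimal sketch on a toy linear ranker rather than the paper's BERT model. The embedding table `E`, relevance direction `w`, and the scoring function are all invented for illustration; the search itself is the standard first-order (HotFlip-style) recipe of estimating each swap's score change as the dot product of the embedding difference with the score gradient, which happens to be exact for this linear model.

```python
import numpy as np

# Toy differentiable ranker (assumed for illustration, not the paper's BERT model):
# score(doc) = w . mean(token embeddings)
rng = np.random.default_rng(0)
V, d = 50, 8                  # vocabulary size and embedding dim (arbitrary)
E = rng.normal(size=(V, d))   # token embedding table
w = rng.normal(size=d)        # "relevance direction" of the toy ranker

def score(doc):
    """Relevance score of a document given as a list of token ids."""
    return float(w @ E[doc].mean(axis=0))

def demote_one_token(doc):
    """Replace the single token whose swap most lowers the score.

    The gradient of the score w.r.t. every token embedding is w / len(doc),
    so the first-order estimate of the score change from swapping the token
    at position i to candidate v is (E[v] - E[doc[i]]) @ w / len(doc).
    For this linear model the estimate is exact; for BERT it is only a
    local approximation used to rank candidate swaps.
    """
    grad = w / len(doc)
    cand = E @ grad                          # (V,)  contribution of each candidate token
    cur = E[doc] @ grad                      # (n,)  contribution of each current token
    deltas = cand[None, :] - cur[:, None]    # (n, V) predicted score change per swap
    i, v = np.unravel_index(np.argmin(deltas), deltas.shape)
    new_doc = list(doc)
    new_doc[i] = int(v)
    return new_doc

doc = [3, 17, 42, 9]
adv_doc = demote_one_token(doc)
print(score(adv_doc) < score(doc))  # the single-token swap lowers the score
```

For a rank *promotion* attack the same search simply picks `argmax` instead of `argmin`; against an actual BERT ranker the gradient would be obtained by backpropagation through the model, and the swap re-scored to verify the approximation.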