Although machine-learning-based algorithms have been used extensively to detect phishing websites, there has been relatively little work on how adversaries may attack such "phishing detectors" (PDs for short). In this paper, we propose a set of gray-box attacks on PDs, which vary according to how much knowledge the adversary has about the target PD. We show that these attacks severely degrade the effectiveness of several existing PDs. We then propose the concept of operation chains, which iteratively map an original set of features to a new set of features, and develop the "Protective Operation Chain" (POC for short) algorithm. POC combines random feature selection with feature mappings to increase the attacker's uncertainty about the target PD. Using three existing publicly available datasets plus a fourth that we created and will release upon publication of this paper, we show that POC is more robust to these attacks than past competing work, while preserving predictive performance when no adversarial attacks are present. Moreover, POC is robust to attacks on 13 different classifiers, not just one. These results are statistically significant at the p < 0.001 level.
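To make the idea of an operation chain concrete, the following is a minimal sketch of one way a chain could combine random feature selection with feature mappings. It is an illustration under stated assumptions, not the POC algorithm itself: the operation set (`add`, `mul`, `max`), the helper names `build_chain` and `apply_chain`, and the chain shape (three steps of four features each) are all hypothetical choices made for this example.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical binary operations; the abstract does not specify the real operation set.
OPERATIONS: List[Tuple[str, Callable[[float, float], float]]] = [
    ("add", lambda a, b: a + b),
    ("mul", lambda a, b: a * b),
    ("max", lambda a, b: max(a, b)),
]

# One step is a list of (left index, right index, operation index) triples.
Step = List[Tuple[int, int, int]]


def build_chain(n_features: int, n_steps: int, width: int,
                rng: random.Random) -> List[Step]:
    """Draw a random chain; each step maps the previous layer to `width` features."""
    chain: List[Step] = []
    n_in = n_features
    for _ in range(n_steps):
        # Random feature selection: pick two input features and one operation.
        step = [
            (rng.randrange(n_in), rng.randrange(n_in), rng.randrange(len(OPERATIONS)))
            for _ in range(width)
        ]
        chain.append(step)
        n_in = width  # the next step reads the features produced by this one
    return chain


def apply_chain(chain: List[Step], x: Sequence[float]) -> List[float]:
    """Iteratively map the original feature vector through every step of the chain."""
    current = list(x)
    for step in chain:
        current = [OPERATIONS[op][1](current[i], current[j]) for i, j, op in step]
    return current


# Assumption: the seed plays the role of a secret known only to the defender.
rng = random.Random(0)
chain = build_chain(n_features=5, n_steps=3, width=4, rng=rng)
mapped = apply_chain(chain, [1.0, 0.0, 3.0, 2.0, 1.0])
```

In this sketch a classifier would be trained on `mapped` vectors rather than on the original features, so an attacker who does not know the sampled chain cannot tell exactly which original features, or which combinations of them, influence the prediction.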