Web-based phishing accounts for over 90% of data breaches, and most web browsers and security vendors rely on machine-learning (ML) models to mitigate it. Despite this, links posted regularly on anti-phishing aggregators such as PhishTank and VirusTotal have been shown to bypass existing detectors easily. Prior work suggests that automated website cloning with light mutations is gaining traction among attackers. This trend has received limited attention in the literature, which leads to sub-optimal ML-based countermeasures. This work presents the first empirical study that compiles and evaluates a variety of state-of-the-art cloning techniques in wide circulation. We collected 13,394 samples and found 8,566 confirmed phishing pages targeting 4 popular websites using 7 distinct cloning mechanisms. We replicated these samples, with the malicious code removed, on a controlled platform fortified with precautions that prevent accidental access. We then reported our sites to VirusTotal and other platforms and polled the results regularly for 7 days to ascertain the efficacy of each cloning technique. Results show that no security vendor detected our clones, underscoring the urgent need for more effective detectors. Finally, we propose 4 recommendations to help web developers and ML-based defences alleviate the risks of cloning attacks.