PhishMatch: A Layered Approach for Effective Detection of Phishing URLs

Phishing attacks continue to be a significant threat on the Internet. Prior studies show that it is possible to determine whether a website is phishing solely by analyzing its URL. A major advantage of the URL-based approach is that it can identify a phishing website before the web page is rendered in the browser, thus avoiding other potential problems such as cryptojacking and drive-by downloads. However, traditional URL-based approaches have their limitations. Blacklist-based approaches are vulnerable to zero-hour phishing attacks, advanced machine-learning-based approaches consume substantial resources, and other approaches send the URL to a remote server, which compromises the user's privacy. In this paper, we present a layered anti-phishing defense, PhishMatch, which is robust, accurate, inexpensive, and client-side. We design a space-time efficient Aho-Corasick algorithm for exact string matching and an n-gram based indexing technique for approximate string matching to detect various cybersquatting techniques in the phishing URL. To reduce false positives, we use a global whitelist and personalized user whitelists. We also determine the context in which the URL is visited and use that information to classify the input URL more accurately. The last component of PhishMatch involves a machine learning model and controlled search engine queries to classify the URL. A prototype plugin of PhishMatch, developed for the Chrome browser, was found to be fast and lightweight. Our evaluation shows that PhishMatch is both efficient and effective.
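
The two string-matching layers named in the abstract can be illustrated with a minimal Python sketch. This is not the PhishMatch implementation: it uses a textbook Aho-Corasick automaton rather than the space-time efficient variant the paper describes, and a simple bigram index with an edit-distance check as a stand-in for the n-gram based approximate matching. The class names `AhoCorasick` and `NGramIndex`, the helpers `ngrams` and `edit_distance`, and the brand names used with them are all hypothetical, chosen for illustration only.

```python
from collections import defaultdict, deque


class AhoCorasick:
    """Textbook Aho-Corasick automaton (assumption: not the paper's optimized variant)."""

    def __init__(self, patterns):
        self.goto = [{}]   # per-state transition table
        self.fail = [0]    # failure links
        self.out = [[]]    # patterns recognized on entering each state
        for pattern in patterns:
            self._add(pattern)
        self._build_failure_links()

    def _add(self, pattern):
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(pattern)

    def _build_failure_links(self):
        # Breadth-first traversal; depth-1 states keep a failure link to the root.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text):
        """Return (start_index, pattern) for every occurrence of any pattern."""
        state, found = 0, []
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.out[state]:
                found.append((i - len(pattern) + 1, pattern))
        return found


def ngrams(s, n=2):
    """Set of character n-grams of s, padded with '$' at both ends."""
    padded = "$" + s + "$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class NGramIndex:
    """Inverted index from n-grams to names, used to shortlist near matches."""

    def __init__(self, names, n=2):
        self.n = n
        self.index = defaultdict(set)
        for name in names:
            for gram in ngrams(name, n):
                self.index[gram].add(name)

    def lookup(self, query, max_dist=1):
        # Heuristic filter: only names sharing at least one n-gram are verified.
        candidates = set()
        for gram in ngrams(query, self.n):
            candidates |= self.index.get(gram, set())
        hits = [(name, edit_distance(query, name)) for name in candidates]
        return sorted((d, name) for name, d in hits if d <= max_dist)
```

With a hypothetical brand list, `AhoCorasick(["paypal", "apple"]).search("secure-paypal.example.com")` reports the embedded brand name, which is the kind of exact match that flags a combosquatting host, while `NGramIndex(["paypal", "google"]).lookup("paypa1")` shortlists `paypal` at edit distance 1, the kind of near match that flags a typosquatting host. The index returns `(distance, name)` pairs so that the closest candidates sort first.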
