Phishing attacks delivered via e-mail and short message service (SMS) continue to rise. The first step in countering this growing threat is to collect and characterize more of the phishing cases that actually reach end users; without understanding their characteristics, anti-phishing countermeasures cannot evolve. In this study, we propose using Twitter as a new observation point to promptly collect and characterize e-mail and SMS phishing cases that evade countermeasures and reach users. Specifically, we propose CrowdCanary, a system that structurally and accurately extracts phishing information (e.g., URLs and domains) from tweets posted by users who have actually discovered or encountered phishing. During three months of live operation, CrowdCanary identified 35,432 phishing URLs from 38,935 phishing reports. We confirmed that 31,960 (90.2%) of these phishing URLs were later detected by anti-virus engines, demonstrating that CrowdCanary outperforms existing systems in both the accuracy and the volume of extracted threats. We also used the extracted phishing URLs to analyze the users who shared phishing threats and divided them into two distinct groups: experts and non-experts. We found that CrowdCanary can collect information that appears specifically in non-expert reports, such as reports that identify a phishing attack only by the targeted company's brand name, phishing information that appears only in images attached to tweets, and information about the landing page before redirection.
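The abstract does not say how CrowdCanary parses tweets, so the following is only a minimal sketch of one plausible sub-step: pulling URLs and their domains out of a tweet's text. It assumes that reporters often "defang" links (e.g., `hxxp`, `[.]`, `[:]`) so they are not clickable, and that a simple regex pass is enough to recover them. The function names `refang` and `extract_urls_and_domains` and the refang rules are hypothetical, and the sketch does not cover brand-name-only reports or text inside images, both of which the system is described as handling.

```python
import re
from urllib.parse import urlsplit

# Hypothetical defanging patterns; an illustrative assumption,
# not CrowdCanary's actual rule set.
_REFANG_RULES = [
    (re.compile(r"hxxp", re.IGNORECASE), "http"),    # hxxp(s) -> http(s)
    (re.compile(r"\[\.\]|\(\.\)|\{\.\}"), "."),      # [.] (.) {.} -> .
    (re.compile(r"\[:\]"), ":"),                     # [:] -> :
]

# Matches http(s) URLs up to the next whitespace, quote, or angle bracket.
_URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def refang(text: str) -> str:
    """Undo common defanging so that posted URLs become parseable again."""
    for pattern, replacement in _REFANG_RULES:
        text = pattern.sub(replacement, text)
    return text


def extract_urls_and_domains(tweet_text: str) -> list[tuple[str, str]]:
    """Return (url, domain) pairs found in a tweet after refanging."""
    results = []
    for match in _URL_RE.finditer(refang(tweet_text)):
        # Drop trailing punctuation that usually ends the sentence, not the URL.
        url = match.group(0).rstrip(".,;:!?)")
        host = urlsplit(url).hostname  # lowercased, port stripped
        if host:
            results.append((url, host))
    return results


if __name__ == "__main__":
    tweet = ("Got an SMS claiming to be from my bank: "
             "hxxps://secure-login[.]example[.]com/verify?id=1 do not click!")
    print(extract_urls_and_domains(tweet))
```

Running the example prints `[('https://secure-login.example.com/verify?id=1', 'secure-login.example.com')]`. A real pipeline would also need to filter out benign URLs that users mention alongside the phishing link, which this sketch leaves out.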