Phishing emails that appear legitimate lure recipients into clicking malicious links or opening malicious attachments. As phishing campaigns have grown more sophisticated in recent years, detection requires systems that adapt better than traditional signature-based methods. Natural language processing (NLP) with deep neural networks (DNNs) has therefore been adopted to learn from large numbers of emails. However, these everyday communications contain sensitive personal information, and escalating privacy concerns make them difficult to collect on a server for centralized learning in practice. To this end, we propose a decentralized phishing email detection method called the Federated Phish Bowl (FPB), which leverages federated learning and long short-term memory (LSTM). FPB lets different clients build and share a common knowledge representation by aggregating their trained models, which safeguards both email security and privacy. To train the model, we collected a recent phishing email dataset from an intergovernmental organization. We then evaluated model performance under various assumptions about the total number of clients and the level of data heterogeneity. The comprehensive experimental results suggest that FPB is robust to a continually increasing number of clients and to various levels of data heterogeneity, retaining a detection accuracy of 0.83 while protecting the privacy of sensitive email communications.
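The aggregation of trained client models can be illustrated with a minimal sketch. The abstract does not state which aggregation rule FPB uses, so the sample-size-weighted averaging below (in the style of federated averaging) is an assumption, as are the function name `federated_average`, the parameter name `lstm.w`, and the flat-list representation of weights. Only model parameters leave each client, and raw emails stay local.

```python
from typing import Dict, List

# Hypothetical representation: each client model is a mapping from a
# parameter name to a flat list of floats (e.g. flattened LSTM weights).
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights],
                      client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its sample count.

    This is an assumed FedAvg-style rule, not necessarily the one used by FPB.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client model")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    aggregated: Weights = {}
    for name, reference in client_weights[0].items():
        # Weighted mean of every coordinate of this parameter across clients.
        aggregated[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(reference))
        ]
    return aggregated


# Two hypothetical clients holding 1 and 3 local emails respectively.
clients = [{"lstm.w": [1.0, 2.0]}, {"lstm.w": [3.0, 6.0]}]
global_weights = federated_average(clients, [1, 3])
```

In a full training loop, the server would send `global_weights` back to the clients, each client would continue training on its own emails, and the cycle would repeat for several rounds.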