Privacy-Preserving Phishing Email Detection Based on Federated Learning and LSTM

Phishing emails that appear legitimate lure people into clicking on attached malicious links or documents. Increasingly sophisticated phishing campaigns in recent years necessitate a detection system more adaptive than traditional signature-based methods. In this regard, natural language processing (NLP) with deep neural networks (DNNs) is adopted to acquire knowledge from a large number of emails. However, such sensitive daily communications contain personal information and are difficult to collect on a server for centralized learning in real life because of escalating privacy concerns. To this end, we propose a decentralized phishing email detection method called the Federated Phish Bowl (FPB), which leverages federated learning and long short-term memory (LSTM). FPB allows common knowledge representation and sharing among different clients through the aggregation of trained models, safeguarding email security and privacy. A recent phishing email dataset collected from an intergovernmental organization was used to train the model. Moreover, we evaluated the model performance under various assumptions regarding the total client number and the level of data heterogeneity. The comprehensive experimental results suggest that FPB is robust to a continually increasing client number and various data heterogeneity levels, retaining a detection accuracy of 0.83 while protecting the privacy of sensitive email communications.
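The abstract says clients share knowledge "through the aggregation of trained models" but does not name the aggregation rule. As a minimal sketch, assuming a FedAvg-style average weighted by each client's sample count (an assumption, not confirmed by the abstract), the server-side step might look like the following. The parameter names, values, and client sizes are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Parameter name -> flattened parameter values (hypothetical representation).
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights],
                      client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its sample count.

    This is a FedAvg-style rule used as an assumption; the abstract does
    not state which aggregation rule FPB uses.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name, reference in client_weights[0].items():
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(reference))
        ]
    return averaged


# Three hypothetical clients holding locally trained parameters.
# Only these parameters leave each client; raw emails stay local.
clients = [
    {"lstm.kernel": [1.0, 2.0], "dense.bias": [0.0]},
    {"lstm.kernel": [3.0, 4.0], "dense.bias": [1.0]},
    {"lstm.kernel": [5.0, 6.0], "dense.bias": [2.0]},
]
sizes = [100, 100, 200]  # number of local training emails per client

global_weights = federated_average(clients, sizes)
print(global_weights)
```

In a full training loop, the server would send `global_weights` back to the clients, each client would continue training its local LSTM on its own emails, and the cycle would repeat for several rounds.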

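Federated Learning

Federated learning is an emerging foundational artificial intelligence technique, first proposed by Google in 2016 and originally used to let Android phone users update models locally on their devices. Its design goal is to carry out efficient machine learning across multiple participants or computing nodes while ensuring information security during large-scale data exchange, protecting device-level and personal data privacy, and maintaining legal compliance. The machine learning algorithms usable with federated learning are not limited to neural networks; they also include other important algorithms such as random forests. Federated learning is expected to become the basis for the next generation of collaborative AI algorithms and collaborative networks.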
