The cyber-threat landscape has evolved tremendously in recent years, with new threat variants emerging daily and large-scale coordinated campaigns becoming more prevalent. In this study, we propose CELEST (CollaborativE LEarning for Scalable Threat detection), a federated machine learning framework for global threat detection over HTTP, one of the protocols most commonly used for malware dissemination and communication. CELEST uses federated learning to collaboratively train a global model across multiple clients that keep their data local, thus providing stronger privacy and confidentiality assurances. Through a novel active learning component integrated with the federated learning technique, our system continuously discovers and learns the behavior of new, evolving, and globally coordinated cyber threats. We show that CELEST is able to expose attacks that are largely invisible to individual organizations. For instance, in one challenging attack scenario with data exfiltration malware, the global model achieves a three-fold increase in Precision-Recall AUC compared to the local model. We deploy CELEST on two university networks and show that it detects malicious HTTP communication with high precision and low false positive rates. Furthermore, during its deployment, CELEST detected 42 previously unknown malicious URLs and 20 previously unknown malicious domains in one day, all of which were confirmed to be malicious by VirusTotal.
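To make the federated training idea concrete, the following is a minimal sketch of one server-side aggregation step, assuming a FedAvg-style weighted average of client model weights. The abstract does not state which aggregation rule CELEST uses, so the function name `federated_average`, the two example sites, and their sample counts are all hypothetical and serve only as an illustration.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Combine client model weights into one global weight vector.

    Assumption: FedAvg-style aggregation, where each client's weights
    count in proportion to its number of local training samples.
    Only weights are shared; raw client data never leaves the client.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must send weights of equal length")

    global_weights = [0.0] * dim
    for weights, n_samples in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            global_weights[i] += w * n_samples / total
    return global_weights


# Hypothetical example: two sites with locally trained 2-parameter models.
site_a = [0.25, -1.0]   # trained on 100 local samples
site_b = [0.75, 1.0]    # trained on 300 local samples
global_model = federated_average([site_a, site_b], [100, 300])
```

In a full round, the server would send `global_model` back to the clients, each client would continue training on its own HTTP logs, and the step would repeat. Weighting by sample count is one common design choice; it lets larger sites influence the global model more than smaller ones.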