CELEST: Federated Learning for Globally Coordinated Threat Detection

The cyber-threat landscape has evolved tremendously in recent years, with new threat variants emerging daily and large-scale coordinated campaigns becoming more prevalent. In this study, we propose CELEST (CollaborativE LEarning for Scalable Threat detection), a federated machine learning framework for global threat detection over HTTP, which is one of the most commonly used protocols for malware dissemination and communication. CELEST leverages federated learning to collaboratively train a global model across multiple clients that keep their data local, thus providing increased privacy and confidentiality assurances. Through a novel active learning component integrated with the federated learning technique, our system continuously discovers and learns the behavior of new, evolving, and globally coordinated cyber threats. We show that CELEST is able to expose attacks that are largely invisible to individual organizations. For instance, in one challenging attack scenario with data exfiltration malware, the global model achieves a three-fold increase in Precision-Recall AUC compared to the local model. We also present DTrust, a poisoning detection and mitigation method designed specifically for federated learning in the collaborative threat detection domain. DTrust uses feedback from participating clients to detect poisoning clients, which are then investigated and removed from the training process. We deploy CELEST on two university networks and show that it detects malicious HTTP communication with high precision and low false positive rates. Furthermore, during its deployment, CELEST detected 42 previously unknown malicious URLs and 20 previously unknown malicious domains in one day, all of which were confirmed to be malicious by VirusTotal.
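
The abstract does not specify how CELEST aggregates client models, so the following is only a rough illustration of the general idea of training a global model while each client keeps its data local. It is a minimal sketch of plain federated averaging (FedAvg) over a toy logistic regression, not the CELEST implementation. All names here (`local_update`, `federated_average`, `train_global`, `make_client`) and the synthetic one-feature data are assumptions made for the example.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of SGD on one client's private data.

    Only the updated weights leave the client; the raw data never does.
    """
    w = list(weights)
    for _ in range(epochs):
        for features, label in data:
            pred = sigmoid(sum(wi * xi for wi, xi in zip(w, features)))
            err = pred - label  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, features)]
    return w

def federated_average(client_weights, client_sizes):
    """Average client weight vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]

def train_global(clients, dim, rounds=20):
    """Server loop: broadcast the global weights, collect updates, average."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_w = federated_average(updates, sizes)
    return global_w

def make_client(seed, n=50):
    """Hypothetical synthetic client: features are [bias, x], label is 1 if x > 0."""
    rng = random.Random(seed)
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    return [([1.0, x], 1.0 if x > 0 else 0.0) for x in xs]

if __name__ == "__main__":
    clients = [make_client(seed) for seed in (1, 2, 3)]
    w = train_global(clients, dim=2)
    print("global weights:", w)
```

In this sketch the server only ever sees weight vectors, which is the property the abstract refers to when it says clients keep their data local. The active learning component and DTrust are not modeled here.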
