Over the past years, manual methods for creating detection rules have become impractical for anti-malware products because the number of malware threats keeps growing. Turning to machine learning approaches is therefore a promising way to make malware recognition more efficient. Traditional centralized machine learning requires a large amount of data to train a model with excellent performance. To boost malware detection, the training data may come from various kinds of data sources, such as host, network, and cloud-based anti-malware components, or even from different enterprises. To avoid the expense of data collection as well as the leakage of private data, we present a federated learning system that identifies malware through behavioural graphs, i.e., system call dependency graphs. It is based on a deep learning model comprising a graph autoencoder and a multi-classifier module. The model is trained with a secure learning protocol among clients to protect private data against inference attacks. Using this model to identify malware, we achieve an accuracy of 85% on homogeneous graph data and 93% on inhomogeneous graph data.
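The abstract does not say which secure learning protocol the clients use. The sketch below assumes one common building block, pairwise additive masking for secure aggregation, purely for illustration. Each pair of clients shares a random mask; one adds it and the other subtracts it, so the server sees only masked vectors while the masks cancel in the sum. Model updates (from the graph autoencoder and classifier, which are not modelled here) are treated as flat float vectors. All names (`encode`, `pairwise_masks`, `mask_update`, `secure_average`) and constants are hypothetical, and a single seeded RNG stands in for the pairwise key agreement a real protocol would need.

```python
import random

MODULUS = 2**32  # arithmetic modulo 2^32 so masked values look uniformly random
SCALE = 2**16    # fixed-point scale for encoding float updates as integers


def encode(values):
    """Quantize floats to fixed-point integers in Z_MODULUS."""
    return [round(v * SCALE) % MODULUS for v in values]


def decode(values):
    """Map ring elements back to signed floats (two's-complement style)."""
    out = []
    for v in values:
        if v >= MODULUS // 2:
            v -= MODULUS
        out.append(v / SCALE)
    return out


def pairwise_masks(num_clients, dim, seed=0):
    """One random mask per client pair (i < j).

    Assumption: a single seeded RNG replaces real pairwise key agreement.
    """
    rng = random.Random(seed)
    return {
        (i, j): [rng.randrange(MODULUS) for _ in range(dim)]
        for i in range(num_clients)
        for j in range(i + 1, num_clients)
    }


def mask_update(client_id, update, masks, num_clients):
    """Client side: encode the update, then add or subtract each pairwise mask."""
    masked = encode(update)
    for other in range(num_clients):
        if other == client_id:
            continue
        pair = (min(client_id, other), max(client_id, other))
        sign = 1 if client_id < other else -1  # lower id adds, higher id subtracts
        masked = [(x + sign * m) % MODULUS for x, m in zip(masked, masks[pair])]
    return masked


def secure_average(masked_updates):
    """Server side: sum the masked vectors (masks cancel), then average."""
    total = [0] * len(masked_updates[0])
    for upd in masked_updates:
        total = [(t + u) % MODULUS for t, u in zip(total, upd)]
    n = len(masked_updates)
    return [s / n for s in decode(total)]


if __name__ == "__main__":
    updates = [[0.5, -1.25], [1.0, 0.25], [-0.5, 2.0]]
    masks = pairwise_masks(len(updates), dim=2)
    masked = [mask_update(i, u, masks, len(updates)) for i, u in enumerate(updates)]
    print(secure_average(masked))  # approximately [0.333..., 0.333...]
```

Masking of this kind hides each client's individual update from the server but not the aggregate itself; whether the authors' protocol uses masking, differential privacy, or another mechanism is not stated in the abstract.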