Signature-based and anomaly-based techniques are the quintessential approaches to malware detection. However, these techniques have become increasingly ineffective as malware has grown more sophisticated. Researchers have therefore turned to deep learning to construct better-performing models. In this paper, we create four different long short-term memory (LSTM) based models and train each to classify malware samples from 20 families. Our features consist of opcodes extracted from malware executables. We employ techniques used in natural language processing (NLP), including word embedding and bidirectional LSTMs (biLSTM), and we also use convolutional neural networks (CNN). We find that a model consisting of word embedding, biLSTM, and CNN layers performs best in our malware classification experiments.
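To make the feature representation concrete, the sketch below shows one plausible way that opcode sequences could be turned into fixed-length integer sequences suitable for a word embedding layer. It is an illustration under stated assumptions, not the preprocessing used in this paper: the helper names `build_vocab` and `encode`, the reserved `PAD` and `UNK` indices, the vocabulary cutoff, and the sequence length are all hypothetical choices.

```python
from collections import Counter

# Reserved indices (an assumed convention, not taken from the paper).
PAD, UNK = 0, 1


def build_vocab(sequences, max_opcodes=30):
    """Map the most frequent opcodes to integer indices starting at 2."""
    counts = Counter(op for seq in sequences for op in seq)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {op: i + 2 for i, (op, _) in enumerate(ranked[:max_opcodes])}


def encode(seq, vocab, length):
    """Truncate or pad one opcode sequence to `length` integer indices."""
    ids = [vocab.get(op, UNK) for op in seq[:length]]
    return ids + [PAD] * (length - len(ids))


# Toy opcode sequences standing in for opcodes extracted from two executables.
samples = [
    ["push", "mov", "call", "mov", "pop", "ret"],
    ["mov", "xor", "jmp"],
]

vocab = build_vocab(samples, max_opcodes=3)
encoded = [encode(s, vocab, length=5) for s in samples]
```

Each row of `encoded` could then be passed to an embedding layer, which maps every index to a dense vector before the recurrent and convolutional layers. Opcodes outside the retained vocabulary share the single `UNK` index, which keeps the embedding table small at the cost of merging rare opcodes.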