Malicious software (malware) causes considerable harm to our devices and our lives, so we are eager to understand how malware behaves and what threats it poses. Most records of malware activity, such as event log data and dynamic analysis profiles, are variable-length, text-based files with time stamps. Using the time stamps, we can sort such data into sequences for subsequent analysis. However, variable-length text-based sequences are difficult to handle. In addition, unlike natural language text, most sequential data in information security has specific properties and structure, such as loops, repeated calls, and noise. To analyze API call sequences together with their structure, we represent the sequences as graphs, which, as with a Markov model, make it possible to examine their information and structure further. We therefore design and implement an Attention Aware Graph Neural Network (AWGCN) to analyze API call sequences. Through AWGCN, we obtain sequence embeddings that can be used to analyze malware behavior. Moreover, the classification experiments show that AWGCN outperforms other classifiers on the call-like datasets, and that the embeddings can further improve the performance of classic models.
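To make the idea of representing a call sequence as a graph concrete, the following is a minimal Python sketch of a first-order Markov-style transition graph built from one time-ordered trace. It is not the AWGCN model described above, only an illustration of the graph representation; the function name `sequence_to_transition_graph` and the API call names in `trace` are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict


def sequence_to_transition_graph(calls):
    """Build a directed graph {src: {dst: probability}} from a call sequence.

    Each distinct API call becomes a node; each pair of consecutive calls
    adds weight to an edge. Weights are normalized per source node, so the
    result is a first-order Markov transition table.
    """
    counts = defaultdict(Counter)
    for src, dst in zip(calls, calls[1:]):
        counts[src][dst] += 1

    graph = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        graph[src] = {dst: n / total for dst, n in dsts.items()}
    return graph


# Hypothetical trace, assumed to be already sorted by time stamp.
trace = [
    "OpenFile", "ReadFile", "ReadFile", "ReadFile", "CloseFile",
    "OpenFile", "WriteFile", "CloseFile",
]
graph = sequence_to_transition_graph(trace)

for src, dsts in graph.items():
    print(src, dsts)
```

Note how the repeated `ReadFile` calls collapse into a single self-loop edge, so sequences of different lengths map onto graphs over the same node vocabulary. This is one way the loop and repeated-call structure mentioned in the abstract can be captured; the actual graph construction used by AWGCN may differ.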