Malicious document files used in targeted attacks often contain a small program called shellcode. Dynamic analysis of these documents is often difficult because they exploit specific vulnerabilities, which makes it hard to prepare an environment in which they actually run. In these cases, the analyst must locate the shellcode within each document file in order to analyze it. If the exploit uses executable scripts such as JavaScript or Flash, locating the shellcode is relatively easy. When the exploit contains no JavaScript or Flash and consists only of native x86 code, however, locating the shellcode is sometimes almost impossible. Binary fragment classification is often applied to visualize the location of regions of interest. Shellcode must contain at least a small fragment of native x86 code, such as the decoder for its obfuscated body, even if most of it is obfuscated. In this paper, we propose o-glasses, a novel method that visualizes shellcode by recognizing native x86 code with a specially designed one-dimensional convolutional neural network (1d-CNN). The fragment size must be no larger than the smallest region of native x86 code in the shellcode. Our results show that a 16-instruction sequence (approximately 48 bytes on average) is sufficient for visualizing code fragments. Our method, o-glasses (1d-CNN), outperforms other methods, recognizing native x86 code with a surprisingly high F-measure of about 99.95%.
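The fragment-classification idea can be illustrated with a minimal sketch. It assumes fixed 48-byte windows as a stand-in for the 16-instruction sequences, a hypothetical stride of 16 bytes, and a toy 1d-CNN (one convolution layer, ReLU, global max pooling, a linear output and a sigmoid) with made-up, untrained weights. The abstract does not describe the actual o-glasses input encoding, architecture or training, so `windows`, `cnn1d_score`, `locate_code` and all parameters are hypothetical. Each window is scored, and overlapping windows that score above a threshold are merged into byte ranges that could then be highlighted in a visualization.

```python
import math
from typing import Callable, Iterator, List, Sequence, Tuple

# Assumption: fixed-size byte windows approximate the paper's
# 16-instruction sequences (about 48 bytes on average).
WINDOW = 48
STRIDE = 16  # hypothetical step between window starts; not from the paper


def windows(data: bytes, size: int = WINDOW,
            stride: int = STRIDE) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, fragment) pairs covering `data` in ascending offset order."""
    if len(data) <= size:
        yield 0, data
        return
    last = len(data) - size
    for off in range(0, last + 1, stride):
        yield off, data[off:off + size]
    if last % stride:
        yield last, data[last:]  # cover the tail that the stride would skip


def cnn1d_score(fragment: bytes, kernels: Sequence[Sequence[float]],
                dense: Sequence[float], bias: float) -> float:
    """Toy 1d-CNN forward pass with hypothetical weights:
    conv over scaled bytes -> ReLU -> global max pool -> linear -> sigmoid."""
    x = [b / 255.0 for b in fragment]
    z = bias
    for kernel, w_out in zip(kernels, dense):
        width = len(kernel)
        pooled = 0.0  # max over ReLU outputs is never below 0.0
        for i in range(len(x) - width + 1):
            act = sum(w * x[i + j] for j, w in enumerate(kernel))
            pooled = max(pooled, act)
        z += w_out * pooled
    # Numerically stable sigmoid: probability that the fragment is x86 code.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def locate_code(data: bytes, score: Callable[[bytes], float],
                threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return merged [start, end) byte ranges whose windows score >= threshold."""
    regions: List[Tuple[int, int]] = []
    for off, frag in windows(data):
        if score(frag) < threshold:
            continue
        end = off + len(frag)
        if regions and off <= regions[-1][1]:
            # Overlaps or touches the previous region: extend it.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((off, end))
    return regions


if __name__ == "__main__":
    doc = bytes(100) + b"\x90" * 8 + bytes(92)
    # Stand-in classifier: flags any window containing a 4-byte NOP run.
    stub = lambda frag: 1.0 if b"\x90" * 4 in frag else 0.0
    print(locate_code(doc, stub))  # → [(64, 144)]
    # Toy CNN with made-up weights; untrained, so its output is not meaningful.
    cnn = lambda frag: cnn1d_score(frag, [[0.5, -0.2, 0.3]], [1.0], -0.5)
    print(locate_code(doc, cnn))
```

Because windows overlap, the reported ranges are coarser than the true code boundaries; a smaller stride gives finer localization at the cost of more classifier calls.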