Third-party libraries (TPLs) are frequently reused in software applications to reduce development cost. However, they can also introduce security risks. Many methods have been proposed to detect TPL reuse in Android bytecode or in source code. This paper focuses on detecting TPL reuse in binary code, which is a more challenging task. In a binary detection target, libraries may be compiled and linked into separate dynamic-link files or built into a fused binary that contains multiple libraries and project-specific code. This can leave fewer code features available and lower the effectiveness of feature engineering. In this paper, we propose a binary TPL reuse detection framework, LibDB, which can effectively and efficiently detect imported TPLs even in stripped and fused binaries. In addition to basic, coarse-grained features (string literals and exported function names), LibDB uses function contents as a new type of feature. It embeds every function in a binary file into a low-dimensional representation with a trained neural network, and it further adopts a function call graph-based comparison method to improve detection accuracy. LibDB also supports version identification of the TPLs contained in the detection target, which existing detection methods do not consider. To evaluate LibDB, we construct three datasets for binary-based TPL reuse detection. Our experimental results show that LibDB is more accurate and efficient than state-of-the-art tools on both the binary TPL detection task and the version identification task. The datasets and source code used in this work are anonymously available at https://github.com/DeepSoftwareAnalytics/LibDB.
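The two function-level steps named in the abstract (embedding-based matching followed by a call-graph comparison) can be illustrated with a minimal sketch. This is not LibDB's implementation: the neural network is omitted and the embeddings are made-up 2-D vectors, and the function names, the cosine-similarity measure, the 0.9 threshold, and the "at least one preserved call edge" rule are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_functions(target_vecs, lib_vecs, threshold=0.9):
    """Pair each target function with its most similar library function.

    A pair is kept only if the similarity reaches `threshold`
    (an assumed value, not one taken from the paper).
    """
    pairs = {}
    for t_name, t_vec in target_vecs.items():
        best_name, best_sim = None, threshold
        for l_name, l_vec in lib_vecs.items():
            sim = cosine(t_vec, l_vec)
            if sim >= best_sim:
                best_name, best_sim = l_name, sim
        if best_name is not None:
            pairs[t_name] = best_name
    return pairs


def filter_by_call_graph(pairs, target_calls, lib_calls):
    """Keep only matched pairs that share at least one call edge.

    A target edge caller -> callee counts as preserved when the matched
    library caller also calls the matched library callee.
    """
    kept = {}
    for t_name, l_name in pairs.items():
        lib_callees = lib_calls.get(l_name, set())
        for t_callee in target_calls.get(t_name, set()):
            if pairs.get(t_callee) in lib_callees:
                kept[t_name] = l_name
                kept[t_callee] = pairs[t_callee]
    return kept


# Hypothetical embeddings for a stripped target binary and one library.
target_vecs = {
    "sub_1": [1.0, 0.0],
    "sub_2": [0.0, 1.0],
    "sub_3": [0.7, 0.7],    # resembles nothing in the library
    "sub_4": [-1.0, 0.0],   # similar to lib_c, but isolated in the call graph
}
lib_vecs = {"lib_a": [0.99, 0.05], "lib_b": [0.02, 1.0], "lib_c": [-1.0, 0.0]}

# Hypothetical call graphs: caller -> set of callees.
target_calls = {"sub_1": {"sub_2"}}
lib_calls = {"lib_a": {"lib_b"}}

pairs = match_functions(target_vecs, lib_vecs)
confirmed = filter_by_call_graph(pairs, target_calls, lib_calls)
```

Here `sub_4` passes the embedding threshold but is discarded by the call-graph step because none of its call edges is mirrored in the library, which is the kind of false match a structural comparison is meant to remove.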