Matching binary code to source code, and vice versa, has applications in fields such as computer security, software engineering, and reverse engineering. Although methods exist that match source code with binary code to accelerate reverse engineering, most of them target a single programming language. In practice, however, programs are developed in different programming languages depending on their requirements, so cross-language binary-to-source code matching has recently gained attention. Nonetheless, existing approaches still struggle to make precise predictions because of the inherent difficulty of matching binary code and source code across programming languages. In this paper, we address the problem of cross-language binary-to-source code matching. We propose GraphBinMatch, an approach based on a graph neural network that learns the similarity between binary and source code. We evaluate GraphBinMatch on several tasks, including cross-language binary-to-source code matching and cross-language source-to-source matching, as well as on single-language binary-to-source code matching. Experimental results show that GraphBinMatch significantly outperforms the state of the art, with F1 score improvements as high as 15%.