Binary2source function matching is a fundamental task for many security applications, including Software Component Analysis (SCA). Existing binary2source matching works apply a "1-to-1" mechanism, in which one binary function is matched against one source function. However, we discovered that this mapping can be "1-to-n" (one query binary function maps to multiple source functions) because of function inlining. To support binary2source function matching under function inlining, we propose a method named O2NMatcher that generates Source Function Sets (SFSs) as the matching targets for binary functions with inlining. We first propose a model named ECOCCJ48 for inlined call site prediction. To train this model, we compile the compilable OSS projects to generate a dataset of labeled call sites (inlined or not), extract several features from the call sites, and design a compiler-opt-based multi-label classifier by inspecting the inlining correlations between different compilations. Then, we use this model to predict the labels of call sites in uncompilable OSS projects, without compiling them, and obtain the labeled function call graphs of these projects. Next, we treat the construction of SFSs as a sub-tree generation problem and design root node selection and edge extension rules to construct SFSs automatically. Finally, these SFSs are added to the corpus of source functions and compared with binary functions with inlining. We conduct several experiments to evaluate the effectiveness of O2NMatcher, and the results show that our method increases the performance of existing works by 6% and outperforms all state-of-the-art works.
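The sub-tree view of SFS construction can be illustrated with a minimal sketch. The abstract does not spell out the actual root node selection and edge extension rules, so the rules below are assumptions made only for illustration: every function with at least one call site predicted as inlined is taken as a root, and the set is extended along predicted-inlined edges until no new callee is reachable. The names `build_sfs` and `EXAMPLE_EDGES` are hypothetical and do not come from O2NMatcher.

```python
from collections import defaultdict


def build_sfs(call_edges):
    """Build Source Function Sets from a labeled call graph.

    call_edges: iterable of (caller, callee, inlined) tuples, where
    `inlined` is the predicted label of that call site.
    Returns a dict mapping each root function to the sorted list of
    source functions expected to be merged into its binary function.
    """
    # Keep only the call sites predicted as inlined.
    inlined_callees = defaultdict(list)
    for caller, callee, inlined in call_edges:
        if inlined:
            inlined_callees[caller].append(callee)

    sfs_by_root = {}
    # Assumed root selection rule: any function with an inlined call site.
    for root in inlined_callees:
        members = {root}
        stack = [root]
        # Assumed edge extension rule: follow inlined edges transitively.
        # The membership check also stops recursion cycles.
        while stack:
            fn = stack.pop()
            for callee in inlined_callees.get(fn, []):
                if callee not in members:
                    members.add(callee)
                    stack.append(callee)
        sfs_by_root[root] = sorted(members)
    return sfs_by_root


# Hypothetical labeled call graph: (caller, callee, predicted_inlined).
EXAMPLE_EDGES = [
    ("main", "parse", True),
    ("parse", "read_token", True),
    ("main", "log", False),
    ("log", "fmt", True),
]

if __name__ == "__main__":
    for root, members in build_sfs(EXAMPLE_EDGES).items():
        print(root, "->", members)
```

In this sketch, `main` calls `log` through a call site predicted as not inlined, so `log` and its inlined callee `fmt` form a separate SFS instead of being merged into the SFS rooted at `main`.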