Interpretation-enabled Software Reuse Detection Based on a Multi-Level Birthmark Model

Software reuse, especially partial reuse, poses legal and security threats to software development. Because the source code of the software in question is usually unavailable, reuse is hard to detect in an interpretable way. Moreover, current approaches suffer from poor detection accuracy and efficiency and fall far short of practical demands. To tackle these problems, we propose \textit{ISRD}, an interpretation-enabled software reuse detection approach based on a multi-level birthmark model that covers the function level, the basic block level, and the instruction level. To overcome the obfuscation caused by cross-compilation, we represent function semantics with the Minimum Branch Path (MBP) and normalize instructions to extract their core semantics. To detect reused functions efficiently, we design a process called "intent search based on anchor recognition". It uses a strict instruction match and an identical library call invocation check to find anchor functions (anchors for short), and then traverses the neighbors of the anchors to explore potentially matched function pairs. Extensive experiments on two real-world binary datasets show that \textit{ISRD} is interpretable, effective, and efficient, achieving $97.2\%$ precision and $94.8\%$ recall. It is also resilient to cross-compilation and outperforms state-of-the-art approaches.
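
The anchor-then-neighbors idea described above can be illustrated with a small sketch. This is not the ISRD implementation: the `Function` record, the operand classes in `normalize`, the requirement that an anchor has at least one library call, the `SequenceMatcher` similarity, and the `0.8` threshold are all assumptions made for illustration, and the MBP representation is not modeled at all.

```python
import re
from collections import deque
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Function:
    """Hypothetical record for one binary function (not from the paper)."""
    name: str
    instructions: list                           # raw assembly strings
    lib_calls: frozenset = frozenset()           # library functions invoked
    neighbors: set = field(default_factory=set)  # call-graph neighbors, by name


def normalize(instr):
    """Keep the mnemonic and abstract each operand to a coarse class.

    Assumed scheme: MEM for memory operands, IMM for immediates, and REG
    for registers (and, in this toy, any other symbol).
    """
    mnemonic, _, operands = instr.strip().partition(" ")
    classes = []
    for op in filter(None, (o.strip() for o in operands.split(","))):
        if "[" in op:
            classes.append("MEM")
        elif re.fullmatch(r"-?(0x[0-9a-fA-F]+|\d+)", op):
            classes.append("IMM")
        else:
            classes.append("REG")
    return (mnemonic.lower(), *classes)


def signature(func):
    return tuple(normalize(i) for i in func.instructions)


def find_anchors(target, source):
    """Strict match: identical normalized instructions and identical,
    non-empty library call sets (the non-empty rule is an assumption)."""
    return [
        (t.name, s.name)
        for t in target.values()
        for s in source.values()
        if t.lib_calls and t.lib_calls == s.lib_calls
        and signature(t) == signature(s)
    ]


def expand(target, source, anchors, threshold=0.8):
    """Breadth-first walk from the anchors, pairing unmatched neighbors
    whose normalized instruction sequences are similar enough."""
    matched = dict(anchors)                      # target name -> source name
    used = set(matched.values())
    queue = deque(anchors)
    while queue:
        t_name, s_name = queue.popleft()
        for tn in sorted(target[t_name].neighbors):
            if tn in matched:
                continue
            best, best_score = None, threshold
            for sn in sorted(source[s_name].neighbors):
                if sn in used:
                    continue
                score = SequenceMatcher(
                    None, signature(target[tn]), signature(source[sn])
                ).ratio()
                if score >= best_score:
                    best, best_score = sn, score
            if best is not None:
                matched[tn] = best
                used.add(best)
                queue.append((tn, best))
    return matched


# Toy data: two "binaries" that differ only in registers and constants.
target = {
    "t_main": Function("t_main", ["push ebp", "mov ebp, esp", "mov eax, 0x10", "ret"],
                       frozenset({"printf"}), {"t_helper"}),
    "t_helper": Function("t_helper", ["mov eax, [ebp+8]", "add eax, 1", "ret"],
                         neighbors={"t_main"}),
}
source = {
    "s_main": Function("s_main", ["push ebp", "mov ebp, esp", "mov eax, 0x20", "ret"],
                       frozenset({"printf"}), {"s_helper", "s_other"}),
    "s_helper": Function("s_helper", ["mov ecx, [ebp+8]", "add ecx, 2", "ret"],
                         neighbors={"s_main"}),
    "s_other": Function("s_other", ["xor eax, eax", "ret"], neighbors={"s_main"}),
}

anchors = find_anchors(target, source)
matched = expand(target, source, anchors)
print(anchors)
print(matched)
```

In this toy setup only the function that calls a library routine qualifies as an anchor, and the helper is paired afterwards through the call-graph neighborhood. Restricting the comparison to neighbors of already matched pairs is what keeps the search from comparing every function against every other function.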
