We consider the problem of program clone search: given a target program and a repository of known programs (all in executable format), the goal is to find the program in the repository most similar to the target. Potential applications include reverse engineering, program clustering, malware lineage, and software theft detection. Recent years have seen a surge in code similarity techniques, yet most of them focus on function-level similarity, whereas we are interested in program-level similarity. Consequently, these recent approaches are not directly suited to program clone search, being either too slow to handle large code bases, not precise enough, or not robust against slight variations introduced by compilation or source code versions. We introduce Programs Spectral Similarity (PSS), the first spectral analysis dedicated to program-level similarity. PSS reaches a sweet spot in terms of precision, speed, and robustness. In particular, its one-time spectral feature extraction is tailored for large repositories of programs, making it a perfect fit for program clone search.
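The search setting described in the abstract (extract features once per repository program, then compare a target against the stored features) can be illustrated with a minimal sketch. This is not the PSS method: the abstract does not specify its spectral features or its similarity measure. The function names `build_index` and `clone_search`, the `extract` callback, and the use of cosine similarity are all assumptions made for illustration.

```python
import math
from typing import Callable, Dict, Sequence, Tuple, TypeVar

Program = TypeVar("Program")


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(
    repository: Dict[str, Program],
    extract: Callable[[Program], Sequence[float]],
) -> Dict[str, Sequence[float]]:
    """Run the (hypothetical) feature extractor once per repository program.

    The resulting index is reused for every later query, so the extraction
    cost is paid a single time per program.
    """
    return {name: extract(program) for name, program in repository.items()}


def clone_search(
    target_features: Sequence[float],
    index: Dict[str, Sequence[float]],
) -> Tuple[str, float]:
    """Return (name, similarity) of the indexed program closest to the target."""
    if not index:
        raise ValueError("the index is empty")
    scored = (
        (name, cosine_similarity(target_features, features))
        for name, features in index.items()
    )
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy data: each "program" is already a feature vector, so the
    # extractor is just a copy. A real extractor would analyze a binary.
    repo = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    index = build_index(repo, extract=list)
    print(clone_search([0.9, 0.1], index))
```

Keeping extraction separate from comparison means each query costs one extraction for the target plus one cheap vector comparison per repository entry, which is the property the abstract highlights for large repositories.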