Independently developed codebases typically contain many segments of code that perform the same or closely related operations (semantic clones). Finding functionally equivalent segments enables applications such as replacing a segment with a more efficient or more secure alternative. Such related segments often have different interfaces, so some glue code (an adapter) is needed to replace one with the other. We present an algorithm that searches for replaceable code segments at the function level by attempting to synthesize an adapter between them from some family of adapters; the search terminates if it finds no possible adapter. We implement our technique using (1) concrete adapter enumeration based on Intel's Pin framework and (2) binary symbolic execution, and we explore the relation between the size of the adapter search space and total search time. We present examples of applying adapter synthesis to improve the security and efficiency of binary functions, to deobfuscate binary functions, and to switch between binary implementations of RC4. We present two large-scale evaluations: (1) we run adapter synthesis on more than 13,000 function pairs from the Linux C library, and (2) using more than 61,000 fragments of binary code extracted from an ARM image built for the iPod Nano 2g device and known functions from the VLC media player, we evaluate our adapter synthesis implementation on more than a million synthesis tasks. Our results confirm that several instances of adaptably equivalent binary functions exist in real-world code, and suggest that adapter synthesis can be applied for reverse engineering and for constructing cleaner, less buggy, more efficient programs.
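To make the idea of searching a family of adapters concrete, the following is a minimal sketch in Python of concrete adapter enumeration at the source level. It is an illustration under stated assumptions, not the implementation described above, which operates on binary code using Pin and symbolic execution. The function `find_adapter`, the adapter family (each inner argument is either a target argument or a small constant), and the example functions `target` and `inner` are all hypothetical.

```python
from itertools import product

def find_adapter(target, inner, target_arity, inner_arity,
                 constants=(0, 1, 2), test_values=range(-3, 4)):
    """Enumerate argument adapters until inner(adapt(args)) matches target(args).

    Hypothetical adapter family: every inner argument is either one of the
    target's arguments (selected by index) or a constant from `constants`.
    Returns the first matching mapping, or None if the family is exhausted.
    """
    # All possible sources for a single inner argument.
    choices = [("arg", i) for i in range(target_arity)]
    choices += [("const", c) for c in constants]

    # A fixed grid of concrete inputs used to compare the two functions.
    tests = list(product(test_values, repeat=target_arity))

    for mapping in product(choices, repeat=inner_arity):
        def adapt(args, mapping=mapping):
            # Build the inner function's argument tuple from the target's.
            return tuple(args[v] if kind == "arg" else v for kind, v in mapping)

        if all(inner(*adapt(t)) == target(*t) for t in tests):
            return mapping
    return None  # search space exhausted: no adapter in this family

# Hypothetical pair of functions with different interfaces.
def target(a, b):
    return 2 * a + b

def inner(p, q, r):
    return p * r + q

mapping = find_adapter(target, inner, target_arity=2, inner_arity=3)
print(mapping)  # (('arg', 0), ('arg', 1), ('const', 2))
```

Here the adapter found passes `a` as `p`, `b` as `q`, and the constant 2 as `r`. Agreement on a finite test grid only suggests equivalence; it does not prove it, which is one reason a symbolic-execution-based check is attractive. The number of candidates is `len(choices) ** inner_arity`, so the search space grows quickly with the size of the adapter family.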