Unleashing the Hidden Power of Compiler Optimization on Binary Code Difference: An Empirical Study

Since compiler optimization is the most common source of syntactic differences in binary code, testing resilience against the changes caused by different compiler optimization settings has become a standard evaluation step for most binary diffing approaches. For example, 47 top-venue papers from the last 12 years compared program versions compiled with the default optimization levels (e.g., -Ox in GCC and LLVM). Although many of them claim to be immune to compiler transformations, their resistance to non-default optimization settings remains unclear. In particular, we have observed that adversaries have explored non-default compiler settings to amplify malware differences. This paper takes the first step toward systematically studying the effectiveness of compiler optimization on binary code differences. We tailor search-based iterative compilation to the auto-tuning of binary code differences, and we develop BinTuner to search for near-optimal optimization sequences that maximize the amount of binary code difference. We run BinTuner with GCC 10.2 and LLVM 11.0 on SPEC benchmarks (CPU2006 and CPU2017), Coreutils, and OpenSSL. Our experiments show that, at the cost of 279 to 1,881 compilation iterations, BinTuner can find custom optimization sequences that are substantially better than the general -Ox settings. BinTuner's outputs seriously undermine the comparisons made by prominent binary diffing tools. In addition, the detection rate of IoT malware variants tuned by BinTuner falls by more than 50%. Our findings are a cautionary tale for security analysts: attackers have a new, cost-effective way to mutate malware code, and the research community needs to step back and reassess its optimization-resistance evaluations.
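
The abstract describes the core loop only at a high level: compile repeatedly with candidate optimization flags and keep the settings that make the output binary differ most from a reference build. The sketch below illustrates that idea of search-based iterative compilation under stated assumptions; it is not BinTuner's implementation. The search is a plain hill climber that toggles one flag at a time (an assumed, deliberately simple strategy), the difference metric is normalized compression distance (NCD) computed with zlib (an assumed stand-in for whatever metric the paper uses), `gcc_compile` is a hypothetical helper, and `fake_compile` is a toy that replaces a real compiler so the example runs without GCC.

```python
import os
import random
import subprocess
import tempfile
import zlib
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

Flags = FrozenSet[str]


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, with zlib standing in for the compressor C.

    Close to 0 for near-identical inputs, approaching 1 for unrelated ones.
    """
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)


def gcc_compile(source: str, flags: Flags, base_level: str = "-O2") -> bytes:
    """Hypothetical compile step: build `source` into an object file and return its bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "out.o")
        cmd = ["gcc", base_level, *sorted(flags), "-c", source, "-o", out]
        subprocess.run(cmd, check=True, capture_output=True)
        with open(out, "rb") as fh:
            return fh.read()


def tune(
    flags: Sequence[str],
    compile_fn: Callable[[Flags], bytes],
    reference: bytes,
    iterations: int,
    seed: int = 0,
) -> Tuple[Flags, float]:
    """Hill-climb over flag sets, keeping a toggle only if it increases NCD to `reference`."""
    rng = random.Random(seed)
    cache: Dict[Flags, float] = {}

    def score(candidate: Flags) -> float:
        # Each previously unseen candidate costs one compilation; memoize the rest.
        if candidate not in cache:
            cache[candidate] = ncd(reference, compile_fn(candidate))
        return cache[candidate]

    best: Flags = frozenset()
    best_score = score(best)
    for _ in range(iterations):
        # Toggle a single randomly chosen flag on or off.
        candidate = best ^ {rng.choice(flags)}
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


# --- Toy usage: a fake "compiler" so the sketch runs without GCC. ---
FLAGS = ["-funroll-loops", "-fno-inline-functions", "-fomit-frame-pointer", "-ftree-vectorize"]


def fake_compile(enabled: Flags) -> bytes:
    # Each enabled flag appends a deterministic, incompressible 64-byte chunk.
    body = bytearray(b"push rbp; mov rbp, rsp; " * 20)
    for flag in sorted(enabled):
        chunk_rng = random.Random(flag)
        body += bytes(chunk_rng.getrandbits(8) for _ in range(64))
    return bytes(body)


reference = fake_compile(frozenset())
best, best_score = tune(FLAGS, fake_compile, reference, iterations=50)
print(sorted(best), round(best_score, 3))
```

Passing something like `lambda f: gcc_compile("prog.c", f)` as `compile_fn` would make every uncached candidate cost one real compilation, which is roughly the unit in which the abstract reports its search cost. A real tuner would also need to handle flags that conflict or are ignored at a given base level, which this sketch does not attempt.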
