Binary code similarity analysis (BCSA) is widely used for diverse security applications, including plagiarism detection, software license violation detection, and vulnerability discovery. Despite surging research interest in BCSA, it remains significantly challenging to conduct new research in this field for several reasons. First, most existing approaches focus only on the end result, namely, increasing the success rate of BCSA, by adopting uninterpretable machine learning. Moreover, they rely on their own benchmarks, sharing neither the source code nor the entire dataset. Finally, researchers often use different terminologies or even use the same technique without properly citing the previous literature, which makes it difficult to reproduce or extend previous work. To address these problems, we take a step back from the mainstream and contemplate fundamental research questions for BCSA: why does a certain technique or a certain feature show better results than others? Specifically, we conduct the first systematic study on the basic features used in BCSA by leveraging interpretable feature engineering on a large-scale benchmark. Our study reveals various useful insights on BCSA. For example, we show that a simple interpretable model with a few basic features can achieve results comparable to those of recent deep learning-based approaches. Furthermore, we show that the way binaries are compiled and the correctness of the underlying binary analysis tools can significantly affect the performance of BCSA. Lastly, we make all of our source code and benchmark publicly available and suggest future directions to help further research in this field.