Detecting similar code fragments, usually referred to as code clones, is an important task. In particular, code clone detection has significant uses in vulnerability discovery, refactoring, and plagiarism detection. However, false positives are inevitable and always require manual review. In this paper, we propose Twin-Finder+, a novel closed-loop approach for pointer-related code clone detection that integrates machine learning and symbolic execution techniques to improve precision. Twin-Finder+ introduces a formal verification mechanism to automate this manual review process. Our experimental results show that Twin-Finder+ can remove 91.69% of false positives on average. We further conduct a security analysis for memory safety on two real-world applications, Links version 2.14 and LibreOffice 6.0.0.1. Twin-Finder+ finds 6 unreported bugs in Links version 2.14 and one publicly patched bug in LibreOffice 6.0.0.1.