Representing LLVM-IR in a Code Property Graph

In recent years, a number of static application security testing tools have been proposed that make use of so-called code property graphs (CPGs), a graph model that retains rich information about the source code while enabling its users to write language-agnostic analyses. However, these tools suffer from several shortcomings. They work mostly on source code and therefore exclude third-party dependencies from the analysis if those are only available as compiled binaries. Furthermore, their analysis is limited to the programming languages they support. While support for well-established languages such as C/C++ or Java is often included, languages that are still heavily evolving, such as Rust, are not considered because of the constant changes in the language design. To overcome these limitations, we extend an open-source implementation of a code property graph to support LLVM-IR, which many compilers and binary lifters can produce as output. In this paper, we discuss how we address the challenges that arise when mapping concepts of an intermediate representation to a CPG. At the same time, we optimize the resulting graph to be minimal and close to the representation of equivalent source code. Our evaluation indicates that existing analyses can be reused without modification and that the performance requirements are comparable to those of operating on source code. This makes the approach suitable for the analysis of large-scale projects.
