The abundance of publicly available source code repositories, in conjunction with advances in neural networks, has enabled data-driven approaches to program analysis. These approaches, called neural program analyzers, use neural networks to extract patterns from programs for tasks ranging from development productivity to program reasoning. Despite the growing popularity of neural program analyzers, the extent to which their results generalize is unknown. In this paper, we perform a large-scale evaluation of the generalizability of two popular neural program analyzers using seven semantically equivalent transformations of programs. Our results caution that in many cases the neural program analyzers fail to generalize well, sometimes even to programs with negligible textual differences. These results provide initial stepping stones for quantifying robustness in neural program analyzers.
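To make the idea of a semantically equivalent transformation concrete, the sketch below renames one variable in a small program and checks whether a predictor's output changes. It is only an illustration under stated assumptions. It uses Python's standard `ast` module. The variable-renaming transformation is one example of the general idea and is not presented as one of the paper's seven transformations. The names `RenameVariable`, `rename_variable`, `prediction_unchanged`, and `toy_predict` are hypothetical. `toy_predict` is a toy stand-in for a neural program analyzer, not either of the models evaluated in the paper.

```python
import ast


class RenameVariable(ast.NodeTransformer):
    """Rename every occurrence of one identifier (hypothetical helper).

    Assumes the new name does not clash with any existing name, so the
    rewritten program keeps the same behavior.
    """

    def __init__(self, old: str, new: str):
        self.old, self.new = old, new

    def visit_Name(self, node: ast.Name) -> ast.Name:
        # Covers both reads and writes (including augmented assignment targets).
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        # Covers function parameters, which are ast.arg nodes rather than ast.Name.
        if node.arg == self.old:
            node.arg = self.new
        return node


def rename_variable(source: str, old: str, new: str) -> str:
    """Return the source with `old` renamed to `new` (requires Python 3.9+ for ast.unparse)."""
    tree = RenameVariable(old, new).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


def prediction_unchanged(predict, source: str, transformed: str) -> bool:
    """True if the predictor gives the same output for both program variants."""
    return predict(source) == predict(transformed)


def toy_predict(source: str) -> str:
    # Toy stand-in for a method-name predictor (an assumption, not a real model).
    # It keys on a surface token, mimicking the brittleness such evaluations probe.
    return "countItems" if "total" in source else "process"


original = (
    "def count_items(items):\n"
    "    total = 0\n"
    "    for x in items:\n"
    "        total += 1\n"
    "    return total\n"
)
renamed = rename_variable(original, "total", "acc")

if __name__ == "__main__":
    print(renamed)
    print("prediction unchanged:", prediction_unchanged(toy_predict, original, renamed))
```

Both variants compute the same result, because only an identifier changed. The toy predictor's output still flips, which is the kind of generalization failure the evaluation is designed to detect. In a real study, `toy_predict` would be replaced by a trained model, and the check would be aggregated over many programs and transformations.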