Neural code intelligence (CI) models are opaque black boxes and offer little insight into the features they use to make predictions. This opacity can lead to distrust in their predictions and hamper their wider adoption in safety-critical applications. Recently, input program reduction techniques have been proposed to identify the key features in input programs and thereby improve the transparency of CI models. However, this approach is syntax-unaware and does not consider the grammar of the programming language. In this paper, we apply a syntax-guided program reduction technique that considers the grammar of the input programs during reduction. Our experiments on multiple models across different types of input programs show that the syntax-guided technique is faster and yields smaller sets of key tokens in the reduced programs. We also show that these key tokens can be used to generate adversarial examples for up to 65% of the input programs.
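To make the contrast with syntax-unaware reduction concrete, the following is a minimal Python sketch of the general idea of syntax-guided program reduction: delete whole grammar-valid subtrees (here, top-level statements of a Python AST) and keep each deletion only if an `oracle` check still passes. The `oracle` is a hypothetical stand-in for "the CI model's prediction on the reduced program is unchanged"; the paper's actual reducer and models are not reproduced here. Because every candidate is built from intact AST subtrees, each intermediate program is syntactically valid, unlike character- or token-level reduction.

```python
import ast

def reduce_program(source: str, oracle) -> str:
    """Greedy syntax-guided reduction sketch (top-level statements only).

    `oracle(src) -> bool` is an assumed callback that returns True when
    the reduced program still preserves the property of interest, e.g.
    the model still makes the same prediction on it.
    """
    tree = ast.parse(source)
    i = 0
    while i < len(tree.body):
        # Candidate: the same module with statement i deleted. Deleting a
        # whole AST node can never produce a syntax error.
        candidate = ast.Module(body=tree.body[:i] + tree.body[i + 1:],
                               type_ignores=[])
        if oracle(ast.unparse(candidate)):
            tree = candidate   # deletion preserved the property: keep it
        else:
            i += 1             # statement i is "key": move on
    return ast.unparse(tree)

# Toy usage: pretend the model's prediction hinges on the tokens "x + y".
src = "import os\nhelper = 42\nanswer = x + y\n"
reduced = reduce_program(src, lambda s: "x + y" in s)
# The two irrelevant statements are removed; only the key one remains.
```

A real syntax-guided reducer would recurse into nested subtrees and use a priority order over node sizes, but the one-level sketch above already shows why grammar awareness both avoids wasted oracle queries on invalid programs and isolates the key tokens.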