A wide range of code intelligence (CI) tools, powered by deep neural networks, have recently been developed to improve programming productivity and perform program analysis. To use such tools reliably, developers often need to reason about the behavior of the underlying models and the factors that affect it. This is especially challenging for tools backed by deep neural networks. Various methods have tried to reduce this opacity in the vein of "transparent/interpretable AI". However, these approaches are often specific to a particular set of network architectures, and some even require access to the network's parameters. This makes them difficult for the average programmer to use, which hinders the reliable adoption of neural CI systems. In this paper, we propose a simple, model-agnostic approach to identifying critical input features for models in CI systems, drawing on software debugging research, specifically delta debugging. Our approach, SIVAND, uses simplification techniques that reduce the size of a CI model's input programs while preserving the model's predictions. We show that this approach yields remarkably small outputs and is broadly applicable across many model architectures and problem domains. We find that the models in our experiments often rely heavily on just a few syntactic features of the input programs. We believe that the features SIVAND extracts may help in understanding the predictions and learned behavior of neural CI systems.
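To make the delta-debugging idea behind this approach concrete, the following is a minimal sketch of a ddmin-style reduction loop in Python. It is not SIVAND's actual implementation: the `keeps_prediction` predicate, the toy stand-in "model" in the demo, and all names here are illustrative assumptions. In practice, the predicate would run the trained CI model on the reduced program and check that its prediction matches the prediction on the original program.

```python
from typing import Callable, List

def ddmin(tokens: List[str],
          keeps_prediction: Callable[[List[str]], bool]) -> List[str]:
    """Shrink `tokens` while `keeps_prediction` stays True
    (ddmin-style reduction, complement-removal only)."""
    assert keeps_prediction(tokens), "the full input must satisfy the predicate"
    n = 2  # granularity: how many chunks the current input is split into
    while len(tokens) >= 2:
        chunk = max(1, len(tokens) // n)
        subsets = [tokens[i:i + chunk] for i in range(0, len(tokens), chunk)]
        reduced = False
        for skip in range(len(subsets)):
            # Candidate = everything except one chunk.
            candidate = [t for j, s in enumerate(subsets) if j != skip for t in s]
            if candidate and keeps_prediction(candidate):
                tokens = candidate        # keep the smaller input
                n = max(n - 1, 2)         # coarsen granularity again
                reduced = True
                break
        if not reduced:
            if n >= len(tokens):
                break                     # already at single-token chunks
            n = min(n * 2, len(tokens))   # refine granularity and retry
    return tokens

if __name__ == "__main__":
    # Toy stand-in for a black-box model: the "prediction" is preserved as
    # long as the token the toy model latches onto ("len") survives reduction.
    toy_predicate = lambda ts: "len" in ts
    program = "def f ( xs ) : return len ( xs ) + 1".split()
    print(ddmin(program, toy_predicate))  # -> ['len']
```

In this sketch, the reduced output exposes exactly which input feature the (toy) model relies on; with a real neural CI model behind the predicate, the surviving tokens would analogously reveal the few syntactic features driving its prediction.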