In recent years, there has been wide interest in designing deep neural network-based models that automate downstream software engineering tasks, such as program document generation, code search, and program repair. Although the main objective of these studies is to improve the effectiveness of the downstream task, many studies simply employ the next best neural network model, without an in-depth analysis of why a particular solution works or fails on particular tasks or scenarios. In this paper, using an eXplainable AI (XAI) method (the attention mechanism), we study state-of-the-art Transformer-based models (CodeBERT and GraphCodeBERT) on a set of software engineering downstream tasks: code document generation (CDG), code refinement (CR), and code translation (CT). We first evaluate the validity of the attention mechanism on each particular task. Then, through quantitative and qualitative studies, we identify what CodeBERT and GraphCodeBERT learn on these tasks, i.e., which source code token types they put the highest attention on. Finally, we show some of the common patterns when the models do not work as expected (perform poorly even though the problem at hand is easy) and suggest recommendations that may alleviate the observed challenges.
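To make the XAI setup concrete, the following is a minimal sketch of how attention can be aggregated over source-code token types. All names here (`attention_by_token_type`, the toy token-type labels) are illustrative assumptions, not part of CodeBERT or GraphCodeBERT; a real analysis would use the attention tensors returned by the actual models.

```python
import numpy as np

def attention_by_token_type(attn, token_types):
    """Aggregate the attention each token receives, grouped by token type.

    attn: array of shape (heads, seq_len, seq_len), rows are softmax-normalized.
    token_types: one type label (e.g. "identifier") per token position.
    """
    # Average over heads, then sum the attention flowing into each column,
    # i.e., how much all queries attend to that token.
    received = attn.mean(axis=0).sum(axis=0)  # shape: (seq_len,)
    totals = {}
    for t, score in zip(token_types, received):
        totals[t] = totals.get(t, 0.0) + float(score)
    # Normalize so the scores over token types sum to 1.
    norm = sum(totals.values())
    return {t: s / norm for t, s in totals.items()}

# Toy example: 2 heads, 4 tokens with hypothetical token-type labels.
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 4, 4))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)  # row softmax
scores = attention_by_token_type(
    attn, ["identifier", "keyword", "identifier", "operator"]
)
print(max(scores, key=scores.get))  # token type receiving the most attention
```

In practice, the per-layer attention tensors would come from the pre-trained model (e.g., via Hugging Face Transformers with `output_attentions=True`), and the token-type labels from a source-code tokenizer or parser.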