Over the past few years, deep neural networks (DNNs) have seen steadily expanding real-world use in source code processing tasks across software engineering, e.g., clone detection, code search, and comment generation. Although quite a few recent works have studied the testing of DNNs for image and speech processing, limited progress has been made so far on DNN testing for source code processing, which exhibits rather unique characteristics and challenges. In this paper, we propose a search-based testing framework for DNNs of source code embedding and its downstream processing tasks, such as code search. To generate new test inputs, we adopt popular source code refactoring tools to generate semantically equivalent variants. For more effective testing, we leverage DNN mutation testing to guide the testing direction. To demonstrate the usefulness of our technique, we perform a large-scale evaluation on popular DNNs for source code processing based on multiple state-of-the-art code embedding methods (i.e., Code2vec, Code2seq, and CodeBERT). The testing results show that our generated adversarial samples reduce the performance of these DNNs by 5.41% to 9.58% on average. Retraining the DNNs with our generated adversarial samples improves their robustness by 23.05% on average. The evaluation results also show that, compared to the other methods, our adversarial test generation strategy has the least negative impact (median of 3.56%) on the performance of the DNNs on regular test data.
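To make the notion of a semantically equivalent variant concrete, the following is a minimal sketch of one refactoring operator, consistent variable renaming, written in Python with the standard `ast` module. It is an illustration only and not the refactoring tooling used in this work; the names `RenameVariables`, `rename_variant`, `run_total`, and the sample function `total` are all hypothetical. The sketch assumes Python 3.9 or later (for `ast.unparse`) and that the chosen new names do not collide with existing identifiers.

```python
import ast


class RenameVariables(ast.NodeTransformer):
    """Rename identifiers according to a fixed mapping (hypothetical operator)."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Covers both reads and writes of a variable.
        if node.id in self.mapping:
            node.id = self.mapping[node.id]
        return node

    def visit_arg(self, node):
        # Covers function parameters.
        if node.arg in self.mapping:
            node.arg = self.mapping[node.arg]
        return node


def rename_variant(source, mapping):
    """Return source text with identifiers renamed; behavior should be unchanged."""
    tree = RenameVariables(mapping).visit(ast.parse(source))
    return ast.unparse(tree)


ORIGINAL = """
def total(values):
    acc = 0
    for v in values:
        acc += v
    return acc
"""

# Assumed mapping: the new names must not clash with names already in scope.
VARIANT = rename_variant(ORIGINAL, {"values": "items", "acc": "result", "v": "x"})


def run_total(source, argument):
    """Execute a snippet and call its `total` function, to compare behavior."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"](argument)
```

A variant produced this way differs textually from the original but computes the same result, so any change in a model's prediction between the two points to a robustness problem rather than a real semantic difference. Note that this simple operator would break callers that pass the renamed parameters by keyword; a production refactoring tool handles such cases.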