Deep learning models are widely used for solving challenging code processing tasks, such as code generation or code summarization. Traditionally, a specific model architecture was carefully designed to solve each particular code processing task. Recently, however, general pretrained models such as CodeBERT or CodeT5 have been shown to outperform task-specific models in many applications. While pretrained models are known to learn complex patterns from data, they may fail to capture some properties of source code. To test diverse aspects of code understanding, we introduce a set of diagnostic probing tasks. We show that pretrained models of code indeed contain information about code syntactic structure and correctness, the notions of identifiers, data flow and namespaces, and natural language naming. We also investigate how probing results are affected by using code-specific pretraining objectives, varying the model size, or finetuning.
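For readers unfamiliar with probing, the sketch below illustrates one common setup: freeze a pretrained code model, extract its representations, and train a lightweight classifier on a diagnostic label. This is a minimal illustration only, assuming the publicly available microsoft/codebert-base checkpoint, mean-pooled embeddings, and a toy syntactic-correctness task; it is not necessarily the exact probing tasks or classifiers used in this work.

```python
# Minimal linear-probing sketch (illustrative, not the paper's exact setup):
# a frozen pretrained encoder provides embeddings, and only a simple
# classifier on top is trained to predict a diagnostic property of the code.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")
model.eval()  # the encoder stays frozen; probing trains only the classifier

def embed(snippets):
    """Mean-pooled last-layer hidden states for a batch of code snippets."""
    batch = tokenizer(snippets, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (B, T, 1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()  # (B, H)

# Toy diagnostic task: syntactic correctness (labels are illustrative).
train_code = ["def f(x): return x + 1", "def f(x) return x + 1"]
train_labels = [1, 0]  # 1 = parses, 0 = syntax error

probe = LogisticRegression(max_iter=1000).fit(embed(train_code), train_labels)
print(probe.predict(embed(["while True: pass", "while True pass"])))
```

The probe's accuracy on held-out examples is then read as evidence of how much of the probed property is linearly recoverable from the frozen representations.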