Learning program representations is a core prerequisite for code intelligence tasks such as code search and code clone detection. State-of-the-art pre-trained models such as CodeBERT require large-scale code corpora, yet gathering training samples can be costly or even infeasible for domain-specific languages such as Solidity, the language of smart contracts. In this paper, we propose Zecoler, a zero-shot learning approach for code representations. Zecoler is built on top of a pre-trained programming language model. To elicit knowledge from the pre-trained model efficiently, Zecoler casts downstream tasks into the same form as the pre-training tasks by inserting trainable prompts into the original input. It then employs prompt learning, which optimizes the pre-trained model by adjusting only this input. This enables the representation model to fit the scarce task-oriented data efficiently while reusing pre-trained knowledge. We evaluate Zecoler on three code intelligence tasks in two programming languages that have no training samples, namely Solidity and Go, with the model trained on corpora of common languages such as Java. Experimental results show that our approach significantly outperforms baseline models in both zero-shot and few-shot settings.
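To make the prompt-learning idea concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: the encoder interface (a HuggingFace-like model accepting `inputs_embeds`), the prompt length, and the classification head are all assumptions. It illustrates how trainable prompt vectors can be prepended to the input of a frozen pre-trained model so that only the prompts and a small head are optimized.

```python
import torch
import torch.nn as nn


class PromptTunedClassifier(nn.Module):
    """Sketch of prompt learning: trainable prompt vectors are inserted
    before the input embeddings of a frozen pre-trained encoder, and only
    the prompts (plus a small head) are updated during training."""

    def __init__(self, pretrained_encoder, hidden_size, prompt_length=10, num_labels=2):
        super().__init__()
        self.encoder = pretrained_encoder            # assumed: a pre-trained code language model
        for p in self.encoder.parameters():          # freeze the pre-trained weights
            p.requires_grad = False
        # trainable continuous prompts inserted into the original input
        self.prompts = nn.Parameter(torch.randn(prompt_length, hidden_size) * 0.02)
        self.head = nn.Linear(hidden_size, num_labels)

    def forward(self, input_embeds):                 # input_embeds: (batch, seq_len, hidden)
        batch = input_embeds.size(0)
        prompt = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        x = torch.cat([prompt, input_embeds], dim=1)  # prompts precede the code tokens
        h = self.encoder(inputs_embeds=x).last_hidden_state  # assumed encoder interface
        return self.head(h[:, 0])                    # classify from the first position
```

During training, only `self.prompts` and `self.head` receive gradients, so the scarce task-oriented data adjusts the input rather than the pre-trained weights, which is the mechanism the abstract describes.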