Currently, a growing number of mature natural language processing applications are making people's lives more convenient. Such applications are built from source code, the language of software engineering. However, applications that understand source code and ease the software engineering process remain under-researched. At the same time, the transformer model, especially in combination with transfer learning, has proven to be a powerful technique for natural language processing tasks. These breakthroughs point to a promising direction for processing source code and tackling software engineering tasks. This paper describes CodeTrans, an encoder-decoder transformer model for the software engineering domain, and explores the effectiveness of encoder-decoder transformer models on six software engineering tasks comprising thirteen sub-tasks. Moreover, we investigate the effect of different training strategies, including single-task learning, transfer learning, multi-task learning, and multi-task learning with fine-tuning. CodeTrans outperforms the state-of-the-art models on all tasks. To expedite future work in the software engineering domain, we have published our pre-trained CodeTrans models: https://github.com/agemagician/CodeTrans