Code completion is a valuable research topic in both academia and industry. Recently, large-scale mono-programming-lingual (MonoPL) pre-trained models have been proposed to boost the performance of code completion. However, code completion on low-resource programming languages (PLs) remains difficult for the data-driven paradigm, even though plenty of developers use low-resource PLs. Meanwhile, few studies have explored the effects of multi-programming-lingual (MultiPL) pre-training on code completion, especially its impact on low-resource PLs. To this end, we propose MultiCoder to enhance low-resource code completion via MultiPL pre-training and MultiPL Mixture-of-Experts (MoE) layers. We further propose a novel PL-level MoE routing strategy (PL-MoE) to improve code completion on all PLs. Experimental results on CodeXGLUE and MultiCC demonstrate that 1) the proposed MultiCoder significantly outperforms the MonoPL baselines on low-resource PLs, and 2) the PL-MoE module further boosts the performance on six PLs. In addition, we analyze the effects of the proposed method in detail and explore its effectiveness in a variety of scenarios.
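To make the PL-level routing idea more concrete, below is a minimal PyTorch sketch (not the paper's implementation) of a PL-MoE-style layer, assuming each programming language is mapped to a dedicated feed-forward expert whose output is combined with a shared expert; the class name, parameter names, and the equal-weight combination are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PLMoELayer(nn.Module):
    """Sketch of PL-level Mixture-of-Experts routing.

    Unlike token-level MoE gating, every token in a sample is routed to the
    expert assigned to the sample's programming language, and the result is
    mixed with a shared expert. All names here are illustrative.
    """

    def __init__(self, d_model: int, d_ff: int, num_pls: int):
        super().__init__()
        # One expert FFN per programming language, plus one shared expert.
        self.pl_experts = nn.ModuleList(
            [self._ffn(d_model, d_ff) for _ in range(num_pls)]
        )
        self.shared_expert = self._ffn(d_model, d_ff)

    @staticmethod
    def _ffn(d_model: int, d_ff: int) -> nn.Sequential:
        return nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, hidden: torch.Tensor, pl_ids: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model); pl_ids: (batch,) language id per sample.
        out = torch.zeros_like(hidden)
        for pl in pl_ids.unique():
            mask = pl_ids == pl
            out[mask] = self.pl_experts[int(pl)](hidden[mask])
        # Equal-weight mix of PL-specific and shared expert outputs (an assumption).
        return 0.5 * (out + self.shared_expert(hidden))


# Example: a batch containing one Python (id 0) and one Go (id 1) sample.
layer = PLMoELayer(d_model=64, d_ff=256, num_pls=6)
x = torch.randn(2, 16, 64)
y = layer(x, torch.tensor([0, 1]))
print(y.shape)  # torch.Size([2, 16, 64])
```

The key design point illustrated here is that routing is decided once per sample by its programming language rather than learned per token, which keeps expert assignment deterministic for each PL while the shared expert carries cross-lingual knowledge.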