Interest in software migration is growing as software and society evolve. Manually migrating projects between languages is error-prone and expensive. In recent years, researchers have begun to explore automatic program translation using supervised deep learning techniques that learn from large-scale parallel code corpora. However, parallel resources are scarce in the programming language domain, and collecting bilingual data manually is costly. To address this issue, several unsupervised program translation systems have been proposed. However, these systems still rely on huge amounts of monolingual source code for training, which is very expensive. Moreover, these models perform poorly when translating languages that were not seen during pre-training. In this paper, we propose SDA-Trans, a syntax and domain-aware model for program translation, which leverages syntax structure and domain knowledge to enhance cross-lingual transfer ability. SDA-Trans adopts unsupervised training on a smaller-scale corpus that includes Python and Java monolingual programs. Experimental results on function translation tasks between Python, Java, and C++ show that SDA-Trans outperforms many large-scale pre-trained models, especially when translating unseen languages.