On the Transferability of Pre-trained Language Models for Low-Resource Programming Languages

A recent study by Ahmed and Devanbu reported that fine-tuning multilingual Pre-trained Language Models (PLMs) on a multilingual corpus of code achieves higher performance than fine-tuning them on a corpus written in just one programming language. However, that study did not analyze the fine-tuning of monolingual PLMs. Furthermore, some programming languages are inherently different, and code written in one language usually cannot be interchanged with code written in another; for example, Ruby and Java code have very different structures. To better understand how monolingual and multilingual PLMs affect different programming languages, we investigate 1) the performance of PLMs on Ruby for two popular Software Engineering tasks, Code Summarization and Code Search, 2) the strategy for selecting programming languages that works well when fine-tuning multilingual PLMs for Ruby, and 3) the performance of the fine-tuned PLMs on Ruby given different code lengths. In this work, we analyze over a hundred pre-trained and fine-tuned models. Our results show that 1) multilingual PLMs have a lower Performance-to-Time Ratio (the BLEU, METEOR, or MRR score over the fine-tuning duration) than monolingual PLMs, 2) our proposed strategy for selecting target programming languages to fine-tune multilingual PLMs is effective: it reduces fine-tuning time yet achieves higher performance in the Code Summarization and Code Search tasks, and 3) our proposed strategy consistently shows good performance across different code lengths.
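As a rough illustration of the Performance-to-Time Ratio mentioned above, the following minimal sketch reads "score over the fine-tuning duration" as a plain division. The function names, the choice of hours as the time unit, and the sample values are all hypothetical and are not taken from the study; the MRR helper uses the standard definition (mean of reciprocal ranks of the first correct result).

```python
from typing import Sequence


def mean_reciprocal_rank(first_hit_ranks: Sequence[int]) -> float:
    """Standard MRR: average of 1/rank over queries (ranks are 1-based)."""
    if not first_hit_ranks:
        raise ValueError("at least one query rank is required")
    if any(rank < 1 for rank in first_hit_ranks):
        raise ValueError("ranks must be 1-based positive integers")
    return sum(1.0 / rank for rank in first_hit_ranks) / len(first_hit_ranks)


def performance_to_time_ratio(score: float, fine_tune_hours: float) -> float:
    """Assumed reading of the ratio: metric score divided by fine-tuning time.

    `score` may be a BLEU, METEOR, or MRR value; the time unit (hours here)
    is an assumption made for this sketch only.
    """
    if fine_tune_hours <= 0:
        raise ValueError("fine-tuning duration must be positive")
    return score / fine_tune_hours


if __name__ == "__main__":
    # Made-up ranks for three Code Search queries, for illustration only.
    mrr = mean_reciprocal_rank([1, 2, 4])
    # Made-up fine-tuning duration of 2 hours.
    print(performance_to_time_ratio(mrr, fine_tune_hours=2.0))
```

Under this reading, two models with the same score differ in ratio only through their fine-tuning time, so a model that needs longer to fine-tune for the same score receives a lower ratio.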
