While large pre-trained models have transformed the field of natural language processing (NLP), the high training cost and limited cross-lingual availability of such models prevent the new advances from being shared equally by users across all languages, especially less widely spoken ones. To promote equal opportunities for all language speakers in NLP research and to reduce energy consumption for sustainability, this study proposes GreenPLM, an effective and energy-efficient framework that uses bilingual lexicons to directly translate language models of one language into other languages at (almost) no additional cost. We validate this approach in 18 languages and show that it is comparable to, if not better than, other heuristics trained at high cost. In addition, given a small computational budget (2.5\%), the framework outperforms the original monolingual language models in six out of seven tested languages. We release language models in 50 languages translated from English, along with the source code, here.