Although pre-trained contextualized language models (PrLMs) have made a significant impact on NLP, training PrLMs in languages other than English can be impractical for two reasons: other languages often lack corpora sufficient for training powerful PrLMs, and, because of the commonalities among human languages, computationally expensive PrLM training for each language is somewhat redundant. In this work, building upon recent work connecting cross-lingual model transfer and neural machine translation, we propose TreLM, a novel cross-lingual model transfer framework for PrLMs. To handle the symbol order and sequence length differences between languages, we propose an intermediate ``TRILayer'' structure that learns from these differences and enables a better transfer in our primary translation direction, as well as a new cross-lingual language modeling objective for transfer training. Additionally, we present an embedding alignment method that adversarially adapts a PrLM's non-contextualized embedding space and the TRILayer structure to learn a text transformation network across languages, which addresses the vocabulary differences between languages. Experiments on both language understanding and structure parsing tasks show that the proposed framework significantly outperforms language models trained from scratch with limited data in both performance and efficiency. Moreover, despite only a slight performance loss compared to pre-training from scratch in resource-rich scenarios, our cross-lingual model transfer framework is significantly more economical.
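To make the adversarial embedding alignment mentioned above concrete, the following is a minimal sketch of how a target-language embedding space could be adversarially adapted toward a source PrLM's embedding space, assuming a PyTorch-style setup; the module names, dimensions, and training loop are illustrative assumptions rather than the framework's actual implementation.

```python
# Minimal sketch of adversarial embedding-space alignment (illustrative only).
# "mapping" projects target-language embeddings into the source PrLM's space;
# the discriminator tries to distinguish mapped target embeddings from real
# source embeddings, and the mapping is trained to fool it.
import torch
import torch.nn as nn

dim = 768  # assumed embedding dimension

mapping = nn.Linear(dim, dim, bias=False)
discriminator = nn.Sequential(
    nn.Linear(dim, 512), nn.LeakyReLU(0.2), nn.Linear(512, 1)
)

bce = nn.BCEWithLogitsLoss()
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
opt_m = torch.optim.Adam(mapping.parameters(), lr=1e-4)


def train_step(src_emb, tgt_emb):
    """One adversarial step; src_emb and tgt_emb are (batch, dim) embeddings."""
    # 1) Update the discriminator: source embeddings are "real",
    #    mapped target embeddings are "fake".
    mapped = mapping(tgt_emb).detach()
    d_loss = bce(discriminator(src_emb), torch.ones(src_emb.size(0), 1)) + \
             bce(discriminator(mapped), torch.zeros(tgt_emb.size(0), 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2) Update the mapping so mapped target embeddings look like source ones.
    mapped = mapping(tgt_emb)
    m_loss = bce(discriminator(mapped), torch.ones(tgt_emb.size(0), 1))
    opt_m.zero_grad()
    m_loss.backward()
    opt_m.step()
    return d_loss.item(), m_loss.item()
```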