The longstanding goal of multilingual learning has been to develop a universal cross-lingual model that can withstand changes in multilingual data distributions. However, most existing models assume full access to the target languages in advance, which is often not the case in realistic scenarios, where new languages can be incorporated later on. In this paper, we present the Cross-lingual Lifelong Learning (CLL) challenge, where a model is continually fine-tuned to adapt to emerging data from different languages. We provide insights into what makes multilingual sequential learning particularly challenging. To surmount such challenges, we benchmark a representative set of cross-lingual continual learning algorithms and analyze their knowledge preservation, accumulation, and generalization capabilities compared to baselines on carefully curated data streams. The implications of this analysis include a recipe for how to measure and balance different cross-lingual continual learning desiderata, going beyond conventional transfer learning.
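To make the three desiderata concrete, the sketch below shows one common way to quantify them from a matrix of per-language accuracies recorded along a sequential fine-tuning run. This is a minimal illustration using standard continual-learning metric definitions (average final accuracy, forgetting, and forward transfer), not the paper's exact formulation; the function name `cll_metrics` and the accuracy-matrix layout are assumptions made for this example.

```python
import numpy as np

def cll_metrics(R):
    """Summarize a cross-lingual continual learning run.

    R is an (N+1) x N accuracy matrix for a stream of N languages,
    seen in column order: R[0, j] is the accuracy on language j
    before any fine-tuning, and R[i, j] (i >= 1) is the accuracy on
    language j after sequentially fine-tuning on languages 1..i.
    """
    R = np.asarray(R, dtype=float)
    n = R.shape[1]
    final = R[-1]

    # Knowledge accumulation: average accuracy over all languages
    # at the end of the stream (higher is better).
    avg_acc = final.mean()

    # Knowledge preservation: how much accuracy on earlier languages
    # drops after later fine-tuning, i.e. forgetting (lower is better).
    forgetting = np.mean([R[j + 1, j] - final[j] for j in range(n - 1)])

    # Generalization: zero-shot accuracy on a not-yet-seen language,
    # relative to the pre-fine-tuning model (forward transfer).
    fwd_transfer = np.mean([R[j, j] - R[0, j] for j in range(1, n)])

    return {"avg_acc": avg_acc,
            "forgetting": forgetting,
            "forward_transfer": fwd_transfer}

# Hypothetical accuracies on a 3-language stream (e.g., en -> de -> zh):
R = [[0.20, 0.15, 0.10],   # pretrained model, before any fine-tuning
     [0.80, 0.30, 0.20],   # after fine-tuning on language 1
     [0.70, 0.85, 0.35],   # after fine-tuning on languages 1-2
     [0.65, 0.75, 0.90]]   # after fine-tuning on languages 1-3
print(cll_metrics(R))
```

Balancing the desiderata then amounts to comparing algorithms along all three axes at once: a method that maximizes average accuracy while incurring high forgetting preserves knowledge poorly, even if it transfers well to new languages.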