UXLA: A Robust Unsupervised Data Augmentation Framework for Zero-Resource Cross-Lingual NLP

Transfer learning has yielded state-of-the-art (SoTA) results in many supervised NLP tasks. However, annotated data is rarely available for every target task in every target language, especially for low-resource languages. We propose UXLA, a novel unsupervised data augmentation framework for zero-resource transfer learning scenarios. In particular, UXLA aims to solve cross-lingual adaptation problems from a source language task distribution to an unknown target language task distribution, assuming no training labels in the target language. At its core, UXLA performs simultaneous self-training with data augmentation and unsupervised sample selection. To show its effectiveness, we conduct extensive experiments on three diverse zero-resource cross-lingual transfer tasks. UXLA achieves SoTA results in all the tasks, outperforming the baselines by a good margin. With an in-depth framework dissection, we demonstrate the cumulative contributions of different components to its success.
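
The abstract names three ingredients (self-training, data augmentation, and unsupervised sample selection) without giving details. The toy sketch below only illustrates how such ingredients can fit together in one loop; it is not the UXLA implementation. Everything specific in it is an assumption made for illustration: the nearest-centroid classifier, the Gaussian-jitter `augment` function, the confidence `threshold` used for selection, and the 2-D points standing in for source-language and target-language examples.

```python
import math
import random


def centroids(points, labels):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for (x, y), lab in zip(points, labels):
        sx, sy = sums.get(lab, (0.0, 0.0))
        sums[lab] = (sx + x, sy + y)
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: (s[0] / counts[lab], s[1] / counts[lab]) for lab, s in sums.items()}


def predict(model, point):
    """Return (label, confidence) from a softmax over negative distances."""
    scores = {lab: math.exp(-math.dist(point, c)) for lab, c in model.items()}
    total = sum(scores.values())
    label = max(scores, key=scores.get)
    return label, scores[label] / total


def augment(point, rng, scale=0.05):
    """Hypothetical augmentation: jitter a point with small Gaussian noise."""
    return (point[0] + rng.gauss(0, scale), point[1] + rng.gauss(0, scale))


def self_train(src_x, src_y, tgt_x, rounds=3, threshold=0.7, seed=0):
    """Pseudo-label unlabeled target points, keep confident ones, retrain."""
    rng = random.Random(seed)
    model = centroids(src_x, src_y)  # warm-up on labeled source data only
    selected = []
    for _ in range(rounds):
        selected = []
        for p in tgt_x:
            label, conf = predict(model, p)
            if conf >= threshold:  # assumed selection rule: confidence cutoff
                selected.append((p, label))
                selected.append((augment(p, rng), label))
        train_x = list(src_x) + [p for p, _ in selected]
        train_y = list(src_y) + [lab for _, lab in selected]
        model = centroids(train_x, train_y)
    return model, selected


# Labeled "source" points and unlabeled, slightly shifted "target" points.
src_x = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (4.0, 0.0), (4.1, 0.2), (3.9, -0.1)]
src_y = [0, 0, 0, 1, 1, 1]
tgt_x = [(0.5, 1.0), (0.3, 1.2), (4.5, 1.0), (4.6, 0.8), (2.2, 1.0)]

model, selected = self_train(src_x, src_y, tgt_x)
```

In this toy run the point `(2.2, 1.0)` lies roughly midway between the two classes, so its pseudo-label never reaches the confidence cutoff and it is left out of retraining, while the four clearly separated target points and their jittered copies are added.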
