Multilingual representations pre-trained with monolingual data exhibit considerably unequal task performance across languages. Previous studies address this challenge with resource-intensive contextualized alignment, which assumes the availability of large parallel data and thereby leaves under-represented language communities behind. In this work, we attribute the data hungriness of previous alignment techniques to two limitations: (i) their inability to sufficiently leverage data and (ii) improper training. To address these issues, we introduce supervised and unsupervised density-based approaches named Real-NVP and GAN-Real-NVP, driven by Normalizing Flow, to perform alignment; both decompose the alignment of multilingual subspaces into density matching and density modeling. We complement these approaches with our validation criteria to guide the training process. Our experiments encompass 16 alignment methods, including ours, evaluated across 6 language pairs, synthetic data, and 5 NLP tasks. We demonstrate the effectiveness of our approaches in scenarios with limited and with no parallel data. First, our supervised approach trained on 20k parallel sentences mostly surpasses Joint-Align and InfoXLM, which are trained on over 100k parallel sentences. Second, parallel data can be removed without sacrificing performance when our unsupervised approach is integrated into our bootstrapping procedure, which is theoretically motivated to enforce equality of multilingual subspaces. Moreover, we demonstrate the advantages of validation criteria over validation data for guiding supervised training.
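To make "density modeling with a normalizing flow" concrete, here is a minimal sketch of a single Real-NVP affine coupling layer and the change-of-variables log-likelihood it supports, log p_X(x) = log p_Z(f(x)) + log|det ∂f/∂x|. It illustrates the generic Real-NVP building block only, not the authors' alignment models: the scale and shift functions `toy_scale` and `toy_shift` are hypothetical fixed maps standing in for learned networks, the base density is assumed to be a standard normal, and the input dimension is assumed to be even.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Fn = Callable[[Vector], Vector]


def toy_scale(h: Vector) -> Vector:
    # Hypothetical stand-in for a learned log-scale network s(x1).
    return [0.5 * math.tanh(v) for v in h]


def toy_shift(h: Vector) -> Vector:
    # Hypothetical stand-in for a learned translation network t(x1).
    return [0.1 * v for v in h]


def affine_coupling_forward(x: Vector, scale_fn: Fn, shift_fn: Fn) -> Tuple[Vector, float]:
    """Keep the first half x1 unchanged; map x2 -> x2 * exp(s(x1)) + t(x1)."""
    assert len(x) % 2 == 0, "this sketch assumes an even dimension"
    d = len(x) // 2
    x1, x2 = x[:d], x[d:]
    s, t = scale_fn(x1), shift_fn(x1)
    y2 = [xi * math.exp(si) + ti for xi, si, ti in zip(x2, s, t)]
    # The Jacobian is triangular, so log|det J| is the sum of the log-scales.
    return x1 + y2, sum(s)


def affine_coupling_inverse(y: Vector, scale_fn: Fn, shift_fn: Fn) -> Vector:
    """Closed-form inverse: y1 equals x1, so s and t can be recomputed exactly."""
    d = len(y) // 2
    y1, y2 = y[:d], y[d:]
    s, t = scale_fn(y1), shift_fn(y1)
    x2 = [(yi - ti) * math.exp(-si) for yi, si, ti in zip(y2, s, t)]
    return y1 + x2


def standard_normal_log_prob(z: Vector) -> float:
    # Log-density of an isotropic standard normal (the assumed base density).
    return sum(-0.5 * zi * zi - 0.5 * math.log(2 * math.pi) for zi in z)


def flow_log_prob(x: Vector, scale_fn: Fn, shift_fn: Fn) -> float:
    # Change of variables: log p_X(x) = log p_Z(f(x)) + log|det df/dx|.
    z, log_det = affine_coupling_forward(x, scale_fn, shift_fn)
    return standard_normal_log_prob(z) + log_det


if __name__ == "__main__":
    x = [0.3, -1.2, 0.7, 2.0]
    y, log_det = affine_coupling_forward(x, toy_scale, toy_shift)
    print("forward:", y, "log|det J|:", log_det)
    print("reconstructed:", affine_coupling_inverse(y, toy_scale, toy_shift))
    print("log p(x):", flow_log_prob(x, toy_scale, toy_shift))
```

Because the coupling layer leaves one half of the input unchanged, its inverse is available in closed form and its log-determinant is cheap to compute; stacking several such layers, permuting which half is transformed between layers, is what lets a Real-NVP flow model a density with an exact likelihood.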