While achieving state-of-the-art results in multiple tasks and languages, translation-based cross-lingual transfer is often overlooked in favour of massively multilingual pre-trained encoders. Arguably, this is due to its main limitations: 1) translation errors percolating to the classification phase and 2) the insufficient expressiveness of the maximum-likelihood translation. To remedy both limitations, we propose a new technique that integrates both steps of the traditional pipeline (translation and classification) into a single model, by treating the intermediate translations as a latent random variable. As a result, 1) the neural machine translation system can be fine-tuned with a variant of Minimum Risk Training where the reward is the accuracy of the downstream task classifier. Moreover, 2) multiple samples can be drawn to approximate the expected loss across all possible translations during inference. We evaluate our novel latent translation-based model on a series of multilingual NLU tasks, including commonsense reasoning, paraphrase identification, and natural language inference. We report gains for both zero-shot and few-shot learning setups, up to 2.7 accuracy points on average, which are even more prominent for low-resource languages (e.g., Haitian Creole). Finally, we carry out in-depth analyses comparing different underlying NMT models and assessing the impact of alternative translations on the downstream performance.
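Concretely, the inference step can be read as marginalising the classifier over latent translations, p(y|x) = E_{t~p(t|x)}[p(y|t)] ≈ (1/K) Σ_k p(y|t_k) with t_k sampled from the NMT model. The sketch below illustrates this Monte Carlo approximation with off-the-shelf Hugging Face models; the checkpoint names, the sample count k, and the `predict` helper are illustrative assumptions, not the paper's released implementation.

```python
# A minimal sketch of inference-time marginalisation over sampled translations.
# Checkpoints and the helper name are assumptions chosen for illustration only.
import torch
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    MarianMTModel,
    MarianTokenizer,
)

NMT_NAME = "Helsinki-NLP/opus-mt-fr-en"  # assumed source-to-English NMT model
CLF_NAME = "distilbert-base-uncased-finetuned-sst-2-english"  # assumed stand-in classifier

nmt_tok = MarianTokenizer.from_pretrained(NMT_NAME)
nmt = MarianMTModel.from_pretrained(NMT_NAME).eval()
clf_tok = AutoTokenizer.from_pretrained(CLF_NAME)
clf = AutoModelForSequenceClassification.from_pretrained(CLF_NAME).eval()

@torch.no_grad()
def predict(src_text: str, k: int = 8) -> torch.Tensor:
    """Approximate p(y|x) = E_{t~p(t|x)}[p(y|t)] with k sampled translations."""
    batch = nmt_tok([src_text], return_tensors="pt")
    # Ancestral sampling (not beam search), so the translations are draws from
    # the NMT distribution rather than a single maximum-likelihood output.
    sample_ids = nmt.generate(
        **batch, do_sample=True, num_beams=1, num_return_sequences=k
    )
    translations = nmt_tok.batch_decode(sample_ids, skip_special_tokens=True)
    enc = clf_tok(translations, return_tensors="pt", padding=True, truncation=True)
    probs = clf(**enc).logits.softmax(dim=-1)
    # Monte Carlo average of the classifier's predictions over translations.
    return probs.mean(dim=0)

print(predict("Ce film est vraiment formidable.").tolist())
```

During fine-tuning, the same sampled translations would feed a Minimum Risk Training objective that reweights each translation by the downstream classifier's reward; the sketch above covers only the inference-time averaging.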